Recursively compute all the SWHID(s) for a given source code object
Details
- Reviewers
zack - Group Reviewers
Reviewers - Maniphest Tasks
- T3160: swh identify: add a -R/--recursive flag
Diff Detail
- Repository
- rDMOD Data model
- Branch
- recursive-support
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 21891 Build 34040: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 34039: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D5825 (id=20843)
Rebasing onto 4808fc2ae5...
Current branch diff-target is up to date.
Changes applied before test
commit d9b5ba15208e4a3b8073dde8f268683a5a4e4839 Author: Daniele Serafini <me@danieleserafini.eu> Date: Tue Jun 8 13:51:47 2021 +0100 cli: recursive option
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/339/ for more details.
Cool, thanks!
One question and two requests:
- What happens if one runs it on a directory and --recursive isn't given?
- If relevant, could you implement --verify too?
- could you add tests?
swh/model/cli.py | ||
---|---|---|
27 | can you keep this import inside functions? | |
283 | what about path = os.fsencode(objects[0]), for consistency with the other half the function? | |
287 | ditto |
- What happens if one runs it on a directory and --recursive isn't given?
It will show only the SWHID of the given directory, basically the same process as before is applied
- If relevant, could you implement --verify too?
Sure, if it is useful i could open another diff for it
- could you add tests?
I would add a cli test but it seems that the Content objects generated with the sample-folder uses the attribute data instead of path generating a key error during the test (since the same key is used to retrieve the path from both the Directory and the Content objects; from_disk.DiskBackedContent and model.Content have different attributes name). Would it be safe to change the attribute name?
But what was the process before? Did it ignore directory entries?
- could you add tests?
I would add a cli test but it seems that the Content objects generated with the sample-folder uses the attribute data instead of path generating a key error during the test (since the same key is used to retrieve the path from both the Directory and the Content objects; from_disk.DiskBackedContent and model.Content have different attributes name). Would it be safe to change the attribute name?
Changing the name won't work, they don't have the same values.
If mocks don't return the right type of data, you should change the mocks instead.
This is not going to be easy, because one would need to pass a set of SWHIDs, mapped to individual paths, which is much harder to do on a CLI than passing a single SWHID (as in the current implementation of --verify).
The only easy way to do this would be passing a filename pointing to, e.g., a JSON formatted manifest, to be compared with computed SWHIDs, but I don't see much the point of doing that.
I'm fine with --verify being incompatible with --recursive for the time being (maybe with an appropriate error message).
It will show only the SWHID of the given directory, basically the same process as before is applied
But what was the process before? Did it ignore directory entries?
It checks only the given directories generating a from_disk.Directory object for each directory. Should it uses the same logic used for the recursive option?
I would add a cli test but it seems that the Content objects generated with the sample-folder uses the attribute data instead of path generating a key error during the test (since the same key is used to retrieve the path from both the Directory and the Content objects; from_disk.DiskBackedContent and model.Content have different attributes name). Would it be safe to change the attribute name?
Changing the name won't work, they don't have the same values.
If mocks don't return the right type of data, you should change the mocks instead.
Ok, i'll change the mocks.
To elaborate on this for @vlorentz, swh identify did work on directories before, and did use swh.model.from_disk to compute the SWHID of directories, but due to how the code was organized the built model was not kept around until the printing phase, so it was impossible to decide at that time whether to output only the root SWHID or all inner SWHIDs recursively.
I see, thanks.
And --verify was just an example, it looks like --recursive causes all options but --exclude to be ignored. Considering this and the completely disjoint code-paths, wouldn't it make sense to have a separate command (eg. recursive-identify instead of identify --recursive)?
I don't think so, given that a -r/--recursive is a very common pattern for CLI commands that work on files and that other options are much more seldomly used.
(However we should make sure if they are used in an incompatible way, the tool fails with a clear error message.)
swh/model/cli.py | ||
---|---|---|
27 | I guess this was here because "Directory" is used as the return type of the generate_dir_obj function. | |
78–79 | please rename this function to something like "model_of_dir" | |
296 | Exit mid-way through the function isn't great. Can you instead put what follows under the "else" branch? |
swh/model/cli.py | ||
---|---|---|
27 | in that case I can't use it as a returning type in generate_dir_obj. |
swh/model/cli.py | ||
---|---|---|
296 | ack, I will do it. |
- requested changes
- get the object path based on the data contained in the node object
Build is green
Patch application report for D5825 (id=20909)
Rebasing onto 8540c6700c...
First, rewinding head to replay your work on top of it... Applying: cli: add recursive option
Changes applied before test
commit 742f4cd5e32bddcac85a4f0196902ffb6112c9e3 Author: Daniele Serafini <me@danieleserafini.eu> Date: Tue Jun 8 13:51:47 2021 +0100 cli: add recursive option
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/340/ for more details.
It seems it's still missing incompatibility checks with --no-dereference, --no-filename, and --type
It seems it's still missing incompatibility checks with --no-dereference, --no-filename, and --type
Actually, --no-filename could be compatible (and implemented in this diff); should i include it?
- missing incompatibility checks
- ignore --recursive if the input object is not a directory
- print only the swhid if using --no-filename
Build is green
Patch application report for D5825 (id=20954)
Rebasing onto 8540c6700c...
First, rewinding head to replay your work on top of it... Applying: cli: add recursive option
Changes applied before test
commit c12b1ebc09ddbdb0072b11ff4f102efd3ab22ca8 Author: Daniele Serafini <me@danieleserafini.eu> Date: Tue Jun 8 13:51:47 2021 +0100 cli: add recursive option
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/341/ for more details.
swh/model/cli.py | ||
---|---|---|
281 | I like a lot this solution of printing a warning instead of failing. However, you should use "logging.warn" (and hence import logging) instead of a bare print. | |
285 | better: "verification of recursive object identification is not supported" | |
287–288 | I think here the test should be "not in ("auto", "directory")", no? And here's a better error message: "recursive identification is supported only for directories". | |
290–291 | I don't get this one. Why is it incompatible? Both --dereference and --no-dereference should work just fine with or without --recursive, no? (As long as the symlink resolves to a directory it should be fine IMO.) |
swh/model/cli.py | ||
---|---|---|
290–291 | yeah, maybe we can print a warning here or just ignore the option |
Build is green
Patch application report for D5825 (id=20955)
Rebasing onto 8540c6700c...
First, rewinding head to replay your work on top of it... Applying: cli: add recursive option
Changes applied before test
commit 87d41d1294d38be3e8969aa08aba116724636525 Author: Daniele Serafini <me@danieleserafini.eu> Date: Tue Jun 8 13:51:47 2021 +0100 cli: add recursive option
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/342/ for more details.
Build is green
Patch application report for D5825 (id=20966)
Rebasing onto ae50e43fe0...
First, rewinding head to replay your work on top of it... Fast-forwarded diff-target to base-revision-343-D5825.
Changes applied before test
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/343/ for more details.