Details
- Reviewers
ardumont acezar zack - Group Reviewers
Reviewers - Commits
- rDMOD2b869aa7d30d: swh identify: add --exclude
Diff Detail
- Branch
- swh-identify.ignore-named-directory
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 16022 Build 24649: arc lint + arc unit
Event Timeline
Thanks, even though this is a little bit disturbing discrepancy wrt swh-scanner exclusion mechanism, it's probably for the best, and arguably we should have something this simple also there.
I've requested some changes about CLI and it's documentation.
swh/model/cli.py | ||
---|---|---|
169 | sounds like a bit of a mouthful, can we just go for --ignore-dir instead? Also, the help string should clarify:
|
can you please remove the "noise" added by arc in the commit message? And update it (still the previous option name in there).
I agree having the same cli interface than the scanner (aka --exclude with support for glob) would be much better. If the glob pattern matching can wait, using the same option name is a "must" for me.
it's probably for the best, and arguably we should have something this simple also there.
@zack Not sure what the problem with the scanner is. Can you elaborate?
swh/model/cli.py | ||
---|---|---|
172 | Minor nit/typo: it should be "globbing", but all in all I'd rather just not mention globbing if it's not supported. I suggest ignore directories by path name (can be repeated). The point about depth remains though. If it does match at any depth, I propose: ignore directories by local path name, matching at any depth (can be repeated). Otherwise it should be: ignore directories by absolute path name, starting from root path (can be repeated) |
Changed --ignore-dir by --exclude
It supports glob, but had to copy extract_regex_objs from swh.scan.
Also, despite extract_regex_objs saying `checks if the path is a subdirectory
or relative to the root path`, during my tests excluding ../foo does not raise
an error.
Fixed path resolution error.
swh-identify --exclude foo my-repo versus
cd my-repo; swh-identify --exclude foo .
As long as it can be in the future invoked from swh-scanner (which already has a dependency on swh-model anyway), that's fine. However, please add an "XXX/TODO" comment in the code noting where the code comes from and that it is a copy/paste.
Once this is landed, please also file a bug against swh-scanner asking to depend on this function rather than duplicating it.
LGTM otherwise !
Except for the small feedback about the exception, this looks fine.
swh/model/exceptions.py | ||
---|---|---|
134 ↗ | (On Diff #14889) | This exception need a docstring. And if I understand it Correctly, it should subclass RuntimeError. |
swh/model/exceptions.py | ||
---|---|---|
134 ↗ | (On Diff #14889) | The exception has been copy pasted from swh.scanner like extract_regex_objs. I don't think changing it as it is just about moving code from one package to another. |
because it's not related to swh.model repo I'd say (then the question is how can arcanist produce diff that's not related with any repo?)
@acezar Now that your arcanist issue is fixed, can you please try and update this diff to check if that could trigger properly the diff build?
If that does not work, i gather closing and opening a new one just to check if that builds fine would be a way forward here.
Thanks in advance.
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/151/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/151/console
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/152/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/152/console
Build is green
Patch application report for D4193 (id=15277)
Rebasing onto 9224c8ca6e...
Current branch diff-target is up to date.
Changes applied before test
commit 2e1539da6aa6ca56104504939ba911a107976293 Author: Antoine Cezar <antoine@cezar.fr> Date: Wed Oct 7 19:45:44 2020 +0200 swh identify: add --exclude
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/153/ for more details.
Build is green
Awesome.
I know you are moving only the module from scanner to model.
Still, this should be covered.
Please, add at least one test around the exclude flag in the test_cli module.
The following test [1] is listing the content of a git repository.
You could another scenario based from [1], but this times, it would ignore the
.git folder, resulting in a snapshot (for listing [2]) with a different id.
That should be enough to cover it.
[1] https://forge.softwareheritage.org/source/swh-model/browse/master/swh/model/tests/test_cli.py$55-69
[2] Such listing would result in only the following to list, which:
drwxr-xr-x zack/zack 0 sample-repo/ -rw-r--r-- zack/zack 9 sample-repo/foo -rw-r--r-- zack/zack 5 sample-repo/bar -rw-r--r-- zack/zack 4 sample-repo/baz
Thanks in advance.
Build is green
Patch application report for D4193 (id=15286)
Rebasing onto 9224c8ca6e...
Current branch diff-target is up to date.
Changes applied before test
commit 6f6d0eefe565f1ea300b65306e906de87eb00fec Author: Antoine Cezar <antoine@cezar.fr> Date: Wed Oct 7 19:45:44 2020 +0200 swh identify: add --exclude
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/154/ for more details.
This globally LGTM but there is this path encoding issue. The 2 new functions in from_disk.py should take a bytes argument instead of a str one.
swh/model/cli.py | ||
---|---|---|
59 | isn't there some encoding risk here? This path.decode() frighten me a bit. | |
swh/model/tests/test_cli.py | ||
150 ↗ | (On Diff #15286) | At least one test with non-ascii path is needed I'd say |
Build is green
Patch application report for D4193 (id=15343)
Rebasing onto 9224c8ca6e...
Current branch diff-target is up to date.
Changes applied before test
commit 0e4fbb9d59bbdc6d777d7a3789a3f8fbbf836b30 Author: Antoine Cezar <antoine@cezar.fr> Date: Wed Oct 7 19:45:44 2020 +0200 swh identify: add --exclude
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/155/ for more details.
Build is green
Patch application report for D4193 (id=15365)
Rebasing onto 9224c8ca6e...
Current branch diff-target is up to date.
Changes applied before test
commit 2b869aa7d30d099ed6146d9f8dc667cd7a8eefc3 Author: Antoine Cezar <antoine@cezar.fr> Date: Wed Oct 7 19:45:44 2020 +0200 swh identify: add --exclude
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/156/ for more details.