This new policy will query chunks of 1000 node at once, in order to fill the Web API rate limit.
Details
Diff Detail
- Repository
- rDTSCN Code scanner
- Branch
- greedy-search
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 22947 Build 35779: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 35778: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D6065 (id=21947)
Rebasing onto 98011a4226...
Current branch diff-target is up to date.
Changes applied before test
commit b18d2b87b54c7005f4185b32ef76e0dfe1306d0a Author: Daniele Serafini <me@danieleserafini.eu> Date: Fri Aug 6 10:46:11 2021 +0200 add policy greedybfs
See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/146/ for more details.
swh/scanner/cli.py | ||
---|---|---|
157–158 | I agree with val, hopefully the suggestion above improved it correctly. |
swh/scanner/policy.py | ||
---|---|---|
166 | nvm, I started writing this before noticing nodes is an iterator, and it looks like I forgot to delete it |
Build is green
Patch application report for D6065 (id=21975)
Rebasing onto 98011a4226...
Current branch diff-target is up to date.
Changes applied before test
commit a25a63fd68311907a8273664ca6bc1c6e0474772 Author: Daniele Serafini <me@danieleserafini.eu> Date: Fri Aug 6 10:46:11 2021 +0200 add policy greedybfs
See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/147/ for more details.
looks good!
I've only added a bunch of minor change requests on docs and comments
swh/scanner/cli.py | ||
---|---|---|
158 | better: "greedybfs: same as "bfs" policy, but lookup the status of source code artifacts in chunks, in order to minimize the number of Web API round-trips with the archive." | |
swh/scanner/policy.py | ||
70 | docstring please, e.g.: "return the size of a source tree as the number of nodes it contains" | |
130 | better: "query graph nodes in chunks (to maximize [...]" rationale: you're not querying node chunks (i.e., node sub-parts), but batches of nodes together | |
156–157 | s/1000/QUERY_LIMIT/ | |
swh/scanner/tests/conftest.py | ||
81 | please add a comment in the source code here, explaining why the recursion limit is needed (you explained it to @vlorentz in the diff, but the diff will not remain in the code :-)) |
swh/scanner/policy.py | ||
---|---|---|
71 | Oh I missed this. You can do it without allocating a list, like this ^ |
Build is green
Patch application report for D6065 (id=21985)
Rebasing onto 98011a4226...
Current branch diff-target is up to date.
Changes applied before test
commit d4a4e6c404bbba289b369dfcba6b2219b0695c8b Author: Daniele Serafini <me@danieleserafini.eu> Date: Fri Aug 6 10:46:11 2021 +0200 add policy greedybfs
See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/148/ for more details.
Thanks!
I'm approving this, but please fix the remaining occurrence of 1000 before landing, as per comment above.
swh/scanner/policy.py | ||
---|---|---|
158 | you missed an occurrence of 1000 here :-), please fix it before landing |
Build is green
Patch application report for D6065 (id=21992)
Rebasing onto 98011a4226...
Current branch diff-target is up to date.
Changes applied before test
commit a9d0b9e2af0e5735608c1841d59e60662874b15e Author: Daniele Serafini <me@danieleserafini.eu> Date: Fri Aug 6 10:46:11 2021 +0200 add policy greedybfs
See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/149/ for more details.