This new policy will query chunks of 1000 node at once, in order to fill the Web API rate limit.
Details
Diff Detail
- Repository
- rDTSCN Code scanner
- Lint
Automatic diff as part of commit; lint not applicable. - Unit
Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Build is green
Patch application report for D6065 (id=21947)
Rebasing onto 98011a4226...
Current branch diff-target is up to date.
Changes applied before test
commit b18d2b87b54c7005f4185b32ef76e0dfe1306d0a
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Fri Aug 6 10:46:11 2021 +0200
add policy greedybfsSee https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/146/ for more details.
| swh/scanner/cli.py | ||
|---|---|---|
| 157–158 | I agree with val, hopefully the suggestion above improved it correctly. | |
| swh/scanner/policy.py | ||
|---|---|---|
| 166 | nvm, I started writing this before noticing nodes is an iterator, and it looks like I forgot to delete it | |
Build is green
Patch application report for D6065 (id=21975)
Rebasing onto 98011a4226...
Current branch diff-target is up to date.
Changes applied before test
commit a25a63fd68311907a8273664ca6bc1c6e0474772
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Fri Aug 6 10:46:11 2021 +0200
add policy greedybfsSee https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/147/ for more details.
looks good!
I've only added a bunch of minor change requests on docs and comments
| swh/scanner/cli.py | ||
|---|---|---|
| 158 | better: "greedybfs: same as "bfs" policy, but lookup the status of source code artifacts in chunks, in order to minimize the number of Web API round-trips with the archive." | |
| swh/scanner/policy.py | ||
| 70 | docstring please, e.g.: "return the size of a source tree as the number of nodes it contains" | |
| 130 | better: "query graph nodes in chunks (to maximize [...]" rationale: you're not querying node chunks (i.e., node sub-parts), but batches of nodes together | |
| 156–157 | s/1000/QUERY_LIMIT/ | |
| swh/scanner/tests/conftest.py | ||
| 81 | please add a comment in the source code here, explaining why the recursion limit is needed (you explained it to @vlorentz in the diff, but the diff will not remain in the code :-)) | |
| swh/scanner/policy.py | ||
|---|---|---|
| 71 | Oh I missed this. You can do it without allocating a list, like this ^ | |
Build is green
Patch application report for D6065 (id=21985)
Rebasing onto 98011a4226...
Current branch diff-target is up to date.
Changes applied before test
commit d4a4e6c404bbba289b369dfcba6b2219b0695c8b
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Fri Aug 6 10:46:11 2021 +0200
add policy greedybfsSee https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/148/ for more details.
Thanks!
I'm approving this, but please fix the remaining occurrence of 1000 before landing, as per comment above.
| swh/scanner/policy.py | ||
|---|---|---|
| 157 | you missed an occurrence of 1000 here :-), please fix it before landing | |
Build is green
Patch application report for D6065 (id=21992)
Rebasing onto 98011a4226...
Current branch diff-target is up to date.
Changes applied before test
commit a9d0b9e2af0e5735608c1841d59e60662874b15e
Author: Daniele Serafini <me@danieleserafini.eu>
Date: Fri Aug 6 10:46:11 2021 +0200
add policy greedybfsSee https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/149/ for more details.