Details

Reviewers

vlorentz
zack

Group Reviewers

Reviewers

Commits

rDTSCNa9d0b9e2af0e: add policy greedybfs

Summary

This new policy will query chunks of 1000 node at once, in order to fill the Web API rate limit.

Diff Detail

Repository

rDTSCN Code scanner

Branch

greedy-search

Lint

No Linters Available

Unit

No Unit Test Coverage

Build Status

Buildable 22947
Build 35779: Phabricator diff pipeline on jenkins	Jenkins console · Jenkins
Build 35778: arc lint + arc unit

Event Timeline

DanSeraf created this revision.Aug 6 2021, 10:47 AM

Herald added a reviewer: Reviewers. · View Herald TranscriptAug 6 2021, 10:47 AM

Build is green

Patch application report for D6065 (id=21947)

Rebasing onto 98011a4226...

Current branch diff-target is up to date.

Changes applied before test

commit b18d2b87b54c7005f4185b32ef76e0dfe1306d0a
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Fri Aug 6 10:46:11 2021 +0200

    add policy greedybfs

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/146/ for more details.

Harbormaster completed remote builds in B22910: Diff 21947.Aug 6 2021, 10:50 AM

DanSeraf requested review of this revision.Aug 6 2021, 10:50 AM

vlorentz added a subscriber: vlorentz.Aug 6 2021, 1:03 PM

vlorentz added inline comments.

swh/scanner/cli.py
157–158	I find "fill" a bit unclear. Do you mean using it to the max?
swh/scanner/policy.py
162–167
166	Doesn't this always take the
170–172	purely aesthetic (to prevent black from reformatting it with the weird newline placement)
swh/scanner/tests/conftest.py
81	why?

ardumont added a subscriber: ardumont.Aug 6 2021, 2:32 PM

ardumont added inline comments.

swh/scanner/cli.py
157–158	I agree with val, hopefully the suggestion above improved it correctly.

DanSeraf added inline comments.Aug 6 2021, 3:55 PM

swh/scanner/policy.py
166	What do you mean?
170–172	Ok, thanks for the suggestion!
swh/scanner/tests/conftest.py
81	This is used to avoid a possible RecursionError that could be generated by `.mkdir`.

vlorentz added inline comments.Aug 6 2021, 4:22 PM

swh/scanner/policy.py
166	nvm, I started writing this before noticing `nodes` is an iterator, and it looks like I forgot to delete it

requested changes

Build is green

Patch application report for D6065 (id=21975)

Rebasing onto 98011a4226...

Current branch diff-target is up to date.

Changes applied before test

commit a25a63fd68311907a8273664ca6bc1c6e0474772
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Fri Aug 6 10:46:11 2021 +0200

    add policy greedybfs

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/147/ for more details.

Harbormaster completed remote builds in B22937: Diff 21975.Aug 6 2021, 5:05 PM

vlorentz accepted this revision.Aug 9 2021, 12:59 PM

This revision is now accepted and ready to land.Aug 9 2021, 12:59 PM

zack added a reviewer: zack.Aug 9 2021, 1:44 PM

This revision now requires review to proceed.Aug 9 2021, 1:44 PM

looks good!

I've only added a bunch of minor change requests on docs and comments

swh/scanner/cli.py
158	better: "greedybfs: same as "bfs" policy, but lookup the status of source code artifacts in chunks, in order to minimize the number of Web API round-trips with the archive."
swh/scanner/policy.py
70	docstring please, e.g.: "return the size of a source tree as the number of nodes it contains"
130	better: "query graph nodes in chunks (to maximize [...]" rationale: you're not querying node chunks (i.e., node sub-parts), but batches of nodes together
156–157	s/1000/QUERY_LIMIT/
swh/scanner/tests/conftest.py
81	please add a comment in the source code here, explaining why the recursion limit is needed (you explained it to @vlorentz in the diff, but the diff will not remain in the code :-))

This revision now requires changes to proceed.Aug 9 2021, 1:56 PM

vlorentz added inline comments.Aug 9 2021, 2:00 PM

swh/scanner/policy.py
71	Oh I missed this. You can do it without allocating a list, like this ^

requested changes

Build is green

Patch application report for D6065 (id=21985)

Rebasing onto 98011a4226...

Current branch diff-target is up to date.

Changes applied before test

commit d4a4e6c404bbba289b369dfcba6b2219b0695c8b
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Fri Aug 6 10:46:11 2021 +0200

    add policy greedybfs

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/148/ for more details.

Harbormaster completed remote builds in B22947: Diff 21985.Aug 9 2021, 5:40 PM

Thanks!

I'm approving this, but please fix the remaining occurrence of 1000 before landing, as per comment above.

swh/scanner/policy.py
158	you missed an occurrence of `1000` here :-), please fix it before landing

This revision is now accepted and ready to land.Aug 9 2021, 7:04 PM

missing occurrence

Build is green

Patch application report for D6065 (id=21992)

Rebasing onto 98011a4226...

Current branch diff-target is up to date.

Changes applied before test

commit a9d0b9e2af0e5735608c1841d59e60662874b15e
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Fri Aug 6 10:46:11 2021 +0200

    add policy greedybfs

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/149/ for more details.

Harbormaster completed remote builds in B22955: Diff 21992.Aug 10 2021, 1:19 PM

Closed by commit rDTSCNa9d0b9e2af0e: add policy greedybfs (authored by DanSeraf). · Explain WhyAug 10 2021, 1:20 PM

This revision was automatically updated to reflect the committed changes.

DanSeraf added a commit: rDTSCNa9d0b9e2af0e: add policy greedybfs.

swh-scanner: add policy greedybfs
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Patch application report for D6065 (id=21947)

Changes applied before test

Patch application report for D6065 (id=21975)

Changes applied before test

Patch application report for D6065 (id=21985)

Changes applied before test

Patch application report for D6065 (id=21992)

Changes applied before test

Revision Contents
Changeset List

Diff 21985

swh/scanner/cli.py

swh/scanner/policy.py

swh/scanner/scanner.py

swh/scanner/tests/conftest.py

swh/scanner/tests/flask_api.py

swh/scanner/tests/test_policy.py

swh-scanner: add policy greedybfsClosedPublicActions

Details

Diff Detail

Event Timeline

Patch application report for D6065 (id=21947)

Changes applied before test

Patch application report for D6065 (id=21975)

Changes applied before test

Patch application report for D6065 (id=21985)

Changes applied before test

Patch application report for D6065 (id=21992)

Changes applied before test

Revision ContentsChangeset List

Diff 21985

swh/scanner/cli.py

swh/scanner/policy.py

swh/scanner/scanner.py

swh/scanner/tests/conftest.py

swh/scanner/tests/flask_api.py

swh/scanner/tests/test_policy.py

swh-scanner: add policy greedybfs
ClosedPublic
Actions

Revision Contents
Changeset List