The purpose of this diff is the creation of an interface to create new scan policies.
Some of the policies implemented in the benchmark branch are moved in the master branch in order to let the user select the scan approach (Related to T3420).
Details
- Reviewers
zack - Group Reviewers
Reviewers - Maniphest Tasks
- T3420: scanner: make the various query algorithms user-selectable
- Commits
- rDTSCNd5a070e1429d: add scan policies
Diff Detail
- Repository
- rDTSCN Code scanner
- Branch
- policy
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 22617 Build 35255: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 35254: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D5996 (id=21646)
Rebasing onto 33b1316bcd...
Current branch diff-target is up to date.
Changes applied before test
commit ddfccaea8f616fc806f81050fbd6b89509768ab8 Author: Daniele Serafini <me@danieleserafini.eu> Date: Thu Jul 15 19:07:33 2021 +0200 new scan policies
See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/136/ for more details.
Looks great! I've noted down only some nits.
swh/scanner/policy.py | ||
---|---|---|
27 | "SWHIDs" | |
115 | why is this @no_type_check needed? | |
125 | comment is unclear: "deepest" what? maybe "check deepest node first" ? | |
149 | how come the "known" value can be the empty string here? Ideally it should always be a boolean, either Ture of False. If you need a tristate, maybe None would better? | |
swh/scanner/tests/test_policy.py | ||
1–5 | I'm confused by these two test files: is test_policy.py testing anything at all right now? In test_scanner it looks like you're testing the results with each policy, but not actually the order in which things are queried from the backend (i.e., the policies themselves). Is that correct? If so, we should test the various node ordering here in test_policy.py. It shouldn't be hard on a simple model object example. |
swh/scanner/policy.py | ||
---|---|---|
115 | Because when iterating over the merkle tree defined in swh.model.merkle mypy expects a MerkleNode while actually are subclasses such as model.from_disk.Directory/Content that have properties not defined in the base class. | |
swh/scanner/tests/test_policy.py | ||
1–5 | Actually it's not easy to test the order of the nodes queried to the backend since the order could be read only during the request to the backend. If we want to test the behavior of the policies we could just save the known status of the nodes in swh.scanner.data.MerkleNodeInfo. This could be useful for both the unit tests and for the policies to know the status of a certain node. The status can be defined as an enumeration:
|
swh/scanner/tests/test_policy.py | ||
---|---|---|
1–5 | Here's what I had in mind to test policies. To test policies you need a swh-scanner backend, which can either be mocked or the real thing that one can spawn during tests. If you are mocking it, you can make it so at each received request, the mock code append to a list the received SWHIDs one by one. At the end of the test session you can access that list and it will give you the order in which SWHIDs have been queried. If you are not mocking the backend, you can either subclass or modify the real backend so that it keeps a log of all queried SWHIDs in an analogous list, which you can check after the scan is done to check the order in it. How does this sound? |
Build has FAILED
Patch application report for D5996 (id=21702)
Rebasing onto 33b1316bcd...
Current branch diff-target is up to date.
Changes applied before test
commit 879770845b0c592ed1c62acfe60d89879cd9b4f6 Author: Daniele Serafini <me@danieleserafini.eu> Date: Thu Jul 15 19:07:33 2021 +0200 new scan policies
Link to build: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/137/
See console output for more information: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/137/console
Build has FAILED
Patch application report for D5996 (id=21706)
Rebasing onto 33b1316bcd...
Current branch diff-target is up to date.
Changes applied before test
commit 80a42bf1fe71214c7528138d2a3b63940b149d59 Author: Daniele Serafini <me@danieleserafini.eu> Date: Thu Jul 15 19:07:33 2021 +0200 new scan policies
Link to build: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/138/
See console output for more information: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/138/console
Build is green
Patch application report for D5996 (id=21712)
Rebasing onto 33b1316bcd...
Current branch diff-target is up to date.
Changes applied before test
commit f92723a32bc25437f83d18ff16ce510918a6091d Author: Daniele Serafini <me@danieleserafini.eu> Date: Thu Jul 15 19:07:33 2021 +0200 new scan policies
See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/139/ for more details.
Wonderful, thanks for adding the order tests! LGTM.
I'm accepting this, but please notice that you overlooked one of the comments in my previous review: it's a minor one, the one about "deepest" wording in a comment.
Please fix/improve it before landing.
Build is green
Patch application report for D5996 (id=21717)
Rebasing onto 33b1316bcd...
Current branch diff-target is up to date.
Changes applied before test
commit 00c8ad617c92cc0718fb9f00579b524b609eba29 Author: Daniele Serafini <me@danieleserafini.eu> Date: Thu Jul 15 19:07:33 2021 +0200 new scan policies
See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/140/ for more details.
Build is green
Patch application report for D5996 (id=21718)
Rebasing onto 33b1316bcd...
Current branch diff-target is up to date.
Changes applied before test
commit d5a070e1429d54fd2679d68b71b19efb65d833dd Author: Daniele Serafini <me@danieleserafini.eu> Date: Thu Jul 15 19:07:33 2021 +0200 add scan policies - make scan policy selectable through --policy/-p option - abstract 'Policy' class to create new scan policies - integration of existing policies into the master branch Closes T3420
See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/141/ for more details.