Details

Reviewers

zack

Group Reviewers

Reviewers

Maniphest Tasks

T3420: scanner: make the various query algorithms user-selectable

Commits

rDTSCNd5a070e1429d: add scan policies

Summary

The purpose of this diff is the creation of an interface to create new scan policies.
Some of the policies implemented in the benchmark branch are moved in the master branch in order to let the user select the scan approach (Related to T3420).

Diff Detail

Repository

rDTSCN Code scanner

Branch

policy

Lint

No Linters Available

Unit

No Unit Test Coverage

Build Status

Buildable 22671
Build 35353: Phabricator diff pipeline on jenkins	Jenkins console · Jenkins
Build 35352: arc lint + arc unit

Unit TestsFailed
View All

	Time	Test
	27 ms	Jenkins > .tox.py3.lib.python3.7.site-packages.swh.scanner.tests.test_policy::test_directory_priority_policy live_server = <[AttributeError("'LiveServer' object has no attribute 'url'") raised in repr()] LiveServer object at 0x7f91f360abe0> aiosession = <aiohttp.client.ClientSession object at 0x7f920dd4d550> event_loop = <_UnixSelectorEventLoop running=False closed=False debug=False>
	20 ms	Jenkins > .tox.py3.lib.python3.7.site-packages.swh.scanner.tests.test_policy::test_file_priority_policy live_server = <[AttributeError("'LiveServer' object has no attribute 'url'") raised in repr()] LiveServer object at 0x7f91f360abe0> aiosession = <aiohttp.client.ClientSession object at 0x7f91f359eef0> event_loop = <_UnixSelectorEventLoop running=False closed=False debug=False>
	18 ms	Jenkins > .tox.py3.lib.python3.7.site-packages.swh.scanner.tests.test_policy::test_lazybfs_policy live_server = <[AttributeError("'LiveServer' object has no attribute 'url'") raised in repr()] LiveServer object at 0x7f91f360abe0> aiosession = <aiohttp.client.ClientSession object at 0x7f920dd4d748> event_loop = <_UnixSelectorEventLoop running=False closed=False debug=False>
	136 ms	Jenkins > .tox.py3.lib.python3.7.site-packages.swh.scanner.tests.test_backend::test_backend_endpoint_all_present
	8 ms	Jenkins > .tox.py3.lib.python3.7.site-packages.swh.scanner.tests.test_backend::test_backend_endpoint_one_not_present
		View Full Test Results (3 Failed · 31 Passed)

Event Timeline

DanSeraf created this revision.Jul 15 2021, 7:17 PM

Herald added a reviewer: Reviewers. · View Herald TranscriptJul 15 2021, 7:17 PM

Build is green

Patch application report for D5996 (id=21646)

Rebasing onto 33b1316bcd...

Current branch diff-target is up to date.

Changes applied before test

commit ddfccaea8f616fc806f81050fbd6b89509768ab8
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Thu Jul 15 19:07:33 2021 +0200

    new scan policies

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/136/ for more details.

Harbormaster completed remote builds in B22617: Diff 21646.Jul 15 2021, 7:20 PM

DanSeraf requested review of this revision.Jul 15 2021, 7:20 PM

Looks great! I've noted down only some nits.

swh/scanner/policy.py
27	"SWHIDs"
115	why is this @no_type_check needed?
125	comment is unclear: "deepest" what? maybe "check deepest node first" ?
149	how come the "known" value can be the empty string here? Ideally it should always be a boolean, either Ture of False. If you need a tristate, maybe None would better?
swh/scanner/tests/test_policy.py
1–5	I'm confused by these two test files: is test_policy.py testing anything at all right now? In test_scanner it looks like you're testing the results with each policy, but not actually the order in which things are queried from the backend (i.e., the policies themselves). Is that correct? If so, we should test the various node ordering here in test_policy.py. It shouldn't be hard on a simple model object example.

This revision now requires changes to proceed.Jul 16 2021, 11:02 AM

DanSeraf added inline comments.Jul 19 2021, 2:00 PM

swh/scanner/policy.py
115	Because when iterating over the merkle tree defined in `swh.model.merkle` mypy expects a `MerkleNode` while actually are subclasses such as `model.from_disk.Directory/Content` that have properties not defined in the base class.
swh/scanner/tests/test_policy.py
1–5	Actually it's not easy to test the order of the nodes queried to the backend since the order could be read only during the request to the backend. If we want to test the behavior of the policies we could just save the known status of the nodes in swh.scanner.data.MerkleNodeInfo. This could be useful for both the unit tests and for the policies to know the status of a certain node. The status can be defined as an enumeration: NodeStatus.UNSET = the known value is not already set NodeStatus.SET = the known value is set by the policy NodeStatus.QUERIED = the known value is set by a policy after a backend request

zack added inline comments.Jul 19 2021, 4:25 PM

swh/scanner/tests/test_policy.py
1–5	Here's what I had in mind to test policies. To test policies you need a swh-scanner backend, which can either be mocked or the real thing that one can spawn during tests. If you are mocking it, you can make it so at each received request, the mock code append to a list the received SWHIDs one by one. At the end of the test session you can access that list and it will give you the order in which SWHIDs have been queried. If you are not mocking the backend, you can either subclass or modify the real backend so that it keeps a log of all queried SWHIDs in an analogous list, which you can check after the scan is done to check the order in it. How does this sound?

test scan policies SWHIDs request to the backend

Build has FAILED

Patch application report for D5996 (id=21702)

Rebasing onto 33b1316bcd...

Current branch diff-target is up to date.

Changes applied before test

commit 879770845b0c592ed1c62acfe60d89879cd9b4f6
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Thu Jul 15 19:07:33 2021 +0200

    new scan policies

Link to build: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/137/
See console output for more information: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/137/console

Harbormaster failed remote builds in B22671: Diff 21702!Jul 20 2021, 3:58 PM

make CI pass

Build has FAILED

Patch application report for D5996 (id=21706)

Rebasing onto 33b1316bcd...

Current branch diff-target is up to date.

Changes applied before test

commit 80a42bf1fe71214c7528138d2a3b63940b149d59
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Thu Jul 15 19:07:33 2021 +0200

    new scan policies

Link to build: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/138/
See console output for more information: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/138/console

Harbormaster failed remote builds in B22675: Diff 21706!Jul 20 2021, 4:26 PM

make CI pass

Build is green

Patch application report for D5996 (id=21712)

Rebasing onto 33b1316bcd...

Current branch diff-target is up to date.

Changes applied before test

commit f92723a32bc25437f83d18ff16ce510918a6091d
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Thu Jul 15 19:07:33 2021 +0200

    new scan policies

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/139/ for more details.

Harbormaster completed remote builds in B22681: Diff 21712.Jul 20 2021, 5:45 PM

Wonderful, thanks for adding the order tests! LGTM.

I'm accepting this, but please notice that you overlooked one of the comments in my previous review: it's a minor one, the one about "deepest" wording in a comment.

Please fix/improve it before landing.

This revision is now accepted and ready to land.Jul 21 2021, 9:59 AM

better comment wording

Build is green

Patch application report for D5996 (id=21717)

Rebasing onto 33b1316bcd...

Current branch diff-target is up to date.

Changes applied before test

commit 00c8ad617c92cc0718fb9f00579b524b609eba29
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Thu Jul 15 19:07:33 2021 +0200

    new scan policies

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/140/ for more details.

Harbormaster completed remote builds in B22684: Diff 21717.Jul 21 2021, 12:19 PM

commit message

Build is green

Patch application report for D5996 (id=21718)

Rebasing onto 33b1316bcd...

Current branch diff-target is up to date.

Changes applied before test

commit d5a070e1429d54fd2679d68b71b19efb65d833dd
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Thu Jul 15 19:07:33 2021 +0200

    add scan policies
    
    - make scan policy selectable through --policy/-p option
    - abstract 'Policy' class to create new scan policies
    - integration of existing policies into the master branch
    
    Closes T3420

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/141/ for more details.

Harbormaster completed remote builds in B22685: Diff 21718.Jul 21 2021, 12:35 PM

Closed by commit rDTSCNd5a070e1429d: add scan policies (authored by DanSeraf). · Explain WhyJul 21 2021, 2:00 PM

This revision was automatically updated to reflect the committed changes.

DanSeraf added a commit: rDTSCNd5a070e1429d: add scan policies.

swh-scanner: new scan policies
ClosedPublic
Actions

Details

Diff Detail

Unit TestsFailed
View All

Event Timeline

Patch application report for D5996 (id=21646)

Changes applied before test

Patch application report for D5996 (id=21702)

Changes applied before test

Patch application report for D5996 (id=21706)

Changes applied before test

Patch application report for D5996 (id=21712)

Changes applied before test

Patch application report for D5996 (id=21717)

Changes applied before test

Patch application report for D5996 (id=21718)

Changes applied before test

Revision Contents
Changeset List

Diff 21702

swh/scanner/cli.py

swh/scanner/policy.py

swh/scanner/scanner.py

swh/scanner/tests/conftest.py

swh/scanner/tests/flask_api.py

swh/scanner/tests/test_policy.py

swh/scanner/tests/test_scanner.py

swh-scanner: new scan policiesClosedPublicActions

Details

Diff Detail

Unit TestsFailedView All

Event Timeline

Patch application report for D5996 (id=21646)

Changes applied before test

Patch application report for D5996 (id=21702)

Changes applied before test

Patch application report for D5996 (id=21706)

Changes applied before test

Patch application report for D5996 (id=21712)

Changes applied before test

Patch application report for D5996 (id=21717)

Changes applied before test

Patch application report for D5996 (id=21718)

Changes applied before test

Revision ContentsChangeset List

Diff 21702

swh/scanner/cli.py

swh/scanner/policy.py

swh/scanner/scanner.py

swh/scanner/tests/conftest.py

swh/scanner/tests/flask_api.py

swh/scanner/tests/test_policy.py

swh/scanner/tests/test_scanner.py

swh-scanner: new scan policies
ClosedPublic
Actions

Unit TestsFailed
View All

Revision Contents
Changeset List