Page MenuHomeSoftware Heritage

swh-scanner: add 'auto' option as default policy
ClosedPublic

Authored by DanSeraf on Jul 27 2021, 2:27 PM.

Details

Summary

In order to reduce the backend requests the scanner will query to the backend all the nodes at once if they are less than the Web API endpoint /known/ size limit.

Diff Detail

Repository
rDTSCN Code scanner
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 22822
Build 35592: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 35591: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D6027 (id=21788)

Rebasing onto d5a070e142...

Current branch diff-target is up to date.
Changes applied before test
commit 79b8db7e0fbb68a02f6782bae487c3fbb7a2fd38
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Tue Jul 27 14:15:20 2021 +0200

    add 'auto' option as default policy
    
    - query all nodes if they fill the backend size limit

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/142/ for more details.

Could you document the scan policies?

Could you document the scan policies?

Sure, maybe it's better to open another diff for it and add the scan policies description inside the docs (?)

This diff touched the part of the code that needs to be documented, so it's fine to document it in the same diff

This diff touched the part of the code that needs to be documented, so it's fine to document it in the same diff

OK, thanks!

zack requested changes to this revision.Jul 28 2021, 11:08 AM
zack added a subscriber: zack.

I'm requesting some minor changes (+ some other changes to be submitted in a separate diff which I've noticed only now, sorry!).

swh/scanner/policy.py
18
  1. add a docstring for this
  1. we have this information in a number of different places now, we should make sure it is in only one. The other places I've found are:
    • swh/scanner/tests/test_db.py (CHUNK_SIZE)
    • swhswh/scanner/backend.py (LIMIT)
25

You should translate SWHIDs to string as late as possible. So please make (in a separate diff) this function take CoreSWHIDs instances rather than converting them to string when you call this function.

(Sorry, I noticed this only now :-))

273

I noticed it only now, even though it's a minor issue that affect all policies.

Can you please add a docstring to each Policy class that briefly describe what that policy does?

swh/scanner/tests/test_scanner.py
20–21

I'm assuming this is a test on a "small" source tree, which hence should be delegated to the auto policy, which is fine.

However we need also the converse test: one that on a "big" source tree shows that by default the chosen policy is BFS.

This revision now requires changes to proceed.Jul 28 2021, 11:08 AM
  • docstring for each policy
  • remove redundant variables
  • test auto policy for a "big" source tree

Build is green

Patch application report for D6027 (id=21836)

Rebasing onto d5a070e142...

Current branch diff-target is up to date.
Changes applied before test
commit 6041a33f7a4f95b1805f62b29be7667caf5e8271
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Tue Jul 27 14:15:20 2021 +0200

    add 'auto' option as default policy
    
    - query all nodes if they fill the backend size limit

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/143/ for more details.

This revision is now accepted and ready to land.Jul 29 2021, 10:01 AM

docstring for each policy

That's nice too, but I meant documenting them in the CLI (so users can understand the policies without reading the code)

policies description in CLI

Build is green

Patch application report for D6027 (id=21861)

Rebasing onto d5a070e142...

Current branch diff-target is up to date.
Changes applied before test
commit cd19bbbcee769150494d5b298e06e7a91ef3196c
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Tue Jul 27 14:15:20 2021 +0200

    add 'auto' option as default policy
    
    - query all nodes if they fill the backend size limit
    - docstring for the various policies

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/144/ for more details.

This revision was automatically updated to reflect the committed changes.