
swh-scanner: retrieve additional information about software artifacts
Closed, Public

Authored by DanSeraf on Aug 18 2021, 7:17 PM.

Details

Summary
  • additional information about software artifacts can be specified using the CLI with the -a/--add option
  • retrieve origin information about software artifacts using swh-graph
  • move backend requests to a separate submodule (client)
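For illustration only, here is a minimal sketch of how origin information might be looked up through the swh-graph HTTP API. It assumes the API's random-walk endpoint (randomwalk/<src>/ori with direction=backward), which walks from an artifact up the graph until it reaches an origin node. The server URL, the helper names origin_walk_url and last_swhid, and the plain-text, one-SWHID-per-line response format are assumptions, not details taken from this diff.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: base URL of a swh-graph HTTP server; adjust to your deployment.
GRAPH_URL = "http://localhost:5009/graph/"


def origin_walk_url(swhid: str, graph_url: str = GRAPH_URL) -> str:
    """Build a (hypothetical) random-walk query from swhid back to an origin."""
    # "backward" follows edges from contents/directories up towards
    # revisions, snapshots and finally origins ("ori" nodes).
    query = urlencode({"direction": "backward"})
    return f"{graph_url.rstrip('/')}/randomwalk/{swhid}/ori?{query}"


def last_swhid(response_text: str) -> Optional[str]:
    """Return the last non-empty line of a newline-separated SWHID path.

    Assumption: the walk is returned as plain text, one SWHID per line,
    ending with the origin node that was reached.
    """
    lines = [line for line in response_text.splitlines() if line.strip()]
    return lines[-1] if lines else None
```

Resolving the origin SWHID returned by the walk to an actual origin URL would need a further lookup, which this sketch leaves out.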

Diff Detail

Repository
rDTSCN Code scanner
Branch
provenance-info
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 23103
Build 36031: Phabricator diff pipeline on Jenkins
Build 36030: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D6114 (id=22119)

Rebasing onto a9d0b9e2af...

Current branch diff-target is up to date.
Changes applied before test
commit 995ed01d92a7758a984b6700bbdb557a9945fac7
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Wed Aug 18 19:08:38 2021 +0200

    retrieve additional information about software artifacts
    
    - additional information about software artifacts can be specified
    using the CLI with the -a/--add option
    - add support to retrieve origin information using swh-graph
    - move backend requests to a separate submodule (client)

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/150/ for more details.

vlorentz added inline comments.
swh/scanner/cli.py, line 150

Could you describe the extra info?

Build is green

Patch application report for D6114 (id=22131)

Rebasing onto a9d0b9e2af...

Current branch diff-target is up to date.
Changes applied before test
commit 5016132575801c709da878567b2acf5a4039f2bf
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Wed Aug 18 19:08:38 2021 +0200

    retrieve additional information about software artifacts
    
    - additional information about software artifacts can be specified
    using the CLI with the -a/--add option
    - add support to retrieve origin information using swh-graph
    - move backend requests to a separate submodule (client)

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/151/ for more details.

Build is green

Patch application report for D6114 (id=22144)

Rebasing onto a9d0b9e2af...

Current branch diff-target is up to date.
Changes applied before test
commit eb30e9b7bfbfcaade02dc2cfc51e0b74ebbec9ae
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Wed Aug 18 19:08:38 2021 +0200

    retrieve additional information about software artifacts
    
    - additional information about software artifacts can be specified
    using the CLI with the -a/--add option
    - add support to retrieve origin information using swh-graph
    - move backend requests to a separate submodule (client)

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/152/ for more details.

Build has FAILED

Patch application report for D6114 (id=22145)

Rebasing onto a9d0b9e2af...

Current branch diff-target is up to date.
Changes applied before test
commit def2584d65651da90fad60a3db08d8d6bd33e442
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Wed Aug 18 19:08:38 2021 +0200

    retrieve additional information about software artifacts
    
    - additional information about software artifacts can be specified
    using the CLI with the -a/--add option
    - add support to retrieve origin information using swh-graph
    - move backend requests to a separate submodule (client)

Link to build: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/153/
See console output for more information: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/153/console

zack requested changes to this revision (Aug 23 2021, 11:18 AM).
zack added a subscriber: zack.

Nice! And I also like the refactoring out of client.py.

I'm requesting mainly changes about the UX and documentation.

The CI will need to pass though :-)

swh/scanner/cli.py, lines 145–147

I see that "-i" is already taken, which is why you haven't used it.
However, "a" is not very mnemonic; I propose using "e" (for extra info) instead, that is: "-e", "--extra-info", "extra_info".
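The proposed triple reads like a click option declaration: short flag, long flag, then the Python parameter name. To try the naming without click, here is a stdlib analogue using argparse. It is a sketch, not the scanner's real CLI: the program name, the root_path positional, "origin" as the only accepted value, and the repeatable flag are assumptions based on the revision summary.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Stdlib analogue of the proposed click option (sketch, not the real CLI)."""
    parser = argparse.ArgumentParser(prog="swh-scanner-sketch")
    parser.add_argument("root_path", help="directory to scan")
    parser.add_argument(
        "-e",
        "--extra-info",
        dest="extra_info",  # same parameter name as proposed in the review
        action="append",  # the flag may be repeated
        choices=["origin"],  # assumption: origin is the only extra info here
        help="add selected additional information about known software artifacts",
    )
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Collapse repeats and turn "flag not given" (None) into an empty set.
    args.extra_info = set(args.extra_info or [])
    return args
```

Repeating -e accumulates values, similar to a click option declared with multiple=True; an unsupported value is rejected by argparse with a usage error.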

Line 150

better: "add selected additional information about known software artifacts"

swh/scanner/client.py, line 5

Please add a module docstring stating that this is a minimal async web client for the Software Heritage Web API.

Also add a TODO comment there stating that this module could be removed once T2635 is implemented.

(without this info it will not be clear why this module is needed at all)
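A sketch of what the requested module header and a minimal client could look like. The docstring and TODO wording follow the review comment. Everything else is an assumption for illustration: the class name WebAPIClient, the injected post coroutine, a QUERY_LIMIT of 1000 SWHIDs per request, and a /known/ endpoint answering with {swhid: {"known": bool}}. The transport is injected so the sketch stays dependency-free; a real client would more likely use an async HTTP library.

```python
"""Minimal async web client for the Software Heritage Web API.

TODO: this module could be removed once T2635 is implemented.
"""

import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# Assumption: maximum number of SWHIDs sent in a single /known/ request.
QUERY_LIMIT = 1000

# Hypothetical transport: an async callable (url, json_body) -> decoded JSON.
PostFn = Callable[[str, Any], Awaitable[Dict[str, Any]]]


class WebAPIClient:
    """Ask the archive which SWHIDs it already knows (sketch)."""

    def __init__(self, api_url: str, post: PostFn) -> None:
        self.known_url = api_url.rstrip("/") + "/known/"
        self._post = post

    async def known(self, swhids: List[str]) -> Dict[str, Dict[str, Any]]:
        # Split the input so that no single request exceeds QUERY_LIMIT.
        chunks = [
            swhids[i : i + QUERY_LIMIT] for i in range(0, len(swhids), QUERY_LIMIT)
        ]
        # Send all chunks concurrently, then merge the per-chunk answers.
        responses = await asyncio.gather(
            *(self._post(self.known_url, chunk) for chunk in chunks)
        )
        result: Dict[str, Dict[str, Any]] = {}
        for response in responses:
            result.update(response)
        return result
```

Injecting the transport also makes the client easy to test with a fake coroutine, without any network access.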

This revision now requires changes to proceed (Aug 23 2021, 11:18 AM).

Build is green

Patch application report for D6114 (id=22146)

Rebasing onto a9d0b9e2af...

Current branch diff-target is up to date.
Changes applied before test
commit 96b3467f2d330b3f010414427843dd654bd6b3cf
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Wed Aug 18 19:08:38 2021 +0200

    retrieve additional information about software artifacts
    
    - additional information about software artifacts can be specified
    using the CLI with the -a/--add option
    - add support to retrieve origin information using swh-graph
    - move backend requests to a separate submodule (client)

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/154/ for more details.

Build has FAILED

Patch application report for D6114 (id=22147)

Rebasing onto a9d0b9e2af...

Current branch diff-target is up to date.
Changes applied before test
commit efbac760564f4ecba09cafee3ee344cd99cae496
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Wed Aug 18 19:08:38 2021 +0200

    retrieve additional information about software artifacts
    
    - additional information about software artifacts can be specified
    using the CLI with the -e/--extra-info option
    - add support to retrieve origin information using swh-graph
    - move backend requests to a separate submodule (client)

Link to build: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/155/
See console output for more information: https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/155/console

Build is green

Patch application report for D6114 (id=22147)

Rebasing onto a9d0b9e2af...

Current branch diff-target is up to date.
Changes applied before test
commit efbac760564f4ecba09cafee3ee344cd99cae496
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Wed Aug 18 19:08:38 2021 +0200

    retrieve additional information about software artifacts
    
    - additional information about software artifacts can be specified
    using the CLI with the -e/--extra-info option
    - add support to retrieve origin information using swh-graph
    - move backend requests to a separate submodule (client)

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/156/ for more details.

LGTM, thanks! But please note the remaining "-a/--add" reference in the new docstring, which should be changed to "-e/..." for consistency. Please fix that before landing this change.

swh/scanner/cli.py, lines 177–178

this should be -e/--extra-info now

This revision is now accepted and ready to land (Aug 24 2021, 5:58 PM).

Build is green

Patch application report for D6114 (id=22177)

Rebasing onto a9d0b9e2af...

Current branch diff-target is up to date.
Changes applied before test
commit 979d7c803a1478c1e65a6cf8a827c16a746e3aa1
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Wed Aug 18 19:08:38 2021 +0200

    retrieve additional information about software artifacts
    
    - additional information about software artifacts can be specified
    using the CLI with the -e/--extra-info option
    - add support to retrieve origin information using swh-graph
    - move backend requests to a separate submodule (client)

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/157/ for more details.