
[provenance] Implement a naive archive multiplexer
Closed, Public

Authored by vsellier on Jun 14 2022, 12:23 PM.

Details

Summary
  • Improve origin layer logs
  • Add a missing dependency
  • Add a new multiplexed archive type

It queries all the underlying archives and returns
the first result found.
It is a naive implementation, but it will allow falling back
to swh-storage when something requested is missing from
an outdated swh-graph.

Related to T4315
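The multiplexing strategy described above (query every underlying archive in order, return the first result found) can be sketched as follows. This is a minimal illustration, not the code from this revision: the `ArchiveLike` protocol, the `NaiveArchiveMultiplexer` class and the single `revision_get_parents` method are hypothetical names chosen for the example, and an empty result is assumed to mean "not found in this archive".

```python
import logging
from typing import Iterable, List, Protocol, Tuple

logger = logging.getLogger(__name__)


class ArchiveLike(Protocol):
    """Hypothetical minimal archive interface assumed for this sketch."""

    def revision_get_parents(self, id: bytes) -> Iterable[bytes]:
        ...


class NaiveArchiveMultiplexer:
    """Query the archives in order and return the first non-empty result."""

    def __init__(self, archives: List[Tuple[str, ArchiveLike]]) -> None:
        # (name, archive) pairs, e.g. [("graph", ...), ("storage", ...)],
        # so the fast but possibly outdated archive is tried first.
        self.archives = archives

    def revision_get_parents(self, id: bytes) -> List[bytes]:
        for name, archive in self.archives:
            parents = list(archive.revision_get_parents(id))
            if parents:
                logger.debug("parents of %s found in %s", id.hex(), name)
                return parents
        # Not found anywhere (or the revision genuinely has no parents).
        return []
```

Because an empty answer is treated as "not found", a root revision with no parents makes the sketch query every archive before giving up; that extra cost is what makes this approach naive.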

Diff Detail

Repository
rDPROV Provenance database
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 29858
Build 46668: Phabricator diff pipeline on Jenkins (Jenkins console)
Build 46667: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D7985 (id=28772)

Rebasing onto d6e88dabb3...

Current branch diff-target is up to date.
Changes applied before test
commit 827318ce867b4041b5574e700d9e6de986c33674
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jun 14 12:15:54 2022 +0200

    Add a new multiplexed archive type
    
    It queries all the underlying archives and returns
    the first result found.
    It is a naive implementation, but it will allow falling back
    to swh-storage when something requested is missing from
    an outdated swh-graph.
    
    Related to T4315

commit f5ed9de87b392c62d59ad00e4f6f5a20a5954db6
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jun 14 12:21:10 2022 +0200

    Improve origin layer logs

commit d45f066a8c51677cd43f9ffafa25fef47df77368
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jun 14 12:18:37 2022 +0200

    Declare the missing swh-graph dependency

Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/600/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/600/console

Harbormaster returned this revision to the author for changes because remote builds failed. Jun 14 2022, 12:23 PM
Harbormaster failed remote builds in B29854: Diff 28772!

Build has FAILED

Patch application report for D7985 (id=28775)

Rebasing onto d6e88dabb3...

Current branch diff-target is up to date.
Changes applied before test
commit ad4256e9fd38cac5aaf16c3c1e41759398cd51ab
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jun 14 12:15:54 2022 +0200

    Add a new multiplexed archive type
    
    It queries all the underlying archives and returns
    the first result found.
    It is a naive implementation, but it will allow falling back
    to swh-storage when something requested is missing from
    an outdated swh-graph.
    
    Related to T4315

commit f5ed9de87b392c62d59ad00e4f6f5a20a5954db6
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jun 14 12:21:10 2022 +0200

    Improve origin layer logs

commit d45f066a8c51677cd43f9ffafa25fef47df77368
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jun 14 12:18:37 2022 +0200

    Declare the missing swh-graph dependency

Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/601/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/601/console

Harbormaster returned this revision to the author for changes because remote builds failed. Jun 14 2022, 2:42 PM
Harbormaster failed remote builds in B29858: Diff 28775!

Build is green

Patch application report for D7985 (id=28775)

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/602/ for more details.

You forgot to select the base commit of the diff.

I deliberately created the diff with the three commits inside; I just forgot to update the title ;)

vsellier retitled this revision from "Declare the missing swh-graph dependency" to "[provenance] Implement a naive archive multiplexer". Jun 15 2022, 10:57 AM
vsellier edited the summary of this revision.
vsellier added a project: Provenance database.
vlorentz added inline comments.
swh/provenance/model.py
22–33

Make sure we don't accidentally return an incorrect value.

swh/provenance/multiplexer/archive.py
30

Not related to this diff, but we really shouldn't use directory_ls. It's slow because it joins against other tables, and swh-provenance doesn't use any of the values obtained by the join. swh.storage.algos.directory.directory_get should be used instead.

This revision is now accepted and ready to land. Jun 15 2022, 12:30 PM
ardumont added inline comments.
swh/provenance/tests/test_init.py
1

lgtm

(oops, forgot to validate it yesterday ;)

swh/provenance/multiplexer/archive.py
30

Thanks for the suggestion. We need to define what must be improved or developed in provenance; I'm adding this to the list.

Update according to the reviews

  • Add and fix license headers
  • Ensure the _revisions_count variable was computed before returning its value
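The second fix answers the review concern about accidentally returning an incorrect value. A minimal sketch of such a guard follows, assuming a hypothetical class whose count is filled in by a separate retrieval step. Only the attribute name `_revisions_count` comes from this revision; the `OriginEntry` class body, the `retrieve_revisions` and `revision_count` names, and the use of `None` as the "not computed yet" sentinel are assumptions for illustration.

```python
from typing import Iterable, List, Optional


class OriginEntry:
    """Hypothetical holder for an origin's revisions (illustration only)."""

    def __init__(self, url: str) -> None:
        self.url = url
        self._revisions: List[bytes] = []
        # None means "not computed yet"; 0 is a legitimate count.
        self._revisions_count: Optional[int] = None

    def retrieve_revisions(self, revisions: Iterable[bytes]) -> None:
        self._revisions = list(revisions)
        self._revisions_count = len(self._revisions)

    @property
    def revision_count(self) -> int:
        # Fail loudly instead of silently returning a default or stale value.
        if self._revisions_count is None:
            raise ValueError("retrieve_revisions() must be called first")
        return self._revisions_count
```

Using `None` rather than `0` as the initial value keeps "never computed" distinguishable from "computed, and the origin has no revisions".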

Build is green

Patch application report for D7985 (id=28798)

Rebasing onto d6e88dabb3...

Current branch diff-target is up to date.
Changes applied before test
commit b69c0f7689f0d303eafc4f68c68a59ae418da4e9
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jun 14 12:15:54 2022 +0200

    Add a new multiplexed archive type
    
    It queries all the underlying archives and returns
    the first result found.
    It is a naive implementation, but it will allow falling back
    to swh-storage when something requested is missing from
    an outdated swh-graph.
    
    Related to T4315

commit f5ed9de87b392c62d59ad00e4f6f5a20a5954db6
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jun 14 12:21:10 2022 +0200

    Improve origin layer logs

commit d45f066a8c51677cd43f9ffafa25fef47df77368
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jun 14 12:18:37 2022 +0200

    Declare the missing swh-graph dependency

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/603/ for more details.

olasd added inline comments.
swh/provenance/multiplexer/archive.py
30

This code does not use the swh.storage implementations, but direct SQL queries implemented in swh.provenance.postgresql.archive.