Page MenuHomeSoftware Heritage

Improve `origin_add` logic
ClosedPublic

Authored by aeviso on Feb 11 2022, 9:02 PM.

Details

Summary

to avoid processing twice the history graph of a revision that was
already seen as the head of another origin.

Diff Detail

Repository
rDPROV Provenance database
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D7166 (id=25973)

Could not rebase; Attempt merge onto bb800f80ed...

Updating bb800f8..9df61c2
Fast-forward
 requirements-swh-graph.txt           |  1 +
 setup.py                             | 10 ++++----
 swh/provenance/__init__.py           |  9 ++++++-
 swh/provenance/archive.py            |  3 +--
 swh/provenance/interface.py          |  4 ++++
 swh/provenance/origin.py             | 16 +++++++------
 swh/provenance/postgresql/archive.py |  2 +-
 swh/provenance/provenance.py         |  3 +++
 swh/provenance/storage/archive.py    |  2 +-
 swh/provenance/swhgraph/__init__.py  |  0
 swh/provenance/swhgraph/archive.py   | 46 ++++++++++++++++++++++++++++++++++++
 tox.ini                              |  4 ++++
 12 files changed, 83 insertions(+), 17 deletions(-)
 create mode 100644 requirements-swh-graph.txt
 create mode 100644 swh/provenance/swhgraph/__init__.py
 create mode 100644 swh/provenance/swhgraph/archive.py
Changes applied before test
commit 9df61c25de6509881078cbcd502a947ca3612862
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Feb 11 20:23:35 2022 +0100

    Improve `origin_add` logic
    
    to avoid processing twice the history graph of a revision that was
    already seen as the head of another origin.

commit 2833438c3aaa44ec34c3a125cb7fad2739b51b6c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 16:08:32 2022 +0100

    Add partial implementation of `ArchiveGraph` class

commit 45c5b78ba0c61b9bf3e37ea10408cf6de26a5071
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 15:35:43 2022 +0100

    Remove ordered result constrain from `snapshot_get_heads`
    
    It is not require anymore after simplifying the origin-revision
    layer algorithm.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/579/ for more details.

ardumont added inline comments.
swh/provenance/origin.py
53 ↗(On Diff #25973)

This "commit" parameter change looks a bit strange.

Either that's really needed (to prevent flushing from time to time) and then that should be opened in the interface above.
Either that's not really needed and then you can keep the old behavior about this.

Either way, that's also looks like a change that could be in a separate diff.

What do you think?

swh/provenance/origin.py
53 ↗(On Diff #25973)

I don't really follow what you mean by "should be opened in the interface above". The commit parameter is used when testing to simulate the (partial) processing of batches. It's like that for the other methods in the "interface" as well: ie. directory_add and revision_add

olasd added a subscriber: olasd.

Looks fine, thanks!

This revision is now accepted and ready to land.Mar 17 2022, 1:47 PM

Build has FAILED

Patch application report for D7166 (id=26640)

Rebasing onto bb800f80ed...

Current branch diff-target is up to date.
Changes applied before test
commit 80071ce5d92c51487e191a8890a43cd42123a16d
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Feb 11 20:23:35 2022 +0100

    Improve `origin_add` logic
    
    to avoid processing twice the history graph of a revision that was
    already seen as the head of another origin.

Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/580/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/580/console

Build has FAILED

Patch application report for D7166 (id=26640)

Rebasing onto bb800f80ed...

Current branch diff-target is up to date.
Changes applied before test
commit 80071ce5d92c51487e191a8890a43cd42123a16d
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Feb 11 20:23:35 2022 +0100

    Improve `origin_add` logic
    
    to avoid processing twice the history graph of a revision that was
    already seen as the head of another origin.

Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/581/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/581/console

swh/provenance/origin.py
53 ↗(On Diff #25973)

I'm the one who does not get it.
Why are the interface and the function 'origin_add' signatures different?

Also, opening specific code for tests is usually not a good thing.
Anyway, i won't push for this, that comment is too old and olasd already accepted the diff nonetheless so fine.

  • Fix tests that broke after swh-environment update
  • Improve origin_add logic
aeviso added inline comments.
swh/provenance/origin.py
53 ↗(On Diff #25973)

Why are the interface and the function 'origin_add' signatures different?

I guess by interface here you mean the CLI. This method is the entry point for the origin-revision layer ingestion command, but provenance and archive are resolved based on the configuration file by the CLI function itself. The extra commit parameter has testing purposes.

Also, opening specific code for tests is usually not a good thing.

I agree, but it was designed that way so I'm just making it compliant with the other of the methods (directory_add and revision_add).

Build is green

Patch application report for D7166 (id=26642)

Rebasing onto bb800f80ed...

Current branch diff-target is up to date.
Changes applied before test
commit 8feeadaea1306edb67b7b241f0fcf1f37f9570e2
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Feb 11 20:23:35 2022 +0100

    Improve `origin_add` logic
    
    to avoid processing twice the history graph of a revision that was
    already seen as the head of another origin.

commit b0fccc251c5e59d020e9c0f9953726e102c09891
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Mar 17 14:54:29 2022 +0100

    Fix tests that broke after `swh-environment` update

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/582/ for more details.

This revision was automatically updated to reflect the committed changes.
aeviso marked an inline comment as done.