Page MenuHomeSoftware Heritage

Add partial implementation of `ArchiveGraph` class
ClosedPublic

Authored by aeviso on Jan 20 2022, 4:11 PM.

Details

Summary

Remove ordered result constrain from snapshot_get_heads, as it
is not require anymore after simplifying the origin-revision
layer algorithm.

Diff Detail

Repository
rDPROV Provenance database
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6990 (id=25349)

Rebasing onto cc7401096d...

Current branch diff-target is up to date.
Changes applied before test
commit 846427ea1ce130e1e3d9fd62c40154ff587bbace
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 16:08:32 2022 +0100

    Add partial implementation of `ArchiveGraph` class

commit eebf1f7889f1c9072ba8b8c8d0325d151b1ff014
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 15:35:43 2022 +0100

    Remove ordered result constrain from `snapshot_get_heads`
    
    It is not require anymore after simplifying the origin-revision
    layer algorithm.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/561/ for more details.

Build is green

Patch application report for D6990 (id=25356)

Could not rebase; Attempt merge onto cc7401096d...

Updating cc74010..ca0f53f
Fast-forward
 requirements-swh.txt                 |  1 +
 swh/provenance/__init__.py           |  9 ++++++-
 swh/provenance/archive.py            |  3 +--
 swh/provenance/postgresql/archive.py | 14 +++++------
 swh/provenance/storage/archive.py    |  2 +-
 swh/provenance/swhgraph/__init__.py  |  0
 swh/provenance/swhgraph/archive.py   | 46 ++++++++++++++++++++++++++++++++++++
 7 files changed, 64 insertions(+), 11 deletions(-)
 create mode 100644 swh/provenance/swhgraph/__init__.py
 create mode 100644 swh/provenance/swhgraph/archive.py
Changes applied before test
commit ca0f53fdc83c7780b158c50d36ba2851e7a6b732
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 16:08:32 2022 +0100

    Add partial implementation of `ArchiveGraph` class

commit ca8fae444128a1acc1d27976f4154e57ea222e9c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 15:35:43 2022 +0100

    Remove ordered result constrain from `snapshot_get_heads`
    
    It is not require anymore after simplifying the origin-revision
    layer algorithm.

commit 3a2f11aadb7d32d1ab8caa5c96d1fa2ea2b5f852
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 17:51:18 2022 +0100

    Fix direct sql query for directories to the archive
    
    Duplicated entries are now filtered by a `SELECT DISTINCT` clause.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/564/ for more details.

vlorentz added inline comments.
requirements-swh.txt
3 ↗(On Diff #25356)

it's currently a pretty big dependency, could you make it optional? (see swh-vault as an example)

swh/provenance/swhgraph/archive.py
1 ↗(On Diff #25356)
requirements-swh.txt
3 ↗(On Diff #25356)

Sorry, I don't get this. swh-vault has a requirement on swh-graph as well, it just specifies a minimum version. I don't really know which is the minimum version including the features I'm using here.

swh/provenance/swhgraph/archive.py
1 ↗(On Diff #25356)

Sure. Should I change it in all files?

aeviso marked 2 inline comments as done.
aeviso added inline comments.
requirements-swh.txt
3 ↗(On Diff #25356)

I guess I get what you mean but I'm not sure it is properly done. Please check it if you can

Build has FAILED

Patch application report for D6990 (id=25571)

Could not rebase; Attempt merge onto dfac07f890...

Updating dfac07f..bb008cc
Fast-forward
 MANIFEST.in                              |  4 ++-
 requirements-swh-graph.txt               |  1 +
 setup.py                                 | 10 +++----
 sql/upgrades/003.sql                     | 15 +++++++++++
 swh/provenance/__init__.py               |  9 ++++++-
 swh/provenance/archive.py                |  3 +--
 swh/provenance/postgresql/archive.py     |  2 +-
 swh/provenance/sql/10-superuser-init.sql |  3 +++
 swh/provenance/sql/30-schema.sql         |  6 ++---
 swh/provenance/sql/60-indexes.sql        |  1 +
 swh/provenance/storage/archive.py        |  2 +-
 swh/provenance/swhgraph/__init__.py      |  0
 swh/provenance/swhgraph/archive.py       | 46 ++++++++++++++++++++++++++++++++
 swh/provenance/tests/conftest.py         |  7 +++--
 swh/provenance/tests/test_cli.py         |  3 +++
 15 files changed, 96 insertions(+), 16 deletions(-)
 create mode 100644 requirements-swh-graph.txt
 create mode 100644 sql/upgrades/003.sql
 create mode 100644 swh/provenance/sql/10-superuser-init.sql
 create mode 100644 swh/provenance/swhgraph/__init__.py
 create mode 100644 swh/provenance/swhgraph/archive.py
Changes applied before test
commit bb008ccf96c6d8ca2b7e4fa02afaa4bc621f9d3c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 16:08:32 2022 +0100

    Add partial implementation of `ArchiveGraph` class

commit 1a849521e21346014eaaa9c36ed05907035d3521
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 15:35:43 2022 +0100

    Remove ordered result constrain from `snapshot_get_heads`
    
    It is not require anymore after simplifying the origin-revision
    layer algorithm.

commit bbab4c65c1b6b91cb994e49a198721c604d09c4e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jan 31 12:56:30 2022 +0100

    Fix unique index definition for path to avoid error with large values

Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/571/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/571/console

Build is green

Patch application report for D6990 (id=25569)

Could not rebase; Attempt merge onto dfac07f890...

Updating dfac07f..9dc02c8
Fast-forward
 MANIFEST.in                              |  4 ++-
 requirements-swh.txt                     |  1 +
 sql/upgrades/003.sql                     | 15 +++++++++++
 swh/provenance/__init__.py               |  9 ++++++-
 swh/provenance/archive.py                |  3 +--
 swh/provenance/postgresql/archive.py     |  2 +-
 swh/provenance/sql/10-superuser-init.sql |  3 +++
 swh/provenance/sql/30-schema.sql         |  6 ++---
 swh/provenance/sql/60-indexes.sql        |  1 +
 swh/provenance/storage/archive.py        |  2 +-
 swh/provenance/swhgraph/__init__.py      |  0
 swh/provenance/swhgraph/archive.py       | 46 ++++++++++++++++++++++++++++++++
 swh/provenance/tests/conftest.py         |  7 +++--
 swh/provenance/tests/test_cli.py         |  3 +++
 14 files changed, 91 insertions(+), 11 deletions(-)
 create mode 100644 sql/upgrades/003.sql
 create mode 100644 swh/provenance/sql/10-superuser-init.sql
 create mode 100644 swh/provenance/swhgraph/__init__.py
 create mode 100644 swh/provenance/swhgraph/archive.py
Changes applied before test
commit 9dc02c881eebbeba9ccb7d1cade03b206232cc9e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 16:08:32 2022 +0100

    Add partial implementation of `ArchiveGraph` class

commit 1a849521e21346014eaaa9c36ed05907035d3521
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 15:35:43 2022 +0100

    Remove ordered result constrain from `snapshot_get_heads`
    
    It is not require anymore after simplifying the origin-revision
    layer algorithm.

commit bbab4c65c1b6b91cb994e49a198721c604d09c4e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jan 31 12:56:30 2022 +0100

    Fix unique index definition for path to avoid error with large values

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/569/ for more details.

Build is green

Patch application report for D6990 (id=25570)

Could not rebase; Attempt merge onto dfac07f890...

Updating dfac07f..365f0b2
Fast-forward
 MANIFEST.in                              |  4 ++-
 requirements-swh.txt                     |  1 +
 sql/upgrades/003.sql                     | 15 +++++++++++
 swh/provenance/__init__.py               |  9 ++++++-
 swh/provenance/archive.py                |  3 +--
 swh/provenance/postgresql/archive.py     |  2 +-
 swh/provenance/sql/10-superuser-init.sql |  3 +++
 swh/provenance/sql/30-schema.sql         |  6 ++---
 swh/provenance/sql/60-indexes.sql        |  1 +
 swh/provenance/storage/archive.py        |  2 +-
 swh/provenance/swhgraph/__init__.py      |  0
 swh/provenance/swhgraph/archive.py       | 46 ++++++++++++++++++++++++++++++++
 swh/provenance/tests/conftest.py         |  7 +++--
 swh/provenance/tests/test_cli.py         |  3 +++
 14 files changed, 91 insertions(+), 11 deletions(-)
 create mode 100644 sql/upgrades/003.sql
 create mode 100644 swh/provenance/sql/10-superuser-init.sql
 create mode 100644 swh/provenance/swhgraph/__init__.py
 create mode 100644 swh/provenance/swhgraph/archive.py
Changes applied before test
commit 365f0b25b9e5df3c9fb4ade3a389826577717731
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 16:08:32 2022 +0100

    Add partial implementation of `ArchiveGraph` class

commit 1a849521e21346014eaaa9c36ed05907035d3521
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 15:35:43 2022 +0100

    Remove ordered result constrain from `snapshot_get_heads`
    
    It is not require anymore after simplifying the origin-revision
    layer algorithm.

commit bbab4c65c1b6b91cb994e49a198721c604d09c4e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jan 31 12:56:30 2022 +0100

    Fix unique index definition for path to avoid error with large values

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/570/ for more details.

Build is green

Patch application report for D6990 (id=25576)

Rebasing onto 64890b13b9...

Current branch diff-target is up to date.
Changes applied before test
commit 22384d51ce120794b8cba0701e3ab308c2cd1b01
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 16:08:32 2022 +0100

    Add partial implementation of `ArchiveGraph` class

commit 13a9715781ae1b451678616a7987dac5fb30e3c3
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 15:35:43 2022 +0100

    Remove ordered result constrain from `snapshot_get_heads`
    
    It is not require anymore after simplifying the origin-revision
    layer algorithm.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/573/ for more details.

ardumont added inline comments.
swh/provenance/swhgraph/archive.py
1 ↗(On Diff #25356)

only the ones you are changing.

aeviso marked an inline comment as done.

rebase

Build is green

Patch application report for D6990 (id=26646)

Rebasing onto 8feeadaea1...

Current branch diff-target is up to date.
Changes applied before test
commit 64184b859f893d73946597fd3b7c8e41c47529fb
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 16:08:32 2022 +0100

    Add partial implementation of `ArchiveGraph` class

commit 451518ab25173bbe73711055ccc305a95f3cdb23
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 15:35:43 2022 +0100

    Remove ordered result constrain from `snapshot_get_heads`
    
    It is not require anymore after simplifying the origin-revision
    layer algorithm.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/583/ for more details.

Build is green

Patch application report for D6990 (id=26718)

Rebasing onto 8feeadaea1...

Current branch diff-target is up to date.
Changes applied before test
commit 9f7fc135273580a4188b9ed21f34d9214dda7153
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 16:08:32 2022 +0100

    Add partial implementation of `ArchiveGraph` class

commit 451518ab25173bbe73711055ccc305a95f3cdb23
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 15:35:43 2022 +0100

    Remove ordered result constrain from `snapshot_get_heads`
    
    It is not require anymore after simplifying the origin-revision
    layer algorithm.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/584/ for more details.

douardda added inline comments.
requirements-swh.txt
3 ↗(On Diff #25356)

swh-graph is an optional dependency in swh-vault.

Here you make it a "hard" requirement (aka installing swh-provenance will also install swh-graph, which is a big dependency).

Since the swh-graph based implementation of the ArchiveInterface is only one of the 3 possible ones, there is not need to make it a hard requirement, only an optional one.

I'm not sure it is properly done.

that's why he suggested to have a look a swh-vault, since it is an optional dependency there:

requirements-swh.txt
3 ↗(On Diff #25356)

Sorry I missed you already did it properly, thanks!

Just a few more comments below.

setup.py
54 ↗(On Diff #26718)

I've not checked, but maybe the swh-graph dependency should be added in the "testing" extra.

swh/provenance/__init__.py
44 ↗(On Diff #26718)

would be nice to have this import statement in a try/except block to display a proper error message (eg like https://forge.softwareheritage.org/source/swh-vault/browse/master/swh/vault/cli.py$83-93 )

tox.ini
7 ↗(On Diff #26718)

could get rid of these if swh.graph is part of the testing extra.

swh/provenance/swhgraph/archive.py
19 ↗(On Diff #26718)

required by ArchiveInterface

then the interface should be refactored to get rid of the mandatory storage attribute. Then you can get rid of the attribute and the storage constructor argument.

21 ↗(On Diff #26718)

no need for the statsd decorator for this not implemented method

32 ↗(On Diff #26718)

does it make sense to instantiate a CoreSWHID just for this? @vlorentz what do you think?

41 ↗(On Diff #26718)

same as above

aeviso added inline comments.
setup.py
54 ↗(On Diff #26718)

This is done on tox.ini

swh/provenance/swhgraph/archive.py
19 ↗(On Diff #26718)

Probably, but that's beyond the scope of the current dif

21 ↗(On Diff #26718)

It doesn't hurt either, and it's already there for when the method gets actually implemented (when path labels become available in the graph)

32 ↗(On Diff #26718)

I don't see the problem with this. It's just a matter of not duplication the SWHID calculation logic

tox.ini
7 ↗(On Diff #26718)

I followed the way it is done in swh-vault

aeviso marked 5 inline comments as done.

rebase

Build is green

Patch application report for D6990 (id=26760)

Rebasing onto 8feeadaea1...

Current branch diff-target is up to date.
Changes applied before test
commit b99be599fd2ba0afcd2ef346cf7e709af39c4edf
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 16:08:32 2022 +0100

    Add partial implementation of `ArchiveGraph` class

commit 451518ab25173bbe73711055ccc305a95f3cdb23
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 15:35:43 2022 +0100

    Remove ordered result constrain from `snapshot_get_heads`
    
    It is not require anymore after simplifying the origin-revision
    layer algorithm.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/585/ for more details.

setup.py
54 ↗(On Diff #26718)

I agree with david here. the testing extra should be enough to run all tests; and that's how we do it elsewhere.

tox.ini is just the current convenience we use for the main CI, but is not used everywhere.

aeviso added inline comments.
setup.py
54 ↗(On Diff #26718)

OK, then I guess you also want to refactor that in the vault: https://forge.softwareheritage.org/source/swh-vault/browse/master/setup.py$53

aeviso marked an inline comment as done.

rebase

Build is green

Patch application report for D6990 (id=26936)

Rebasing onto 8feeadaea1...

Current branch diff-target is up to date.
Changes applied before test
commit 16723f8be4f10c5f0330418a2a9c186fd306f478
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 16:08:32 2022 +0100

    Add partial implementation of `ArchiveGraph` class

commit 451518ab25173bbe73711055ccc305a95f3cdb23
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 15:35:43 2022 +0100

    Remove ordered result constrain from `snapshot_get_heads`
    
    It is not require anymore after simplifying the origin-revision
    layer algorithm.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/586/ for more details.

This revision is now accepted and ready to land.Mar 29 2022, 11:31 AM
setup.py
54 ↗(On Diff #26718)

no, because swh-vault tests can run without the graph; but swh-provenance's can't.