
Rework `ProvenanceInterface` as discussed during backend design
Closed, Public

Authored by aeviso on Jun 29 2021, 12:50 PM.

Details

Summary

Add `ProvenanceResult` class to be returned by `content_find_first` and
`content_find_all` methods. Rename some methods. Improve type annotations.

Move `ProvenanceBackend` implementation to a separate file.

Depends on D5944

Event Timeline

Build is green

Patch application report for D5946 (id=21341)

Could not rebase; Attempt merge onto d892b29e40...

Updating d892b29..23184e7
Fast-forward
 swh/provenance/__init__.py                         |   2 +-
 swh/provenance/archive.py                          |  24 +--
 swh/provenance/backend.py                          | 211 +++++++++++++++++++
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/model.py                            |  53 ++---
 swh/provenance/origin.py                           |  21 +-
 swh/provenance/postgresql/archive.py               | 118 ++++-------
 swh/provenance/postgresql/provenancedb_base.py     | 126 +++++++----
 .../postgresql/provenancedb_with_path.py           |  63 +++---
 .../postgresql/provenancedb_without_path.py        |  59 +++---
 swh/provenance/provenance.py                       | 230 +++------------------
 swh/provenance/revision.py                         |  13 +-
 swh/provenance/sql/30-schema.sql                   |   4 +-
 swh/provenance/storage/archive.py                  |  30 +--
 swh/provenance/tests/conftest.py                   |  24 ++-
 .../tests/data/generate_storage_from_git.py        |   3 +-
 .../data/history_graphs_with-merges_visits-01.yaml |  55 +++++
 swh/provenance/tests/data/with-merges.msgpack      | Bin 0 -> 7501 bytes
 ...repo_with_merges.yaml => with-merges_repo.yaml} |   0
 ...s-visits-01.yaml => with-merges_visits-01.yaml} |   0
 swh/provenance/tests/test_archive_interface.py     |  51 +++++
 swh/provenance/tests/test_conftest.py              |   2 +-
 swh/provenance/tests/test_history_graph.py         |  62 ++++++
 swh/provenance/tests/test_origin_iterator.py       |   8 +-
 swh/provenance/tests/test_provenance_db.py         |   4 +-
 swh/provenance/tests/test_provenance_heuristics.py |  30 +--
 26 files changed, 724 insertions(+), 473 deletions(-)
 create mode 100644 swh/provenance/backend.py
 create mode 100644 swh/provenance/tests/data/history_graphs_with-merges_visits-01.yaml
 create mode 100644 swh/provenance/tests/data/with-merges.msgpack
 rename swh/provenance/tests/data/{repo_with_merges.yaml => with-merges_repo.yaml} (100%)
 rename swh/provenance/tests/data/{repo_with_merges-visits-01.yaml => with-merges_visits-01.yaml} (100%)
 create mode 100644 swh/provenance/tests/test_archive_interface.py
 create mode 100644 swh/provenance/tests/test_history_graph.py
Changes applied before test
commit 23184e7de91d7e60577ce730868098b91a72b1d1
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:37:50 2021 +0200

    Move `ProvenanceBackend` implementation to a separate file

commit f32475952907452f3dbe3d51be9433aa854413bf
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:28:32 2021 +0200

    Rework `ProvenanceInterface` as discussed during backend design
    
    Add `ProvenanceResult` class to be returned by `content_find_first` and
    `content_find_all` methods. Rename some methods. Improve type annotations.

commit ad860db9bfeff7f276b3e356c9e21cb57cafc4c2
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:31:16 2021 +0200

    Add tests for history graph topology

commit 37ac81faf15a32c4471a3c4ee5140bcb9bf57178
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:10:38 2021 +0200

    Fix database queries related to the origin-revision layer
    
    This required allowing null dates in the `revision` table so that revision can be added
    by the origin-revision layer algorithm but not recognized as already processed by the
    revision-content layer. Revision and origin entries are now inserted in the database
    prior to inserting rows to revision_in_origin and revision_before_revision relations,
    so that internal ids are properly resolved.

commit 4eb166cc4f2aa036c932b9a5eb462454a70ee0d9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 25 13:38:26 2021 +0200

    Add test to compare both ArchiveInterface implementations
    
    Improve documentation of the interface and complete pending TODO's.

commit 01ac9eea375258ac1e000389d3fd286d0dbae458
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:25:15 2021 +0200

    Rename test files to keep naming convension
    
    Also added missing .msgpack file dump for new with-merges repository.

commit 76d1560924251396c1ac63c286d8612ce0f7e9d9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:05:24 2021 +0200

    Refactor ArchiveInterface to fit origin-revision layer needs
    
    Replace `revision_get` method by `revision_get_parents` returning an iterable of
    parents' ids only, instead of a swh.model.model.Revision object.

commit df69a9e57692ed9d4d870c295a21b3ac187d7b9c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 20:00:40 2021 +0200

    Use `Sha1Git` type to explicitly state the kind of identifiers
    
    Previous occurrences of `bytes` and `Sha1` are not correctly using `Sha1Git`.
    Also, some bytes conversion methods were replaced by their counterparts in
    the swh.model.hashutil module.

commit fa22dc902781e30e46823030681f003983cc6d6e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 19:12:06 2021 +0200

    Add support for sha1 identifiers for origins

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/195/ for more details.

Build is green

Patch application report for D5946 (id=21352)

Could not rebase; Attempt merge onto d892b29e40...

Updating d892b29..3611991
Fast-forward
 swh/provenance/__init__.py                         |   2 +-
 swh/provenance/archive.py                          |  24 +-
 swh/provenance/backend.py                          | 211 ++++++++++++++++
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/model.py                            |  53 ++--
 swh/provenance/origin.py                           |  21 +-
 swh/provenance/postgresql/archive.py               | 118 ++++-----
 swh/provenance/postgresql/provenancedb_base.py     | 126 +++++++---
 .../postgresql/provenancedb_with_path.py           |  63 +++--
 .../postgresql/provenancedb_without_path.py        |  59 +++--
 swh/provenance/provenance.py                       | 271 ++++++---------------
 swh/provenance/revision.py                         |  13 +-
 swh/provenance/sql/30-schema.sql                   |   4 +-
 swh/provenance/storage/archive.py                  |  30 ++-
 swh/provenance/tests/conftest.py                   |  24 +-
 .../tests/data/generate_storage_from_git.py        |   3 +-
 .../data/history_graphs_with-merges_visits-01.yaml |  55 +++++
 swh/provenance/tests/data/with-merges.msgpack      | Bin 0 -> 7501 bytes
 ...repo_with_merges.yaml => with-merges_repo.yaml} |   0
 ...s-visits-01.yaml => with-merges_visits-01.yaml} |   0
 swh/provenance/tests/test_archive_interface.py     |  51 ++++
 swh/provenance/tests/test_conftest.py              |   2 +-
 swh/provenance/tests/test_history_graph.py         |  62 +++++
 swh/provenance/tests/test_origin_iterator.py       |   8 +-
 swh/provenance/tests/test_provenance_db.py         |   4 +-
 swh/provenance/tests/test_provenance_heuristics.py |  30 ++-
 26 files changed, 765 insertions(+), 473 deletions(-)
 create mode 100644 swh/provenance/backend.py
 create mode 100644 swh/provenance/tests/data/history_graphs_with-merges_visits-01.yaml
 create mode 100644 swh/provenance/tests/data/with-merges.msgpack
 rename swh/provenance/tests/data/{repo_with_merges.yaml => with-merges_repo.yaml} (100%)
 rename swh/provenance/tests/data/{repo_with_merges-visits-01.yaml => with-merges_visits-01.yaml} (100%)
 create mode 100644 swh/provenance/tests/test_archive_interface.py
 create mode 100644 swh/provenance/tests/test_history_graph.py
Changes applied before test
commit 361199109d7d5a6cb694685cb2062940abe814bb
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:37:50 2021 +0200

    Move `ProvenanceBackend` implementation to a separate file

commit d058de2c080ee0c79ae57131d5c8ebdbeb6d0486
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:28:32 2021 +0200

    Rework `ProvenanceInterface` as discussed during backend design
    
    Add `ProvenanceResult` class to be returned by `content_find_first` and
    `content_find_all` methods. Rename some methods. Improve type annotations.

commit ad860db9bfeff7f276b3e356c9e21cb57cafc4c2
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:31:16 2021 +0200

    Add tests for history graph topology

commit 37ac81faf15a32c4471a3c4ee5140bcb9bf57178
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:10:38 2021 +0200

    Fix database queries related to the origin-revision layer
    
    This required allowing null dates in the `revision` table so that revision can be added
    by the origin-revision layer algorithm but not recognized as already processed by the
    revision-content layer. Revision and origin entries are now inserted in the database
    prior to inserting rows to revision_in_origin and revision_before_revision relations,
    so that internal ids are properly resolved.

commit 4eb166cc4f2aa036c932b9a5eb462454a70ee0d9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 25 13:38:26 2021 +0200

    Add test to compare both ArchiveInterface implementations
    
    Improve documentation of the interface and complete pending TODO's.

commit 01ac9eea375258ac1e000389d3fd286d0dbae458
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:25:15 2021 +0200

    Rename test files to keep naming convension
    
    Also added missing .msgpack file dump for new with-merges repository.

commit 76d1560924251396c1ac63c286d8612ce0f7e9d9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:05:24 2021 +0200

    Refactor ArchiveInterface to fit origin-revision layer needs
    
    Replace `revision_get` method by `revision_get_parents` returning an iterable of
    parents' ids only, instead of a swh.model.model.Revision object.

commit df69a9e57692ed9d4d870c295a21b3ac187d7b9c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 20:00:40 2021 +0200

    Use `Sha1Git` type to explicitly state the kind of identifiers
    
    Previous occurrences of `bytes` and `Sha1` are not correctly using `Sha1Git`.
    Also, some bytes conversion methods were replaced by their counterparts in
    the swh.model.hashutil module.

commit fa22dc902781e30e46823030681f003983cc6d6e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 19:12:06 2021 +0200

    Add support for sha1 identifiers for origins

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/198/ for more details.

Build is green

Patch application report for D5946 (id=21369)

Could not rebase; Attempt merge onto d892b29e40...

Updating d892b29..e25122d
Fast-forward
 swh/provenance/__init__.py                         |   2 +-
 swh/provenance/archive.py                          |  24 +-
 swh/provenance/backend.py                          | 211 ++++++++++++++++
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/model.py                            |  53 ++--
 swh/provenance/origin.py                           |  21 +-
 swh/provenance/postgresql/archive.py               | 118 ++++-----
 swh/provenance/postgresql/provenancedb_base.py     | 126 +++++++---
 .../postgresql/provenancedb_with_path.py           |  63 +++--
 .../postgresql/provenancedb_without_path.py        |  59 +++--
 swh/provenance/provenance.py                       | 274 ++++++---------------
 swh/provenance/revision.py                         |  13 +-
 swh/provenance/sql/30-schema.sql                   |   4 +-
 swh/provenance/storage/archive.py                  |  30 ++-
 swh/provenance/tests/conftest.py                   |  24 +-
 .../tests/data/generate_storage_from_git.py        |   3 +-
 .../data/history_graphs_with-merges_visits-01.yaml |  55 +++++
 swh/provenance/tests/data/with-merges.msgpack      | Bin 0 -> 7501 bytes
 ...repo_with_merges.yaml => with-merges_repo.yaml} |   0
 ...s-visits-01.yaml => with-merges_visits-01.yaml} |   0
 swh/provenance/tests/test_archive_interface.py     |  51 ++++
 swh/provenance/tests/test_conftest.py              |   2 +-
 swh/provenance/tests/test_history_graph.py         |  62 +++++
 swh/provenance/tests/test_origin_iterator.py       |   8 +-
 swh/provenance/tests/test_provenance_db.py         |   4 +-
 swh/provenance/tests/test_provenance_heuristics.py |  30 ++-
 26 files changed, 768 insertions(+), 473 deletions(-)
 create mode 100644 swh/provenance/backend.py
 create mode 100644 swh/provenance/tests/data/history_graphs_with-merges_visits-01.yaml
 create mode 100644 swh/provenance/tests/data/with-merges.msgpack
 rename swh/provenance/tests/data/{repo_with_merges.yaml => with-merges_repo.yaml} (100%)
 rename swh/provenance/tests/data/{repo_with_merges-visits-01.yaml => with-merges_visits-01.yaml} (100%)
 create mode 100644 swh/provenance/tests/test_archive_interface.py
 create mode 100644 swh/provenance/tests/test_history_graph.py
Changes applied before test
commit e25122d2e47de942a772164e9f1a60f425c87d97
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:37:50 2021 +0200

    Move `ProvenanceBackend` implementation to a separate file

commit b7678a341da72587cc48848f5a72f65861f892af
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:28:32 2021 +0200

    Rework `ProvenanceInterface` as discussed during backend design
    
    Add `ProvenanceResult` class to be returned by `content_find_first` and
    `content_find_all` methods. Rename some methods. Improve type annotations.

commit 7a59ff712bb8b5ae22e6f016475d03317c27b64a
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:31:16 2021 +0200

    Add tests for history graph topology

commit 3171ae2f129df433689fd22e32c8eeebf7af4171
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:10:38 2021 +0200

    Fix database queries related to the origin-revision layer
    
    This required allowing null dates in the `revision` table so that revision can be added
    by the origin-revision layer algorithm but not recognized as already processed by the
    revision-content layer. Revision and origin entries are now inserted in the database
    prior to inserting rows to revision_in_origin and revision_before_revision relations,
    so that internal ids are properly resolved.

commit 6736f6068280f167df5616681dee9ad67b2b7dbd
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 25 13:38:26 2021 +0200

    Add test to compare both `ArchiveInterface` implementations
    
    Improve documentation of the interface and complete pending TODO's.

commit dde867254e51dd87f4aba3cdea59da8bffc2d160
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:25:15 2021 +0200

    Rename test files to keep naming convension
    
    Also added missing .msgpack file dump for new with-merges repository.

commit 14001c1844598a3d4ebd1b5f609070f9c85dcaa9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:05:24 2021 +0200

    Refactor `ArchiveInterface` to fit origin-revision layer needs
    
    Replace `revision_get` method by `revision_get_parents` returning an iterable of
    parents' ids only, instead of a swh.model.model.Revision object.

commit df69a9e57692ed9d4d870c295a21b3ac187d7b9c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 20:00:40 2021 +0200

    Use `Sha1Git` type to explicitly state the kind of identifiers
    
    Previous occurrences of `bytes` and `Sha1` are not correctly using `Sha1Git`.
    Also, some bytes conversion methods were replaced by their counterparts in
    the swh.model.hashutil module.

commit fa22dc902781e30e46823030681f003983cc6d6e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 19:12:06 2021 +0200

    Add support for sha1 identifiers for origins

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/206/ for more details.

vlorentz added inline comments.
swh/provenance/postgresql/provenancedb_with_path.py
66–68

more typo-proof IMO

swh/provenance/postgresql/provenancedb_without_path.py
26–28

ditto

(and this "row -> ProvenanceResult" code should probably be unified somewhere)

58–60

ditto

swh/provenance/postgresql/provenancedb_with_path.py
66–68

The original idea was actually to use psycopg2.extras.DictCursor, but since the refactoring is quite big I postponed that change and ended up forgetting about it. With it, we can simply do ProvenanceResult(**row) by setting the right aliases in the SQL query (i.e. url -> origin).

swh/provenance/postgresql/provenancedb_without_path.py
26–28

Same as above, using psycopg2.extras.DictCursor will simplify this

58–60

agreed
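
For reference, the aliasing idea from the thread above could look roughly like the sketch below. It is not the code from this diff. It uses the standard-library sqlite3 module with sqlite3.Row as a stand-in for a psycopg2 dict cursor so that it runs without a PostgreSQL server. The `ProvenanceResult` fields, the toy schema, and the `make_demo_db` helper are assumptions made for illustration only.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class ProvenanceResult:
    # Hypothetical field set, chosen for illustration only.
    content: bytes
    revision: bytes
    date: str
    origin: Optional[str]
    path: bytes


# Every selected column is aliased to a dataclass field name
# (note `o.url AS origin`), so a row can be unpacked directly.
QUERY = """
    SELECT c.sha1 AS content, r.sha1 AS revision, r.date AS date,
           o.url AS origin, c.path AS path
    FROM content AS c
    JOIN revision AS r ON r.id = c.revision
    LEFT JOIN origin AS o ON o.id = r.origin
    ORDER BY r.date
"""


def content_find_all(conn: sqlite3.Connection) -> Iterator[ProvenanceResult]:
    # Single place where "row -> ProvenanceResult" happens.
    for row in conn.execute(QUERY):
        yield ProvenanceResult(**dict(row))


def make_demo_db() -> sqlite3.Connection:
    # Toy in-memory schema (an assumption, not the real 30-schema.sql).
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    conn.executescript(
        """
        CREATE TABLE origin (id INTEGER PRIMARY KEY, url TEXT);
        CREATE TABLE revision (id INTEGER PRIMARY KEY, sha1 BLOB,
                               date TEXT, origin INTEGER);
        CREATE TABLE content (id INTEGER PRIMARY KEY, sha1 BLOB,
                              path BLOB, revision INTEGER);
        INSERT INTO origin VALUES (1, 'https://example.org/repo.git');
        INSERT INTO revision VALUES (1, x'aa', '2021-06-28', 1);
        INSERT INTO content VALUES (1, x'bb', x'612f62', 1);
        """
    )
    return conn
```

Because the aliases carry the field names, a typo in a column name surfaces as a TypeError when the dataclass is constructed, rather than as silently misordered positional values.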

Build is green

Patch application report for D5946 (id=21395)

Could not rebase; Attempt merge onto d892b29e40...

Updating d892b29..07a30e4
Fast-forward
 swh/provenance/archive.py                          |  24 +--
 swh/provenance/cli.py                              |  28 ++-
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/model.py                            |  53 +++---
 swh/provenance/origin.py                           |  21 +--
 swh/provenance/postgresql/archive.py               | 115 ++++--------
 swh/provenance/postgresql/provenancedb_base.py     | 126 +++++++++----
 .../postgresql/provenancedb_with_path.py           |  63 ++++---
 .../postgresql/provenancedb_without_path.py        |  59 +++---
 swh/provenance/provenance.py                       | 207 ++++++++++++++-------
 swh/provenance/revision.py                         |  13 +-
 swh/provenance/sql/30-schema.sql                   |   4 +-
 swh/provenance/storage/archive.py                  |  30 +--
 swh/provenance/tests/conftest.py                   |  22 ++-
 .../tests/data/generate_storage_from_git.py        |   3 +-
 .../data/history_graphs_with-merges_visits-01.yaml |  55 ++++++
 swh/provenance/tests/data/with-merges.msgpack      | Bin 0 -> 7501 bytes
 ...repo_with_merges.yaml => with-merges_repo.yaml} |   0
 ...s-visits-01.yaml => with-merges_visits-01.yaml} |   0
 swh/provenance/tests/test_archive_interface.py     |  51 +++++
 swh/provenance/tests/test_conftest.py              |   2 +-
 swh/provenance/tests/test_history_graph.py         |  62 ++++++
 swh/provenance/tests/test_origin_iterator.py       |   8 +-
 swh/provenance/tests/test_provenance_db.py         |   4 +-
 swh/provenance/tests/test_provenance_heuristics.py |  30 +--
 25 files changed, 635 insertions(+), 349 deletions(-)
 create mode 100644 swh/provenance/tests/data/history_graphs_with-merges_visits-01.yaml
 create mode 100644 swh/provenance/tests/data/with-merges.msgpack
 rename swh/provenance/tests/data/{repo_with_merges.yaml => with-merges_repo.yaml} (100%)
 rename swh/provenance/tests/data/{repo_with_merges-visits-01.yaml => with-merges_visits-01.yaml} (100%)
 create mode 100644 swh/provenance/tests/test_archive_interface.py
 create mode 100644 swh/provenance/tests/test_history_graph.py
Changes applied before test
commit 07a30e43a76e170ab03764035da68dcf7db1fc3b
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:28:32 2021 +0200

    Rework `ProvenanceInterface` as discussed during backend design
    
    Add `ProvenanceResult` class to be returned by `content_find_first` and
    `content_find_all` methods. Rename some methods. Improve type annotations.

commit 2fd3f56b57f8db6691ae6b8b7cb7ac557b764172
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:31:16 2021 +0200

    Add tests for history graph topology

commit d45d6ff9e9317ecfe38d584df7297c548b654d28
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:10:38 2021 +0200

    Fix database queries related to the origin-revision layer
    
    This required allowing null dates in the `revision` table so that revision can be added
    by the origin-revision layer algorithm but not recognized as already processed by the
    revision-content layer. Revision and origin entries are now inserted in the database
    prior to inserting rows to revision_in_origin and revision_before_revision relations,
    so that internal ids are properly resolved.

commit 0e2a3c64ce3c368b53c101c541e8aebcde789477
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 25 13:38:26 2021 +0200

    Add test to compare both `ArchiveInterface` implementations
    
    Improve documentation of the interface and complete pending TODO's.

commit 98bba93cccece2b47ec4cd5887997cb5bede1e87
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:25:15 2021 +0200

    Rename test files to keep naming convension
    
    Also added missing .msgpack file dump for new with-merges repository.

commit fa9198afb71bcf3b8abea07d88d763a430f7358e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:05:24 2021 +0200

    Refactor `ArchiveInterface` to fit origin-revision layer needs
    
    Replace `revision_get` method by `revision_get_parents` returning an iterable of
    parents' ids only, instead of a swh.model.model.Revision object.

commit 9e0c1aa099073887206c9334e17b49ee31bbef9a
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 20:00:40 2021 +0200

    Use `Sha1Git` type to explicitly state the kind of identifiers
    
    Previous occurrences of `bytes` and `Sha1` are now correctly using `Sha1Git`.
    Also, some bytes conversion methods were replaced by their counterparts in
    the swh.model.hashutil module.

commit a27ffff67b6b14bf37d153bb9b1d1c2ae63773fc
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 19:12:06 2021 +0200

    Add support for sha1 identifiers for origins

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/217/ for more details.

Build is green

Patch application report for D5946 (id=21396)

Could not rebase; Attempt merge onto d892b29e40...

Updating d892b29..3672235
Fast-forward
 swh/provenance/__init__.py                         |   2 +-
 swh/provenance/archive.py                          |  24 +-
 swh/provenance/backend.py                          | 211 +++++++++++++++
 swh/provenance/cli.py                              |  28 +-
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/model.py                            |  53 ++--
 swh/provenance/origin.py                           |  21 +-
 swh/provenance/postgresql/archive.py               | 115 +++------
 swh/provenance/postgresql/provenancedb_base.py     | 136 +++++++---
 .../postgresql/provenancedb_with_path.py           |  75 +++---
 .../postgresql/provenancedb_without_path.py        |  71 +++---
 swh/provenance/provenance.py                       | 282 ++++++---------------
 swh/provenance/revision.py                         |  13 +-
 swh/provenance/sql/30-schema.sql                   |   4 +-
 swh/provenance/storage/archive.py                  |  30 ++-
 swh/provenance/tests/conftest.py                   |  24 +-
 .../tests/data/generate_storage_from_git.py        |   3 +-
 .../data/history_graphs_with-merges_visits-01.yaml |  55 ++++
 swh/provenance/tests/data/with-merges.msgpack      | Bin 0 -> 7501 bytes
 ...repo_with_merges.yaml => with-merges_repo.yaml} |   0
 ...s-visits-01.yaml => with-merges_visits-01.yaml} |   0
 swh/provenance/tests/test_archive_interface.py     |  51 ++++
 swh/provenance/tests/test_conftest.py              |   2 +-
 swh/provenance/tests/test_history_graph.py         |  62 +++++
 swh/provenance/tests/test_origin_iterator.py       |   8 +-
 swh/provenance/tests/test_provenance_db.py         |   4 +-
 swh/provenance/tests/test_provenance_heuristics.py |  56 ++--
 27 files changed, 802 insertions(+), 532 deletions(-)
 create mode 100644 swh/provenance/backend.py
 create mode 100644 swh/provenance/tests/data/history_graphs_with-merges_visits-01.yaml
 create mode 100644 swh/provenance/tests/data/with-merges.msgpack
 rename swh/provenance/tests/data/{repo_with_merges.yaml => with-merges_repo.yaml} (100%)
 rename swh/provenance/tests/data/{repo_with_merges-visits-01.yaml => with-merges_visits-01.yaml} (100%)
 create mode 100644 swh/provenance/tests/test_archive_interface.py
 create mode 100644 swh/provenance/tests/test_history_graph.py
Changes applied before test
commit 3672235c3258cf93fb37a82d060bf40ba1761b8b
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:37:50 2021 +0200

    Move `ProvenanceBackend` implementation to a separate file

commit 6f4da6fed7e663273627ad4a46c8489ef0a0e784
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jul 1 13:47:26 2021 +0200

    Use `RealDictCursor` in `ProvenanceDBBase`
    
    to improve the way `ProvenanceResult`s are generated.

commit 07a30e43a76e170ab03764035da68dcf7db1fc3b
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:28:32 2021 +0200

    Rework `ProvenanceInterface` as discussed during backend design
    
    Add `ProvenanceResult` class to be returned by `content_find_first` and
    `content_find_all` methods. Rename some methods. Improve type annotations.

commit 2fd3f56b57f8db6691ae6b8b7cb7ac557b764172
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:31:16 2021 +0200

    Add tests for history graph topology

commit d45d6ff9e9317ecfe38d584df7297c548b654d28
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:10:38 2021 +0200

    Fix database queries related to the origin-revision layer
    
    This required allowing null dates in the `revision` table so that revision can be added
    by the origin-revision layer algorithm but not recognized as already processed by the
    revision-content layer. Revision and origin entries are now inserted in the database
    prior to inserting rows to revision_in_origin and revision_before_revision relations,
    so that internal ids are properly resolved.

commit 0e2a3c64ce3c368b53c101c541e8aebcde789477
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 25 13:38:26 2021 +0200

    Add test to compare both `ArchiveInterface` implementations
    
    Improve documentation of the interface and complete pending TODO's.

commit 98bba93cccece2b47ec4cd5887997cb5bede1e87
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:25:15 2021 +0200

    Rename test files to keep naming convension
    
    Also added missing .msgpack file dump for new with-merges repository.

commit fa9198afb71bcf3b8abea07d88d763a430f7358e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:05:24 2021 +0200

    Refactor `ArchiveInterface` to fit origin-revision layer needs
    
    Replace `revision_get` method by `revision_get_parents` returning an iterable of
    parents' ids only, instead of a swh.model.model.Revision object.

commit 9e0c1aa099073887206c9334e17b49ee31bbef9a
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 20:00:40 2021 +0200

    Use `Sha1Git` type to explicitly state the kind of identifiers
    
    Previous occurrences of `bytes` and `Sha1` are now correctly using `Sha1Git`.
    Also, some bytes conversion methods were replaced by their counterparts in
    the swh.model.hashutil module.

commit a27ffff67b6b14bf37d153bb9b1d1c2ae63773fc
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 19:12:06 2021 +0200

    Add support for sha1 identifiers for origins

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/218/ for more details.

douardda added a subscriber: douardda.

Okay, but as stated, I don't much like the general use of RealDictCursor; sometimes it helps, but sometimes it does not. Ideally both should be available (depending on the query).

swh/provenance/postgresql/provenancedb_base.py
19

I'm not convinced using a RealDictCursor for all queries really helps here, but meh.
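
One way to keep both row styles available per query is sketched below. It is not part of this diff. It relies only on the DB-API `cursor.description` attribute and uses the standard-library sqlite3 module as a stand-in so that it runs without PostgreSQL. The helper names `iter_tuples`, `iter_dicts` and `make_demo_conn` are hypothetical. With psycopg2, a comparable per-query choice can be made by passing `cursor_factory=psycopg2.extras.RealDictCursor` to `connection.cursor()` only where dict rows are wanted.

```python
import sqlite3
from typing import Any, Dict, Iterator, Sequence, Tuple


def iter_tuples(
    conn: sqlite3.Connection, query: str, params: Sequence[Any] = ()
) -> Iterator[Tuple[Any, ...]]:
    # Default behaviour: plain positional rows.
    cur = conn.cursor()
    cur.execute(query, params)
    yield from cur


def iter_dicts(
    conn: sqlite3.Connection, query: str, params: Sequence[Any] = ()
) -> Iterator[Dict[str, Any]]:
    # Opt-in behaviour: rows keyed by column name (or alias),
    # built from the DB-API cursor.description metadata.
    cur = conn.cursor()
    cur.execute(query, params)
    columns = [col[0] for col in cur.description]
    for row in cur:
        yield dict(zip(columns, row))


def make_demo_conn() -> sqlite3.Connection:
    # Toy table, assumed for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE origin (id INTEGER PRIMARY KEY, url TEXT)")
    conn.execute(
        "INSERT INTO origin (url) VALUES (?)",
        ("https://example.org/repo.git",),
    )
    return conn
```

Queries that feed a result class can call `iter_dicts`, while bulk lookups that only need one or two columns can stay on `iter_tuples`.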

This revision is now accepted and ready to land. Jul 2 2021, 3:48 PM

Build is green

Patch application report for D5946 (id=21445)

Could not rebase; Attempt merge onto d892b29e40...

Updating d892b29..7c0a091
Fast-forward
 swh/provenance/__init__.py                         |   2 +-
 swh/provenance/archive.py                          |  24 +-
 swh/provenance/backend.py                          | 213 ++++++++++++++++
 swh/provenance/cli.py                              |  28 +-
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/model.py                            |  53 ++--
 swh/provenance/origin.py                           |  21 +-
 swh/provenance/postgresql/archive.py               | 115 +++------
 swh/provenance/postgresql/provenancedb_base.py     | 136 +++++++---
 .../postgresql/provenancedb_with_path.py           |  75 +++---
 .../postgresql/provenancedb_without_path.py        |  71 +++---
 swh/provenance/provenance.py                       | 282 ++++++---------------
 swh/provenance/revision.py                         |  13 +-
 swh/provenance/sql/30-schema.sql                   |   4 +-
 swh/provenance/storage/archive.py                  |  30 ++-
 swh/provenance/tests/conftest.py                   |  24 +-
 .../tests/data/generate_storage_from_git.py        |   3 +-
 .../data/history_graphs_with-merges_visits-01.yaml |  55 ++++
 swh/provenance/tests/data/with-merges.msgpack      | Bin 0 -> 7501 bytes
 ...repo_with_merges.yaml => with-merges_repo.yaml} |   0
 ...s-visits-01.yaml => with-merges_visits-01.yaml} |   0
 swh/provenance/tests/test_archive_interface.py     |  51 ++++
 swh/provenance/tests/test_conftest.py              |   2 +-
 swh/provenance/tests/test_history_graph.py         |  62 +++++
 swh/provenance/tests/test_origin_iterator.py       |   8 +-
 swh/provenance/tests/test_provenance_db.py         |   4 +-
 swh/provenance/tests/test_provenance_heuristics.py |  56 ++--
 27 files changed, 804 insertions(+), 532 deletions(-)
 create mode 100644 swh/provenance/backend.py
 create mode 100644 swh/provenance/tests/data/history_graphs_with-merges_visits-01.yaml
 create mode 100644 swh/provenance/tests/data/with-merges.msgpack
 rename swh/provenance/tests/data/{repo_with_merges.yaml => with-merges_repo.yaml} (100%)
 rename swh/provenance/tests/data/{repo_with_merges-visits-01.yaml => with-merges_visits-01.yaml} (100%)
 create mode 100644 swh/provenance/tests/test_archive_interface.py
 create mode 100644 swh/provenance/tests/test_history_graph.py
Changes applied before test
commit 7c0a091ce5ffbf0a02dbe9d7fc84435ddd46cde2
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:37:50 2021 +0200

    Move `ProvenanceBackend` implementation to a separate file

commit 34898ad3cb18c24a7d7bef79dcfe470c3a1374ef
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jul 1 13:47:26 2021 +0200

    Use `RealDictCursor` in `ProvenanceDBBase`
    
    to improve the way `ProvenanceResult`s are generated.

commit 721354c436b5f5a861800b11e6151afa1aa634b6
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:28:32 2021 +0200

    Rework `ProvenanceInterface` as discussed during backend design
    
    Add `ProvenanceResult` class to be returned by `content_find_first` and
    `content_find_all` methods. Rename some methods. Improve type annotations.

commit 01f8d40ffccbcab6ecec6c2cf85478364e006caa
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:31:16 2021 +0200

    Add tests for history graph topology

commit b7fdcdec7ea96101d62a57d9aeed114c897df961
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:10:38 2021 +0200

    Fix database queries related to the origin-revision layer
    
    This required allowing null dates in the `revision` table so that revision can be added
    by the origin-revision layer algorithm but not recognized as already processed by the
    revision-content layer. Revision and origin entries are now inserted in the database
    prior to inserting rows to revision_in_origin and revision_before_revision relations,
    so that internal ids are properly resolved.

commit 0e2a3c64ce3c368b53c101c541e8aebcde789477
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 25 13:38:26 2021 +0200

    Add test to compare both `ArchiveInterface` implementations
    
    Improve documentation of the interface and complete pending TODO's.

commit 98bba93cccece2b47ec4cd5887997cb5bede1e87
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:25:15 2021 +0200

    Rename test files to keep naming convension
    
    Also added missing .msgpack file dump for new with-merges repository.

commit fa9198afb71bcf3b8abea07d88d763a430f7358e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:05:24 2021 +0200

    Refactor `ArchiveInterface` to fit origin-revision layer needs
    
    Replace `revision_get` method by `revision_get_parents` returning an iterable of
    parents' ids only, instead of a swh.model.model.Revision object.

commit 9e0c1aa099073887206c9334e17b49ee31bbef9a
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 20:00:40 2021 +0200

    Use `Sha1Git` type to explicitly state the kind of identifiers
    
    Previous occurrences of `bytes` and `Sha1` are now correctly using `Sha1Git`.
    Also, some bytes conversion methods were replaced by their counterparts in
    the swh.model.hashutil module.

commit a27ffff67b6b14bf37d153bb9b1d1c2ae63773fc
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 19:12:06 2021 +0200

    Add support for sha1 identifiers for origins

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/243/ for more details.