
Add `ProvenanceStorageInterface` as discussed during backend design
Closed, Public

Authored by aeviso on Jun 29 2021, 12:52 PM.

Details

Summary

Rework backend-related classes to properly use the new interface.
Adapt tests to the new structure as well.
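
As a rough illustration of the kind of separation the summary describes, the sketch below defines a storage-facing contract as a `typing.Protocol` and a backend that depends only on that contract. The names `StorageInterface`, `InMemoryStorage`, `Backend`, `content_add`, `content_get` and `content_get_early_date` are hypothetical and are not taken from this diff; the actual `ProvenanceStorageInterface` may look quite different.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional, Protocol, runtime_checkable

# Stand-in alias for the `Sha1Git` identifier type mentioned in the commit log.
Sha1Git = bytes


@runtime_checkable
class StorageInterface(Protocol):
    """Hypothetical storage contract: what a backend needs from any store."""

    def content_add(self, dates: Dict[Sha1Git, datetime]) -> bool:
        ...

    def content_get(self, ids: Iterable[Sha1Git]) -> Dict[Sha1Git, datetime]:
        ...


class InMemoryStorage:
    """Toy implementation, used here in place of a PostgreSQL-backed one."""

    def __init__(self) -> None:
        self._content: Dict[Sha1Git, datetime] = {}

    def content_add(self, dates: Dict[Sha1Git, datetime]) -> bool:
        # Keep the earliest known date for each content id.
        for sha1, date in dates.items():
            known = self._content.get(sha1)
            if known is None or date < known:
                self._content[sha1] = date
        return True

    def content_get(self, ids: Iterable[Sha1Git]) -> Dict[Sha1Git, datetime]:
        # Unknown ids are simply absent from the result.
        return {i: self._content[i] for i in ids if i in self._content}


class Backend:
    """Hypothetical backend that only talks to the storage contract."""

    def __init__(self, storage: StorageInterface) -> None:
        self.storage = storage

    def content_get_early_date(self, sha1: Sha1Git) -> Optional[datetime]:
        return self.storage.content_get([sha1]).get(sha1)
```

Because `Backend` only relies on the protocol, tests could swap the in-memory store for a database-backed one without touching the backend logic.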

Depends on D5946

Diff Detail

Repository
rDPROV Provenance database
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5947 (id=21342)

Could not rebase; Attempt merge onto d892b29e40...

Updating d892b29..6afc8b3
Fast-forward
 swh/provenance/__init__.py                         |  47 ++-
 swh/provenance/archive.py                          |  24 +-
 swh/provenance/backend.py                          | 357 ++++++++++++++++++
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/model.py                            |  53 ++-
 swh/provenance/origin.py                           |  21 +-
 swh/provenance/postgresql/archive.py               | 118 +++---
 swh/provenance/postgresql/provenancedb_base.py     | 411 +++++++++++++--------
 .../postgresql/provenancedb_with_path.py           | 103 ++----
 .../postgresql/provenancedb_without_path.py        |  82 ++--
 swh/provenance/provenance.py                       | 257 ++++---------
 swh/provenance/revision.py                         |  13 +-
 swh/provenance/sql/30-schema.sql                   |  30 +-
 swh/provenance/storage/archive.py                  |  30 +-
 swh/provenance/tests/conftest.py                   |  32 +-
 .../tests/data/generate_storage_from_git.py        |   3 +-
 .../data/history_graphs_with-merges_visits-01.yaml |  55 +++
 swh/provenance/tests/data/with-merges.msgpack      | Bin 0 -> 7501 bytes
 ...repo_with_merges.yaml => with-merges_repo.yaml} |   0
 ...s-visits-01.yaml => with-merges_visits-01.yaml} |   0
 swh/provenance/tests/test_archive_interface.py     |  51 +++
 swh/provenance/tests/test_conftest.py              |   2 +-
 swh/provenance/tests/test_history_graph.py         |  62 ++++
 swh/provenance/tests/test_origin_iterator.py       |   8 +-
 swh/provenance/tests/test_provenance_db.py         |   4 +-
 swh/provenance/tests/test_provenance_heuristics.py |  30 +-
 26 files changed, 1121 insertions(+), 676 deletions(-)
 create mode 100644 swh/provenance/backend.py
 create mode 100644 swh/provenance/tests/data/history_graphs_with-merges_visits-01.yaml
 create mode 100644 swh/provenance/tests/data/with-merges.msgpack
 rename swh/provenance/tests/data/{repo_with_merges.yaml => with-merges_repo.yaml} (100%)
 rename swh/provenance/tests/data/{repo_with_merges-visits-01.yaml => with-merges_visits-01.yaml} (100%)
 create mode 100644 swh/provenance/tests/test_archive_interface.py
 create mode 100644 swh/provenance/tests/test_history_graph.py
Changes applied before test
commit 6afc8b39601c0f93375bcf37daa1b8a3d5bf242a
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 19:21:58 2021 +0200

    Add ProvenanceStorageInterface
    
    Rework backend-related classes to properly use the new interface.
    Adapt tests to the new structure as well.

commit 23184e7de91d7e60577ce730868098b91a72b1d1
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:37:50 2021 +0200

    Move `ProvenanceBackend` implementation to a separate file

commit f32475952907452f3dbe3d51be9433aa854413bf
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:28:32 2021 +0200

    Rework `ProvenanceInterface` as discussed during backend design
    
    Add `ProvenanceResult` class to be returned by `content_find_first` and
    `content_find_all` methods. Rename some methods. Improve type annotations.

commit ad860db9bfeff7f276b3e356c9e21cb57cafc4c2
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:31:16 2021 +0200

    Add tests for history graph topology

commit 37ac81faf15a32c4471a3c4ee5140bcb9bf57178
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:10:38 2021 +0200

    Fix database queries related to the origin-revision layer
    
    This required allowing null dates in the `revision` table so that revision can be added
    by the origin-revision layer algorithm but not recognized as already processed by the
    revision-content layer. Revision and origin entries are now inserted in the database
    prior to inserting rows to revision_in_origin and revision_before_revision relations,
    so that internal ids are properly resolved.

commit 4eb166cc4f2aa036c932b9a5eb462454a70ee0d9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 25 13:38:26 2021 +0200

    Add test to compare both ArchiveInterface implementations
    
    Improve documentation of the interface and complete pending TODO's.

commit 01ac9eea375258ac1e000389d3fd286d0dbae458
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:25:15 2021 +0200

    Rename test files to keep naming convention
    
    Also added missing .msgpack file dump for new with-merges repository.

commit 76d1560924251396c1ac63c286d8612ce0f7e9d9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:05:24 2021 +0200

    Refactor ArchiveInterface to fit origin-revision layer needs
    
    Replace `revision_get` method by `revision_get_parents` returning an iterable of
    parents' ids only, instead of a swh.model.model.Revision object.

commit df69a9e57692ed9d4d870c295a21b3ac187d7b9c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 20:00:40 2021 +0200

    Use `Sha1Git` type to explicitly state the kind of identifiers
    
    Previous occurrences of `bytes` and `Sha1` are now correctly using `Sha1Git`.
    Also, some bytes conversion methods were replaced by their counterparts in
    the swh.model.hashutil module.

commit fa22dc902781e30e46823030681f003983cc6d6e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 19:12:06 2021 +0200

    Add support for sha1 identifiers for origins

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/196/ for more details.
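
The `ArchiveInterface` refactoring listed in the commits above replaces `revision_get` with `revision_get_parents`, which returns only the parents' ids. The following is a minimal sketch of why that narrower method is enough to walk a history graph. The in-memory `DictArchive` class and the `ancestors` helper are hypothetical and not part of this diff.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Stand-in alias for the `Sha1Git` identifier type mentioned in the commit log.
Sha1Git = bytes


class DictArchive:
    """Hypothetical archive backed by a dict mapping revision id -> parent ids."""

    def __init__(self, parents: Dict[Sha1Git, Tuple[Sha1Git, ...]]) -> None:
        self._parents = parents

    def revision_get_parents(self, id: Sha1Git) -> Iterable[Sha1Git]:
        # Only ids are returned; no full revision object is built.
        yield from self._parents.get(id, ())


def ancestors(archive: DictArchive, head: Sha1Git) -> List[Sha1Git]:
    """Return every revision reachable from `head` (excluding `head`),
    each listed once, in depth-first discovery order."""
    seen: Set[Sha1Git] = set()
    order: List[Sha1Git] = []
    stack = list(archive.revision_get_parents(head))
    while stack:
        rev = stack.pop()
        if rev in seen:
            continue  # already visited through another merge parent
        seen.add(rev)
        order.append(rev)
        stack.extend(archive.revision_get_parents(rev))
    return order
```

A traversal like this never needs author, message, or date fields, which is presumably why returning ids alone fits the origin-revision layer.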

aeviso retitled this revision from "Add ProvenanceStorageInterface" to "Add `ProvenanceStorageInterface` as discussed during backend design" on Jun 29 2021, 3:51 PM.

Build is green

Patch application report for D5947 (id=21353)

Could not rebase; Attempt merge onto d892b29e40...

Updating d892b29..a3da061
Fast-forward
 swh/provenance/__init__.py                         |  47 ++-
 swh/provenance/archive.py                          |  24 +-
 swh/provenance/backend.py                          | 357 ++++++++++++++++++
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/model.py                            |  53 ++-
 swh/provenance/origin.py                           |  21 +-
 swh/provenance/postgresql/archive.py               | 118 +++---
 swh/provenance/postgresql/provenancedb_base.py     | 411 +++++++++++++--------
 .../postgresql/provenancedb_with_path.py           | 103 ++----
 .../postgresql/provenancedb_without_path.py        |  82 ++--
 swh/provenance/provenance.py                       | 298 ++++++---------
 swh/provenance/revision.py                         |  13 +-
 swh/provenance/sql/30-schema.sql                   |  30 +-
 swh/provenance/storage/archive.py                  |  30 +-
 swh/provenance/tests/conftest.py                   |  32 +-
 .../tests/data/generate_storage_from_git.py        |   3 +-
 .../data/history_graphs_with-merges_visits-01.yaml |  55 +++
 swh/provenance/tests/data/with-merges.msgpack      | Bin 0 -> 7501 bytes
 ...repo_with_merges.yaml => with-merges_repo.yaml} |   0
 ...s-visits-01.yaml => with-merges_visits-01.yaml} |   0
 swh/provenance/tests/test_archive_interface.py     |  51 +++
 swh/provenance/tests/test_conftest.py              |   2 +-
 swh/provenance/tests/test_history_graph.py         |  62 ++++
 swh/provenance/tests/test_origin_iterator.py       |   8 +-
 swh/provenance/tests/test_provenance_db.py         |   4 +-
 swh/provenance/tests/test_provenance_heuristics.py |  30 +-
 26 files changed, 1162 insertions(+), 676 deletions(-)
 create mode 100644 swh/provenance/backend.py
 create mode 100644 swh/provenance/tests/data/history_graphs_with-merges_visits-01.yaml
 create mode 100644 swh/provenance/tests/data/with-merges.msgpack
 rename swh/provenance/tests/data/{repo_with_merges.yaml => with-merges_repo.yaml} (100%)
 rename swh/provenance/tests/data/{repo_with_merges-visits-01.yaml => with-merges_visits-01.yaml} (100%)
 create mode 100644 swh/provenance/tests/test_archive_interface.py
 create mode 100644 swh/provenance/tests/test_history_graph.py
Changes applied before test
commit a3da0612eae1ded260eeafee9dc77f2bbf84a47f
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 19:21:58 2021 +0200

    Add `ProvenanceStorageInterface` as discussed during backend design
    
    Rework backend-related classes to properly use the new interface.
    Adapt tests to the new structure as well.

commit 361199109d7d5a6cb694685cb2062940abe814bb
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:37:50 2021 +0200

    Move `ProvenanceBackend` implementation to a separate file

commit d058de2c080ee0c79ae57131d5c8ebdbeb6d0486
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:28:32 2021 +0200

    Rework `ProvenanceInterface` as discussed during backend design
    
    Add `ProvenanceResult` class to be returned by `content_find_first` and
    `content_find_all` methods. Rename some methods. Improve type annotations.

commit ad860db9bfeff7f276b3e356c9e21cb57cafc4c2
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:31:16 2021 +0200

    Add tests for history graph topology

commit 37ac81faf15a32c4471a3c4ee5140bcb9bf57178
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:10:38 2021 +0200

    Fix database queries related to the origin-revision layer
    
    This required allowing null dates in the `revision` table so that revision can be added
    by the origin-revision layer algorithm but not recognized as already processed by the
    revision-content layer. Revision and origin entries are now inserted in the database
    prior to inserting rows to revision_in_origin and revision_before_revision relations,
    so that internal ids are properly resolved.

commit 4eb166cc4f2aa036c932b9a5eb462454a70ee0d9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 25 13:38:26 2021 +0200

    Add test to compare both ArchiveInterface implementations
    
    Improve documentation of the interface and complete pending TODO's.

commit 01ac9eea375258ac1e000389d3fd286d0dbae458
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:25:15 2021 +0200

    Rename test files to keep naming convention
    
    Also added missing .msgpack file dump for new with-merges repository.

commit 76d1560924251396c1ac63c286d8612ce0f7e9d9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:05:24 2021 +0200

    Refactor ArchiveInterface to fit origin-revision layer needs
    
    Replace `revision_get` method by `revision_get_parents` returning an iterable of
    parents' ids only, instead of a swh.model.model.Revision object.

commit df69a9e57692ed9d4d870c295a21b3ac187d7b9c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 20:00:40 2021 +0200

    Use `Sha1Git` type to explicitly state the kind of identifiers
    
    Previous occurrences of `bytes` and `Sha1` are now correctly using `Sha1Git`.
    Also, some bytes conversion methods were replaced by their counterparts in
    the swh.model.hashutil module.

commit fa22dc902781e30e46823030681f003983cc6d6e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 19:12:06 2021 +0200

    Add support for sha1 identifiers for origins

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/199/ for more details.

Build is green

Patch application report for D5947 (id=21364)

Could not rebase; Attempt merge onto d892b29e40...

Updating d892b29..695b498
Fast-forward
 swh/provenance/__init__.py                         |  47 ++-
 swh/provenance/archive.py                          |  24 +-
 swh/provenance/backend.py                          | 357 ++++++++++++++++++
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/model.py                            |  53 ++-
 swh/provenance/origin.py                           |  21 +-
 swh/provenance/postgresql/archive.py               | 118 +++---
 swh/provenance/postgresql/provenancedb_base.py     | 398 ++++++++++++---------
 .../postgresql/provenancedb_with_path.py           | 115 +++---
 .../postgresql/provenancedb_without_path.py        |  94 ++---
 swh/provenance/provenance.py                       | 341 ++++++++----------
 swh/provenance/revision.py                         |  13 +-
 swh/provenance/sql/30-schema.sql                   |  30 +-
 swh/provenance/storage/archive.py                  |  30 +-
 swh/provenance/tests/conftest.py                   |  32 +-
 .../tests/data/generate_storage_from_git.py        |   3 +-
 .../data/history_graphs_with-merges_visits-01.yaml |  55 +++
 swh/provenance/tests/data/with-merges.msgpack      | Bin 0 -> 7501 bytes
 ...repo_with_merges.yaml => with-merges_repo.yaml} |   0
 ...s-visits-01.yaml => with-merges_visits-01.yaml} |   0
 swh/provenance/tests/test_archive_interface.py     |  51 +++
 swh/provenance/tests/test_conftest.py              |   2 +-
 swh/provenance/tests/test_history_graph.py         |  62 ++++
 swh/provenance/tests/test_origin_iterator.py       |   8 +-
 swh/provenance/tests/test_provenance_db.py         |   4 +-
 swh/provenance/tests/test_provenance_heuristics.py |  30 +-
 26 files changed, 1194 insertions(+), 698 deletions(-)
 create mode 100644 swh/provenance/backend.py
 create mode 100644 swh/provenance/tests/data/history_graphs_with-merges_visits-01.yaml
 create mode 100644 swh/provenance/tests/data/with-merges.msgpack
 rename swh/provenance/tests/data/{repo_with_merges.yaml => with-merges_repo.yaml} (100%)
 rename swh/provenance/tests/data/{repo_with_merges-visits-01.yaml => with-merges_visits-01.yaml} (100%)
 create mode 100644 swh/provenance/tests/test_archive_interface.py
 create mode 100644 swh/provenance/tests/test_history_graph.py
Changes applied before test
commit 695b498600682045004fdd04859ecf9e96819479
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 19:21:58 2021 +0200

    Add `ProvenanceStorageInterface` as discussed during backend design
    
    Rework backend-related classes to properly use the new interface.
    Adapt tests to the new structure as well.

commit 361199109d7d5a6cb694685cb2062940abe814bb
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:37:50 2021 +0200

    Move `ProvenanceBackend` implementation to a separate file

commit d058de2c080ee0c79ae57131d5c8ebdbeb6d0486
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:28:32 2021 +0200

    Rework `ProvenanceInterface` as discussed during backend design
    
    Add `ProvenanceResult` class to be returned by `content_find_first` and
    `content_find_all` methods. Rename some methods. Improve type annotations.

commit ad860db9bfeff7f276b3e356c9e21cb57cafc4c2
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:31:16 2021 +0200

    Add tests for history graph topology

commit 37ac81faf15a32c4471a3c4ee5140bcb9bf57178
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:10:38 2021 +0200

    Fix database queries related to the origin-revision layer
    
    This required allowing null dates in the `revision` table so that revision can be added
    by the origin-revision layer algorithm but not recognized as already processed by the
    revision-content layer. Revision and origin entries are now inserted in the database
    prior to inserting rows to revision_in_origin and revision_before_revision relations,
    so that internal ids are properly resolved.

commit 4eb166cc4f2aa036c932b9a5eb462454a70ee0d9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 25 13:38:26 2021 +0200

    Add test to compare both ArchiveInterface implementations
    
    Improve documentation of the interface and complete pending TODO's.

commit 01ac9eea375258ac1e000389d3fd286d0dbae458
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:25:15 2021 +0200

    Rename test files to keep naming convention
    
    Also added missing .msgpack file dump for new with-merges repository.

commit 76d1560924251396c1ac63c286d8612ce0f7e9d9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:05:24 2021 +0200

    Refactor ArchiveInterface to fit origin-revision layer needs
    
    Replace `revision_get` method by `revision_get_parents` returning an iterable of
    parents' ids only, instead of a swh.model.model.Revision object.

commit df69a9e57692ed9d4d870c295a21b3ac187d7b9c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 20:00:40 2021 +0200

    Use `Sha1Git` type to explicitly state the kind of identifiers
    
    Previous occurrences of `bytes` and `Sha1` are now correctly using `Sha1Git`.
    Also, some bytes conversion methods were replaced by their counterparts in
    the swh.model.hashutil module.

commit fa22dc902781e30e46823030681f003983cc6d6e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 19:12:06 2021 +0200

    Add support for sha1 identifiers for origins

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/201/ for more details.

Build is green

Patch application report for D5947 (id=21370)

Could not rebase; Attempt merge onto d892b29e40...

Updating d892b29..2304647
Fast-forward
 swh/provenance/__init__.py                         |  47 ++-
 swh/provenance/archive.py                          |  24 +-
 swh/provenance/backend.py                          | 357 ++++++++++++++++++
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/model.py                            |  53 ++-
 swh/provenance/origin.py                           |  21 +-
 swh/provenance/postgresql/archive.py               | 118 +++---
 swh/provenance/postgresql/provenancedb_base.py     | 398 ++++++++++++---------
 .../postgresql/provenancedb_with_path.py           | 115 +++---
 .../postgresql/provenancedb_without_path.py        |  94 ++---
 swh/provenance/provenance.py                       | 341 ++++++++----------
 swh/provenance/revision.py                         |  13 +-
 swh/provenance/sql/30-schema.sql                   |  30 +-
 swh/provenance/storage/archive.py                  |  30 +-
 swh/provenance/tests/conftest.py                   |  32 +-
 .../tests/data/generate_storage_from_git.py        |   3 +-
 .../data/history_graphs_with-merges_visits-01.yaml |  55 +++
 swh/provenance/tests/data/with-merges.msgpack      | Bin 0 -> 7501 bytes
 ...repo_with_merges.yaml => with-merges_repo.yaml} |   0
 ...s-visits-01.yaml => with-merges_visits-01.yaml} |   0
 swh/provenance/tests/test_archive_interface.py     |  51 +++
 swh/provenance/tests/test_conftest.py              |   2 +-
 swh/provenance/tests/test_history_graph.py         |  62 ++++
 swh/provenance/tests/test_origin_iterator.py       |   8 +-
 swh/provenance/tests/test_provenance_db.py         |   4 +-
 swh/provenance/tests/test_provenance_heuristics.py |  30 +-
 26 files changed, 1194 insertions(+), 698 deletions(-)
 create mode 100644 swh/provenance/backend.py
 create mode 100644 swh/provenance/tests/data/history_graphs_with-merges_visits-01.yaml
 create mode 100644 swh/provenance/tests/data/with-merges.msgpack
 rename swh/provenance/tests/data/{repo_with_merges.yaml => with-merges_repo.yaml} (100%)
 rename swh/provenance/tests/data/{repo_with_merges-visits-01.yaml => with-merges_visits-01.yaml} (100%)
 create mode 100644 swh/provenance/tests/test_archive_interface.py
 create mode 100644 swh/provenance/tests/test_history_graph.py
Changes applied before test
commit 2304647745e79308b72978ba3a9141f3e6f844f8
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 19:21:58 2021 +0200

    Add `ProvenanceStorageInterface` as discussed during backend design
    
    Rework backend-related classes to properly use the new interface.
    Adapt tests to the new structure as well.

commit e25122d2e47de942a772164e9f1a60f425c87d97
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:37:50 2021 +0200

    Move `ProvenanceBackend` implementation to a separate file

commit b7678a341da72587cc48848f5a72f65861f892af
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:28:32 2021 +0200

    Rework `ProvenanceInterface` as discussed during backend design
    
    Add `ProvenanceResult` class to be returned by `content_find_first` and
    `content_find_all` methods. Rename some methods. Improve type annotations.

commit 7a59ff712bb8b5ae22e6f016475d03317c27b64a
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:31:16 2021 +0200

    Add tests for history graph topology

commit 3171ae2f129df433689fd22e32c8eeebf7af4171
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:10:38 2021 +0200

    Fix database queries related to the origin-revision layer
    
    This required allowing null dates in the `revision` table so that revision can be added
    by the origin-revision layer algorithm but not recognized as already processed by the
    revision-content layer. Revision and origin entries are now inserted in the database
    prior to inserting rows to revision_in_origin and revision_before_revision relations,
    so that internal ids are properly resolved.

commit 6736f6068280f167df5616681dee9ad67b2b7dbd
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 25 13:38:26 2021 +0200

    Add test to compare both `ArchiveInterface` implementations
    
    Improve documentation of the interface and complete pending TODO's.

commit dde867254e51dd87f4aba3cdea59da8bffc2d160
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:25:15 2021 +0200

    Rename test files to keep naming convention
    
    Also added missing .msgpack file dump for new with-merges repository.

commit 14001c1844598a3d4ebd1b5f609070f9c85dcaa9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:05:24 2021 +0200

    Refactor `ArchiveInterface` to fit origin-revision layer needs
    
    Replace `revision_get` method by `revision_get_parents` returning an iterable of
    parents' ids only, instead of a swh.model.model.Revision object.

commit df69a9e57692ed9d4d870c295a21b3ac187d7b9c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 20:00:40 2021 +0200

    Use `Sha1Git` type to explicitly state the kind of identifiers
    
    Previous occurrences of `bytes` and `Sha1` are now correctly using `Sha1Git`.
    Also, some bytes conversion methods were replaced by their counterparts in
    the swh.model.hashutil module.

commit fa22dc902781e30e46823030681f003983cc6d6e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 19:12:06 2021 +0200

    Add support for sha1 identifiers for origins

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/207/ for more details.

Build is green

Patch application report for D5947 (id=21385)

Could not rebase; Attempt merge onto d892b29e40...

Updating d892b29..35dafe5
Fast-forward
 swh/provenance/__init__.py                         |  47 ++-
 swh/provenance/archive.py                          |  24 +-
 swh/provenance/backend.py                          | 357 ++++++++++++++++++
 swh/provenance/cli.py                              |  28 +-
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/model.py                            |  53 ++-
 swh/provenance/origin.py                           |  21 +-
 swh/provenance/postgresql/archive.py               | 118 +++---
 swh/provenance/postgresql/provenancedb_base.py     | 402 ++++++++++++---------
 .../postgresql/provenancedb_with_path.py           | 117 +++---
 .../postgresql/provenancedb_without_path.py        |  96 ++---
 swh/provenance/provenance.py                       | 349 +++++++++---------
 swh/provenance/revision.py                         |  13 +-
 swh/provenance/sql/30-schema.sql                   |  30 +-
 swh/provenance/storage/archive.py                  |  30 +-
 swh/provenance/tests/conftest.py                   |  32 +-
 .../tests/data/generate_storage_from_git.py        |   3 +-
 .../data/history_graphs_with-merges_visits-01.yaml |  55 +++
 swh/provenance/tests/data/with-merges.msgpack      | Bin 0 -> 7501 bytes
 ...repo_with_merges.yaml => with-merges_repo.yaml} |   0
 ...s-visits-01.yaml => with-merges_visits-01.yaml} |   0
 swh/provenance/tests/test_archive_interface.py     |  51 +++
 swh/provenance/tests/test_conftest.py              |   2 +-
 swh/provenance/tests/test_history_graph.py         |  62 ++++
 swh/provenance/tests/test_origin_iterator.py       |   8 +-
 swh/provenance/tests/test_provenance_db.py         |   4 +-
 swh/provenance/tests/test_provenance_heuristics.py |  56 +--
 27 files changed, 1224 insertions(+), 738 deletions(-)
 create mode 100644 swh/provenance/backend.py
 create mode 100644 swh/provenance/tests/data/history_graphs_with-merges_visits-01.yaml
 create mode 100644 swh/provenance/tests/data/with-merges.msgpack
 rename swh/provenance/tests/data/{repo_with_merges.yaml => with-merges_repo.yaml} (100%)
 rename swh/provenance/tests/data/{repo_with_merges-visits-01.yaml => with-merges_visits-01.yaml} (100%)
 create mode 100644 swh/provenance/tests/test_archive_interface.py
 create mode 100644 swh/provenance/tests/test_history_graph.py
Changes applied before test
commit 35dafe5f8b1b95a0610199c93864ad16a1659283
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jul 1 12:07:41 2021 +0200

    Use `RealDictCursor` in `ProvenanceDBBase`
    
    to improve the way `ProvenanceResult`s are generated.
    
    Change `ProvenanceDBBase` from a `TypedDict` to a regular class.

commit 2304647745e79308b72978ba3a9141f3e6f844f8
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 19:21:58 2021 +0200

    Add `ProvenanceStorageInterface` as discussed during backend design
    
    Rework backend-related classes to properly use the new interface.
    Adapt tests to the new structure as well.

commit e25122d2e47de942a772164e9f1a60f425c87d97
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:37:50 2021 +0200

    Move `ProvenanceBackend` implementation to a separate file

commit b7678a341da72587cc48848f5a72f65861f892af
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:28:32 2021 +0200

    Rework `ProvenanceInterface` as discussed during backend design
    
    Add `ProvenanceResult` class to be returned by `content_find_first` and
    `content_find_all` methods. Rename some methods. Improve type annotations.

commit 7a59ff712bb8b5ae22e6f016475d03317c27b64a
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:31:16 2021 +0200

    Add tests for history graph topology

commit 3171ae2f129df433689fd22e32c8eeebf7af4171
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:10:38 2021 +0200

    Fix database queries related to the origin-revision layer
    
    This required allowing null dates in the `revision` table so that revision can be added
    by the origin-revision layer algorithm but not recognized as already processed by the
    revision-content layer. Revision and origin entries are now inserted in the database
    prior to inserting rows to revision_in_origin and revision_before_revision relations,
    so that internal ids are properly resolved.

commit 6736f6068280f167df5616681dee9ad67b2b7dbd
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 25 13:38:26 2021 +0200

    Add test to compare both `ArchiveInterface` implementations
    
    Improve documentation of the interface and complete pending TODO's.

commit dde867254e51dd87f4aba3cdea59da8bffc2d160
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:25:15 2021 +0200

    Rename test files to keep naming convention
    
    Also added missing .msgpack file dump for new with-merges repository.

commit 14001c1844598a3d4ebd1b5f609070f9c85dcaa9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:05:24 2021 +0200

    Refactor `ArchiveInterface` to fit origin-revision layer needs
    
    Replace `revision_get` method by `revision_get_parents` returning an iterable of
    parents' ids only, instead of a swh.model.model.Revision object.

commit df69a9e57692ed9d4d870c295a21b3ac187d7b9c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 20:00:40 2021 +0200

    Use `Sha1Git` type to explicitly state the kind of identifiers
    
    Previous occurrences of `bytes` and `Sha1` are now correctly using `Sha1Git`.
    Also, some bytes conversion methods were replaced by their counterparts in
    the swh.model.hashutil module.

commit fa22dc902781e30e46823030681f003983cc6d6e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 19:12:06 2021 +0200

    Add support for sha1 identifiers for origins

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/209/ for more details.
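
The `RealDictCursor` commit above can be illustrated with a short sketch. psycopg2's `RealDictCursor` returns each row as a dict keyed by column name, so a result object can be built by keyword rather than by positional index. The `ProvenanceResult` fields, the matching column names, and the `result_from_row` and `fetch_results` helpers below are assumptions made for the sketch, not the code in this diff; the SQL text is supplied by the caller.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Iterator, Mapping, Optional


@dataclass(frozen=True)
class ProvenanceResult:
    """Hypothetical shape of one provenance lookup result."""

    content: bytes
    revision: bytes
    date: datetime
    origin: Optional[str]
    path: bytes


def result_from_row(row: Mapping[str, Any]) -> ProvenanceResult:
    # Assumes the query's column names match the field names exactly;
    # an extra or missing column raises TypeError.
    return ProvenanceResult(**row)


def fetch_results(conn: Any, sql: str, params: tuple) -> Iterator[ProvenanceResult]:
    """Run `sql` on an open psycopg2 connection and yield typed results."""
    # Imported lazily so the sketch can be loaded without a database driver.
    import psycopg2.extras

    with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
        cur.execute(sql, params)
        for row in cur:  # each row is a dict keyed by column name
            yield result_from_row(row)
```

With a plain tuple cursor, the same conversion would depend on column order, which is easier to break when a query changes.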

Build is green

Patch application report for D5947 (id=21397)

Could not rebase; Attempt merge onto d892b29e40...

Updating d892b29..afb67f6
Fast-forward
 swh/provenance/__init__.py                         |  47 ++-
 swh/provenance/archive.py                          |  24 +-
 swh/provenance/backend.py                          | 357 ++++++++++++++++++
 swh/provenance/cli.py                              |  28 +-
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/model.py                            |  53 ++-
 swh/provenance/origin.py                           |  21 +-
 swh/provenance/postgresql/archive.py               | 115 ++----
 swh/provenance/postgresql/provenancedb_base.py     | 402 ++++++++++++---------
 .../postgresql/provenancedb_with_path.py           | 117 +++---
 .../postgresql/provenancedb_without_path.py        |  96 ++---
 swh/provenance/provenance.py                       | 349 +++++++++---------
 swh/provenance/revision.py                         |  13 +-
 swh/provenance/sql/30-schema.sql                   |  30 +-
 swh/provenance/storage/archive.py                  |  30 +-
 swh/provenance/tests/conftest.py                   |  32 +-
 .../tests/data/generate_storage_from_git.py        |   3 +-
 .../data/history_graphs_with-merges_visits-01.yaml |  55 +++
 swh/provenance/tests/data/with-merges.msgpack      | Bin 0 -> 7501 bytes
 ...repo_with_merges.yaml => with-merges_repo.yaml} |   0
 ...s-visits-01.yaml => with-merges_visits-01.yaml} |   0
 swh/provenance/tests/test_archive_interface.py     |  51 +++
 swh/provenance/tests/test_conftest.py              |   2 +-
 swh/provenance/tests/test_history_graph.py         |  62 ++++
 swh/provenance/tests/test_origin_iterator.py       |   8 +-
 swh/provenance/tests/test_provenance_db.py         |   4 +-
 swh/provenance/tests/test_provenance_heuristics.py |  56 +--
 27 files changed, 1219 insertions(+), 740 deletions(-)
 create mode 100644 swh/provenance/backend.py
 create mode 100644 swh/provenance/tests/data/history_graphs_with-merges_visits-01.yaml
 create mode 100644 swh/provenance/tests/data/with-merges.msgpack
 rename swh/provenance/tests/data/{repo_with_merges.yaml => with-merges_repo.yaml} (100%)
 rename swh/provenance/tests/data/{repo_with_merges-visits-01.yaml => with-merges_visits-01.yaml} (100%)
 create mode 100644 swh/provenance/tests/test_archive_interface.py
 create mode 100644 swh/provenance/tests/test_history_graph.py
Changes applied before test
commit afb67f665ab00c03c0ca33e96b1bfc109c827c58
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 19:21:58 2021 +0200

    Add `ProvenanceStorageInterface` as discussed during backend design
    
    Rework backend-related classes to properly use the new interface.
    Adapt tests to the new structure as well.

commit 3672235c3258cf93fb37a82d060bf40ba1761b8b
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:37:50 2021 +0200

    Move `ProvenanceBackend` implementation to a separate file

commit 6f4da6fed7e663273627ad4a46c8489ef0a0e784
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jul 1 13:47:26 2021 +0200

    Use `RealDictCursor` in `ProvenanceDBBase`
    
    to improve the way `ProvenanceResult`s are generated.

commit 07a30e43a76e170ab03764035da68dcf7db1fc3b
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:28:32 2021 +0200

    Rework `ProvenanceInterface` as discussed during backend design
    
    Add `ProvenanceResult` class to be returned by `content_find_first` and
    `content_find_all` methods. Rename some methods. Improve type annotations.

commit 2fd3f56b57f8db6691ae6b8b7cb7ac557b764172
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:31:16 2021 +0200

    Add tests for history graph topology

commit d45d6ff9e9317ecfe38d584df7297c548b654d28
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:10:38 2021 +0200

    Fix database queries related to the origin-revision layer
    
    This required allowing null dates in the `revision` table so that revision can be added
    by the origin-revision layer algorithm but not recognized as already processed by the
    revision-content layer. Revision and origin entries are now inserted in the database
    prior to inserting rows to revision_in_origin and revision_before_revision relations,
    so that internal ids are properly resolved.

commit 0e2a3c64ce3c368b53c101c541e8aebcde789477
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 25 13:38:26 2021 +0200

    Add test to compare both `ArchiveInterface` implementations
    
    Improve documentation of the interface and complete pending TODO's.

commit 98bba93cccece2b47ec4cd5887997cb5bede1e87
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:25:15 2021 +0200

    Rename test files to keep naming convension
    
    Also added missing .msgpack file dump for new with-merges repository.

commit fa9198afb71bcf3b8abea07d88d763a430f7358e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:05:24 2021 +0200

    Refactor `ArchiveInterface` to fit origin-revision layer needs
    
    Replace `revision_get` method by `revision_get_parents` returning an iterable of
    parents' ids only, instead of a swh.model.model.Revision object.

commit 9e0c1aa099073887206c9334e17b49ee31bbef9a
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 20:00:40 2021 +0200

    Use `Sha1Git` type to explicitly state the kind of identifiers
    
    Previous occurrences of `bytes` and `Sha1` are now correctly using `Sha1Git`.
    Also, some bytes conversion methods were replaced by their counterparts in
    the swh.model.hashutil module.

commit a27ffff67b6b14bf37d153bb9b1d1c2ae63773fc
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 19:12:06 2021 +0200

    Add support for sha1 identifiers for origins

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/219/ for more details.

Build is green

Patch application report for D5947 (id=21426)

Could not rebase; Attempt merge onto d892b29e40...

Updating d892b29..f819e43
Fast-forward
 swh/provenance/__init__.py                         |  47 ++-
 swh/provenance/archive.py                          |  24 +-
 swh/provenance/backend.py                          | 322 +++++++++++++++++
 swh/provenance/cli.py                              |  28 +-
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/model.py                            |  53 ++-
 swh/provenance/origin.py                           |  21 +-
 swh/provenance/postgresql/archive.py               | 115 ++----
 swh/provenance/postgresql/provenancedb_base.py     | 402 ++++++++++++---------
 .../postgresql/provenancedb_with_path.py           | 117 +++---
 .../postgresql/provenancedb_without_path.py        |  96 ++---
 swh/provenance/provenance.py                       | 349 +++++++++---------
 swh/provenance/revision.py                         |  13 +-
 swh/provenance/sql/30-schema.sql                   |  30 +-
 swh/provenance/storage/archive.py                  |  30 +-
 swh/provenance/tests/conftest.py                   |  32 +-
 .../tests/data/generate_storage_from_git.py        |   3 +-
 .../data/history_graphs_with-merges_visits-01.yaml |  55 +++
 swh/provenance/tests/data/with-merges.msgpack      | Bin 0 -> 7501 bytes
 ...repo_with_merges.yaml => with-merges_repo.yaml} |   0
 ...s-visits-01.yaml => with-merges_visits-01.yaml} |   0
 swh/provenance/tests/test_archive_interface.py     |  51 +++
 swh/provenance/tests/test_conftest.py              |   2 +-
 swh/provenance/tests/test_history_graph.py         |  62 ++++
 swh/provenance/tests/test_origin_iterator.py       |   8 +-
 swh/provenance/tests/test_provenance_db.py         |   4 +-
 swh/provenance/tests/test_provenance_heuristics.py |  56 +--
 27 files changed, 1184 insertions(+), 740 deletions(-)
 create mode 100644 swh/provenance/backend.py
 create mode 100644 swh/provenance/tests/data/history_graphs_with-merges_visits-01.yaml
 create mode 100644 swh/provenance/tests/data/with-merges.msgpack
 rename swh/provenance/tests/data/{repo_with_merges.yaml => with-merges_repo.yaml} (100%)
 rename swh/provenance/tests/data/{repo_with_merges-visits-01.yaml => with-merges_visits-01.yaml} (100%)
 create mode 100644 swh/provenance/tests/test_archive_interface.py
 create mode 100644 swh/provenance/tests/test_history_graph.py
Changes applied before test
commit f819e4332df40b1ef35ff737f2558de570379473
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 19:21:58 2021 +0200

    Add `ProvenanceStorageInterface` as discussed during backend design
    
    Rework backend-related classes to properly use the new interface.
    Adapt tests to the new structure as well.

commit 3672235c3258cf93fb37a82d060bf40ba1761b8b
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:37:50 2021 +0200

    Move `ProvenanceBackend` implementation to a separate file

commit 6f4da6fed7e663273627ad4a46c8489ef0a0e784
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jul 1 13:47:26 2021 +0200

    Use `RealDictCursor` in `ProvenanceDBBase`
    
    to improve the way `ProvenanceResult`s are generated.

commit 07a30e43a76e170ab03764035da68dcf7db1fc3b
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:28:32 2021 +0200

    Rework `ProvenanceInterface` as discussed during backend design
    
    Add `ProvenanceResult` class to be returned by `content_find_first` and
    `content_find_all` methods. Rename some methods. Improve type annotations.

commit 2fd3f56b57f8db6691ae6b8b7cb7ac557b764172
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:31:16 2021 +0200

    Add tests for history graph topology

commit d45d6ff9e9317ecfe38d584df7297c548b654d28
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:10:38 2021 +0200

    Fix database queries related to the origin-revision layer
    
    This required allowing null dates in the `revision` table so that revision can be added
    by the origin-revision layer algorithm but not recognized as already processed by the
    revision-content layer. Revision and origin entries are now inserted in the database
    prior to inserting rows to revision_in_origin and revision_before_revision relations,
    so that internal ids are properly resolved.

commit 0e2a3c64ce3c368b53c101c541e8aebcde789477
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 25 13:38:26 2021 +0200

    Add test to compare both `ArchiveInterface` implementations
    
    Improve documentation of the interface and complete pending TODO's.

commit 98bba93cccece2b47ec4cd5887997cb5bede1e87
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:25:15 2021 +0200

    Rename test files to keep naming convension
    
    Also added missing .msgpack file dump for new with-merges repository.

commit fa9198afb71bcf3b8abea07d88d763a430f7358e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:05:24 2021 +0200

    Refactor `ArchiveInterface` to fit origin-revision layer needs
    
    Replace `revision_get` method by `revision_get_parents` returning an iterable of
    parents' ids only, instead of a swh.model.model.Revision object.

commit 9e0c1aa099073887206c9334e17b49ee31bbef9a
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 20:00:40 2021 +0200

    Use `Sha1Git` type to explicitly state the kind of identifiers
    
    Previous occurrences of `bytes` and `Sha1` are now correctly using `Sha1Git`.
    Also, some bytes conversion methods were replaced by their counterparts in
    the swh.model.hashutil module.

commit a27ffff67b6b14bf37d153bb9b1d1c2ae63773fc
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 19:12:06 2021 +0200

    Add support for sha1 identifiers for origins

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/230/ for more details.

douardda added inline comments.
swh/provenance/__init__.py
29

it would be best to have the removal of this bwcompat support (and deprecation warning) in a dedicated revision, but meh.

swh/provenance/backend.py
72–95

This could probably be factorized. Also, are you sure you want to write the whole content of the cache in the logs? It might generate a lot of logs...

98–129

Same here, this could be factorized (looping on the 3 entity types).
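
A minimal sketch of what such a factorization could look like, with the log line reporting only entry counts instead of the full cache content. The entity names (`content`, `directory`, `revision`), the cache layout, and the `flush_entity_dates` and `write` names are assumptions made for illustration; they are not taken from the diff under review.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger(__name__)

# Assumed entity names; the real cache layout in backend.py may differ.
ENTITY_TYPES = ("content", "directory", "revision")


def flush_entity_dates(
    cache: Dict[str, Dict[bytes, object]],
    write: Callable[[str, Dict[bytes, object]], bool],
) -> Dict[str, int]:
    """Write each entity's pending entries using one shared loop.

    Returns the number of entries handed to `write` per entity type.
    """
    written: Dict[str, int] = {}
    for entity in ENTITY_TYPES:
        pending = cache.get(entity, {})
        if not pending:
            written[entity] = 0
            continue
        # Log a size summary, not the whole mapping, to keep logs small.
        logger.debug("flushing %d %s entries", len(pending), entity)
        if not write(entity, pending):
            raise RuntimeError(f"failed to write {entity} entries")
        written[entity] = len(pending)
    return written
```

Here `write` stands in for whatever storage call the backend would make for one entity type; the three near-identical blocks collapse into one loop body.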

201–210

I'm not sure I get this chunk of code. What does it do exactly? What's the empty list for? Is it the start value of the sum() function? (If so, it's much better to use a named argument; it is much easier to understand.)

But if I get this right, a more pythonic way (?) could be:

cache = self.cache["revision_before_revision"]
rbr_data = itertools.chain(
    *(((prev, next, None) for next in cache[prev]) for prev in cache)
)

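
For reference, a self-contained sketch of the same flattening idea using `itertools.chain.from_iterable`, which avoids unpacking the outer generator with `*`. The cache layout (a dict mapping a revision id to a set of ids) and the `flatten_relation` name are assumptions inferred from the snippet above, not the actual code in `backend.py`.

```python
import itertools
from typing import Dict, Iterator, Optional, Set, Tuple

# Hypothetical stand-in for self.cache["revision_before_revision"]:
# maps a revision id to the set of revision ids related to it.
Relation = Dict[bytes, Set[bytes]]
Row = Tuple[bytes, bytes, Optional[bytes]]


def flatten_relation(cache: Relation) -> Iterator[Row]:
    """Lazily yield one (src, dst, None) row per cached pair."""
    return itertools.chain.from_iterable(
        ((src, dst, None) for dst in cache[src]) for src in cache
    )
```

Keys with an empty set simply contribute no rows, and nothing is materialized until the iterator is consumed.
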
307–309

The explicit bool cast is a bit weird here. Since the method returns a set(), I would find it more readable to compare the result with an empty set or to check its len().
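
The three spellings are equivalent for sets, so the choice is purely about readability. A small sketch, with hypothetical function names that do not come from the diff:

```python
from typing import Set


def has_entries_bool(found: Set[bytes]) -> bool:
    # Explicit cast: relies on the reader knowing empty sets are falsy.
    return bool(found)


def has_entries_len(found: Set[bytes]) -> bool:
    # States the intent in terms of size.
    return len(found) > 0


def has_entries_cmp(found: Set[bytes]) -> bool:
    # Compares against the empty set directly.
    return found != set()
```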

swh/provenance/postgresql/provenancedb_base.py
24–26

For a one-liner like this, using an intermediate local variable seems a bit overkill to me.

33

same as above

81

the "old way" of doing this looks more readable to me:

", ".join(["%s"] * len(sha1s))
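
A minimal sketch of how that placeholder string is typically used to build a parameterized `IN (...)` query. The `build_in_query` helper and the table and column names are hypothetical; only the `", ".join(["%s"] * len(sha1s))` expression comes from the comment above. `%s` is the positional placeholder style used by psycopg2.

```python
from typing import Sequence, Tuple


def build_in_query(
    table: str, sha1s: Sequence[bytes]
) -> Tuple[str, Tuple[bytes, ...]]:
    """Return a query with one %s placeholder per id, plus its parameters.

    `table` is assumed to be a trusted constant, never user input.
    """
    if not sha1s:
        # "IN ()" is not valid SQL, so refuse to build it.
        raise ValueError("sha1s must not be empty")
    placeholders = ", ".join(["%s"] * len(sha1s))
    query = f"SELECT sha1 FROM {table} WHERE sha1 IN ({placeholders})"
    return query, tuple(sha1s)
```

The returned pair would then be passed to a cursor as `cursor.execute(query, params)`, so the values themselves are never interpolated into the SQL text.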

I've made several small comments / nitpicks, feel free to address them or not.

This revision is now accepted and ready to land. Jul 2 2021, 4:32 PM

Build is green

Patch application report for D5947 (id=21446)

Could not rebase; Attempt merge onto d892b29e40...

Updating d892b29..7998391
Fast-forward
 swh/provenance/__init__.py                         |  47 ++-
 swh/provenance/archive.py                          |  24 +-
 swh/provenance/backend.py                          | 324 +++++++++++++++++
 swh/provenance/cli.py                              |  28 +-
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/model.py                            |  53 ++-
 swh/provenance/origin.py                           |  21 +-
 swh/provenance/postgresql/archive.py               | 115 ++----
 swh/provenance/postgresql/provenancedb_base.py     | 402 ++++++++++++---------
 .../postgresql/provenancedb_with_path.py           | 117 +++---
 .../postgresql/provenancedb_without_path.py        |  96 ++---
 swh/provenance/provenance.py                       | 349 +++++++++---------
 swh/provenance/revision.py                         |  13 +-
 swh/provenance/sql/30-schema.sql                   |  30 +-
 swh/provenance/storage/archive.py                  |  30 +-
 swh/provenance/tests/conftest.py                   |  32 +-
 .../tests/data/generate_storage_from_git.py        |   3 +-
 .../data/history_graphs_with-merges_visits-01.yaml |  55 +++
 swh/provenance/tests/data/with-merges.msgpack      | Bin 0 -> 7501 bytes
 ...repo_with_merges.yaml => with-merges_repo.yaml} |   0
 ...s-visits-01.yaml => with-merges_visits-01.yaml} |   0
 swh/provenance/tests/test_archive_interface.py     |  51 +++
 swh/provenance/tests/test_conftest.py              |   2 +-
 swh/provenance/tests/test_history_graph.py         |  62 ++++
 swh/provenance/tests/test_origin_iterator.py       |   8 +-
 swh/provenance/tests/test_provenance_db.py         |   4 +-
 swh/provenance/tests/test_provenance_heuristics.py |  56 +--
 27 files changed, 1186 insertions(+), 740 deletions(-)
 create mode 100644 swh/provenance/backend.py
 create mode 100644 swh/provenance/tests/data/history_graphs_with-merges_visits-01.yaml
 create mode 100644 swh/provenance/tests/data/with-merges.msgpack
 rename swh/provenance/tests/data/{repo_with_merges.yaml => with-merges_repo.yaml} (100%)
 rename swh/provenance/tests/data/{repo_with_merges-visits-01.yaml => with-merges_visits-01.yaml} (100%)
 create mode 100644 swh/provenance/tests/test_archive_interface.py
 create mode 100644 swh/provenance/tests/test_history_graph.py
Changes applied before test
commit 799839120cb99f22ce4272468ae0e388c335fb06
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 19:21:58 2021 +0200

    Add `ProvenanceStorageInterface` as discussed during backend design
    
    Rework backend-related classes to properly use the new interface.
    Adapt tests to the new structure as well.

commit 7c0a091ce5ffbf0a02dbe9d7fc84435ddd46cde2
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:37:50 2021 +0200

    Move `ProvenanceBackend` implementation to a separate file

commit 34898ad3cb18c24a7d7bef79dcfe470c3a1374ef
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jul 1 13:47:26 2021 +0200

    Use `RealDictCursor` in `ProvenanceDBBase`
    
    to improve the way `ProvenanceResult`s are generated.

commit 721354c436b5f5a861800b11e6151afa1aa634b6
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 28 14:28:32 2021 +0200

    Rework `ProvenanceInterface` as discussed during backend design
    
    Add `ProvenanceResult` class to be returned by `content_find_first` and
    `content_find_all` methods. Rename some methods. Improve type annotations.

commit 01f8d40ffccbcab6ecec6c2cf85478364e006caa
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:31:16 2021 +0200

    Add tests for history graph topology

commit b7fdcdec7ea96101d62a57d9aeed114c897df961
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:10:38 2021 +0200

    Fix database queries related to the origin-revision layer
    
    This required allowing null dates in the `revision` table so that revision can be added
    by the origin-revision layer algorithm but not recognized as already processed by the
    revision-content layer. Revision and origin entries are now inserted in the database
    prior to inserting rows to revision_in_origin and revision_before_revision relations,
    so that internal ids are properly resolved.

commit 0e2a3c64ce3c368b53c101c541e8aebcde789477
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 25 13:38:26 2021 +0200

    Add test to compare both `ArchiveInterface` implementations
    
    Improve documentation of the interface and complete pending TODO's.

commit 98bba93cccece2b47ec4cd5887997cb5bede1e87
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:25:15 2021 +0200

    Rename test files to keep naming convension
    
    Also added missing .msgpack file dump for new with-merges repository.

commit fa9198afb71bcf3b8abea07d88d763a430f7358e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 24 16:05:24 2021 +0200

    Refactor `ArchiveInterface` to fit origin-revision layer needs
    
    Replace `revision_get` method by `revision_get_parents` returning an iterable of
    parents' ids only, instead of a swh.model.model.Revision object.

commit 9e0c1aa099073887206c9334e17b49ee31bbef9a
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 20:00:40 2021 +0200

    Use `Sha1Git` type to explicitly state the kind of identifiers
    
    Previous occurrences of `bytes` and `Sha1` are now correctly using `Sha1Git`.
    Also, some bytes conversion methods were replaced by their counterparts in
    the swh.model.hashutil module.

commit a27ffff67b6b14bf37d153bb9b1d1c2ae63773fc
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 23 19:12:06 2021 +0200

    Add support for sha1 identifiers for origins

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/244/ for more details.

aeviso added inline comments.
swh/provenance/backend.py
72–95

Probably not, but this is part of an ongoing refactoring. It will be improved in the process.

201–210

I think this comment is out of place. What empty list do you mean?

swh/provenance/postgresql/provenancedb_base.py
81

it's actually more cryptic to me... but I guess that's subjective