
Use the flavor in the db to choose the ProvenanceDB class to instantiate
Closed, Public

Authored by douardda on Jun 15 2021, 3:58 PM.

Details

Summary

instead of a config option that may be inconsistent with how the
database has been created/populated.

Once the flavor of the provenance DB is chosen at db creation time,
there is no longer any need to specify the "with_path" config option.

Deprecate the "with_path" config option in get_provenance(), but make
sure the given value, if any, is consistent with the database.
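
The selection and consistency check described above can be sketched as follows. This is a minimal illustration, not the code from this diff: the class names, the `select_provenance_class` helper and the "with-path" flavor label are assumptions (only "without-path" is quoted in the commit messages), and the real get_provenance() reads the flavor from the database itself.

```python
import warnings
from typing import Optional, Type


class ProvenanceWithPathDB:
    """Hypothetical stand-in for the with-path ProvenanceDB class."""


class ProvenanceWithoutPathDB:
    """Hypothetical stand-in for the without-path ProvenanceDB class."""


# Assumed flavor labels; the flavor is fixed once, at db creation time.
FLAVOR_TO_CLASS = {
    "with-path": ProvenanceWithPathDB,
    "without-path": ProvenanceWithoutPathDB,
}


def select_provenance_class(
    db_flavor: str, with_path: Optional[bool] = None
) -> Type:
    """Pick the class from the flavor stored in the database.

    `with_path` is the deprecated config value: it is optional, but if it
    is given it must agree with the flavor recorded in the database.
    """
    if db_flavor not in FLAVOR_TO_CLASS:
        raise ValueError(f"unknown provenance DB flavor: {db_flavor!r}")
    if with_path is not None:
        warnings.warn(
            "the 'with_path' option is deprecated; "
            "the flavor stored in the database is used instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if with_path != (db_flavor == "with-path"):
            raise ValueError(
                f"with_path={with_path} is inconsistent with "
                f"the database flavor {db_flavor!r}"
            )
    return FLAVOR_TO_CLASS[db_flavor]
```

With this shape, a stale config value cannot silently select the wrong class: it is either redundant (and triggers a deprecation warning) or rejected.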

Modify the provenance pytest fixture to provide both versions of the
provenance DB (with and without path).
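
One common way to provide both versions from a single fixture is pytest parametrization, sketched below. The fixture body here is a placeholder: `make_fake_provenance`, the returned dict and the "with-path" label are hypothetical, and the actual fixture in swh/provenance/tests/conftest.py sets up a real database.

```python
import pytest

# Assumed flavor labels; only "without-path" is quoted in the commit messages.
FLAVORS = ["with-path", "without-path"]


def make_fake_provenance(flavor: str) -> dict:
    """Hypothetical stand-in for creating a provenance DB of a given flavor."""
    return {"flavor": flavor, "with_path": flavor == "with-path"}


@pytest.fixture(params=FLAVORS)
def provenance(request):
    """Parametrized fixture: each test using it runs once per flavor."""
    return make_fake_provenance(request.param)


def test_flavor_is_consistent(provenance):
    # Collected twice by pytest, once for each entry in FLAVORS.
    assert provenance["with_path"] == (provenance["flavor"] == "with-path")
```

Any test that takes `provenance` as an argument is then exercised against both flavors without further changes to the test itself.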

Adapt tests to make them pass for the (previously untested) provenance DB
without path flavor.

Depends on D5870.

Diff Detail

Repository
rDPROV Provenance database
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5872 (id=21042)

Could not rebase; Attempt merge onto 8c536b8d50...

Updating 8c536b8..c9d1369
Fast-forward
 swh/provenance/__init__.py                         |  19 ++-
 swh/provenance/postgresql/provenancedb_base.py     |  79 ++++++-----
 .../postgresql/provenancedb_with_path.py           |  89 ++++--------
 .../postgresql/provenancedb_without_path.py        | 129 ++++++-----------
 swh/provenance/provenance.py                       | 110 ++++++++-------
 swh/provenance/sql/30-schema.sql                   | 149 +++++++++-----------
 swh/provenance/sql/60-indexes.sql                  |  17 +--
 swh/provenance/tests/conftest.py                   |  27 ++--
 swh/provenance/tests/test_cli.py                   |  17 +--
 swh/provenance/tests/test_provenance_db.py         |  11 ++
 swh/provenance/tests/test_provenance_heuristics.py | 155 ++++++++++++---------
 11 files changed, 387 insertions(+), 415 deletions(-)
Changes applied before test
commit c9d1369ba165169a2ffe1d065903c3c3c7b566e1
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Jun 15 15:03:32 2021 +0200

    Use the flavor in the db to choose the ProvenanceDB class to instantiate
    
    instead of a config option that may be inconsistent with how the
    database has been created/populated.
    
    So now, once the flavor of the provenance DB is chosen, at db creation
    time, no need for specifying the "with_path" config option.
    
    Deprecate the "with_path" config option in get_provenance(), but make
    sure the given value, if any, is consistent with the database.
    
    Modify the provenance pytest fixture to provide both versions of the
    provenance DB (with and without path).
    
    Adapt tests to make them pass for the (previously untested) provenance DB
    without path flavor.

commit 8d23dac058f68ae05cc4cbcbd23a512a43723331
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Jun 15 12:01:15 2021 +0200

    Improve DB schema naming
    
    make it more consistent, so we can then simplify the code a bit by
    making relation table names composable (i.e. the relation table between
    content and revision is now simply content_in_revision, and so on) as
    well as column names (e.g. rev->revision and so on).
    
    Also remove the conditional sql execution in db init code (in
    swh/provenance/sql/), because it is actually unnecessary: keeping the
    `location` column for the "without-path" flavor does not cost anything.
    The only regression is that the location column in entity tables lost
    its "not null" constraint, but it's a minor drawback that the gain in
    clarity and simplicity of the db initialization code makes up for.

commit e27c7d11f2ce8e979c56cb49c75d0f0940993181
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Jun 14 16:01:24 2021 +0200

    Refactor the provenanceDB.insert_relation() methods
    
    simplify the code and reduce it to a couple of INSERT queries (one for
    locations if any, one for the destination relation table).

commit ac8dc036bcc608ba39d65929569f13ad694ebd90
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 9 10:27:32 2021 +0200

    Refactor the cache handling in ProvenanceDB
    
    - use TypedDict structures to properly type the caches needed by the
      ProvenanceDB objects,
    - use only one (sha1, date) cache per entity, plus a set of added ids of
      objects (within the cache) (i.e. for revisions, contents and directories).

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/140/ for more details.

This revision is now accepted and ready to land. (Jun 15 2021, 6:29 PM)