Refactor the provenanceDB.insert_relation()
Closed, Public

Authored by douardda on Jun 10 2021, 10:37 AM.

Details

Summary

Simplify the code and reduce it to a couple of INSERT queries (one for
locations, one for the dst_table).
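
As a rough illustration of the two-query shape described in this summary, here is a minimal sketch. It uses the standard-library sqlite3 module rather than PostgreSQL so that it runs standalone. The schema, the column names (src, dst, loc) and the insert_relation signature are assumptions made for the example, not the code from this diff.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE location (id INTEGER PRIMARY KEY, path TEXT UNIQUE);
CREATE TABLE content_in_revision (
    src TEXT, dst TEXT, loc INTEGER, UNIQUE (src, dst, loc)
);
"""


def insert_relation(
    conn: sqlite3.Connection,
    dst_table: str,
    rows: Iterable[Tuple[str, str, str]],
) -> None:
    """Insert (src, dst, path) triples using two batched statements."""
    rows = list(rows)

    # Statement 1: register every distinct path; known paths are skipped.
    paths = sorted({path for (_, _, path) in rows})
    conn.executemany(
        "INSERT OR IGNORE INTO location (path) VALUES (?)",
        [(path,) for path in paths],
    )

    # Statement 2: insert the relation rows, resolving each path to its
    # location id. dst_table is interpolated into the SQL text, so it must
    # be a trusted, hardcoded name and never user input.
    conn.executemany(
        f"INSERT OR IGNORE INTO {dst_table} (src, dst, loc) "
        "SELECT ?, ?, id FROM location WHERE path = ?",
        rows,
    )
```

With PostgreSQL the same idea would likely rely on ON CONFLICT DO NOTHING instead of INSERT OR IGNORE; that variant is not shown here.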

Diff Detail

Repository
rDPROV Provenance database
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5842 (id=20895)

Could not rebase; Attempt merge onto 6cdd424eba...

Updating 6cdd424..ac1b33b
Fast-forward
 swh/provenance/__init__.py                         |  18 +-
 swh/provenance/postgresql/provenancedb.py          | 392 +++++++++++++++++++++
 swh/provenance/postgresql/provenancedb_base.py     | 325 -----------------
 .../postgresql/provenancedb_with_path.py           | 157 ---------
 .../postgresql/provenancedb_without_path.py        | 140 --------
 swh/provenance/provenance.py                       |   3 +-
 swh/provenance/sql/15-flavor.sql                   |  21 --
 swh/provenance/sql/30-schema.sql                   |  25 +-
 swh/provenance/sql/60-indexes.sql                  |   7 -
 swh/provenance/tests/conftest.py                   |   4 +-
 swh/provenance/tests/test_cli.py                   |  33 +-
 11 files changed, 410 insertions(+), 715 deletions(-)
 create mode 100644 swh/provenance/postgresql/provenancedb.py
 delete mode 100644 swh/provenance/postgresql/provenancedb_base.py
 delete mode 100644 swh/provenance/postgresql/provenancedb_with_path.py
 delete mode 100644 swh/provenance/postgresql/provenancedb_without_path.py
 delete mode 100644 swh/provenance/sql/15-flavor.sql
Changes applied before test
commit ac1b33b66ebccff3d5e2a2280e1c7446e8fa087a
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 10 10:26:38 2021 +0200

    Simplify the ProvenanceDB.insert_all() method
    
    factorize insertions in content, revision and directory tables.

commit 6b2b97ac23fe43146d4964a56806d5ce9f726c06
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 10 09:13:59 2021 +0200

    Refactor the provenanceDB.insert_location() method
    
    simplify the code and reduce it to a couple of INSERT queries (one for
    locations, one for the dst_table).

commit fe35120741d76ff4d91d82bd1db029ff90ce8d60
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 9 14:55:54 2021 +0200

    Remove the without-path flavor of ProvenanceDB

commit e23832b21ad4ee7afcb56f98147e51f633b6c2d7
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 9 10:27:32 2021 +0200

    Refactor the cache handling in ProvenanceDB
    
    - use TypedDict structures to properly type the caches needed by the
      ProvenanceDB objects,
    - use only one cache plus a set of added (and eventually removed) ids of
      objects (within the cache) for revisisons, contents and directories.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/103/ for more details.

vlorentz added inline comments.
swh/provenance/postgresql/provenancedb.py
Lines 383–391 (on Diff #20895)

Can you raise an error if any of the table names is not in a hardcoded set?

I prefer to always double-check values just before substituting them into SQL queries.
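
The check requested here could look something like the sketch below. The table names in ALLOWED_TABLES, the column names and the helper names are assumptions for illustration; the actual set would have to match whatever relation tables the schema defines.

```python
# Hypothetical allowlist; the real set depends on the schema.
ALLOWED_TABLES = frozenset(
    {"content_early_in_rev", "content_in_dir", "directory_in_rev"}
)


def checked_table(name: str) -> str:
    """Return name unchanged if it is an expected table, else raise."""
    if name not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {name!r}")
    return name


def build_insert(dst_table: str) -> str:
    """Build the INSERT text; the table name is validated right before use."""
    table = checked_table(dst_table)
    # Values still go through driver placeholders (%s), only the table
    # name is substituted into the query text.
    return f"INSERT INTO {table} (src, dst, loc) VALUES (%s, %s, %s)"
```

Validating immediately before the substitution keeps the check next to the only place where the name reaches the SQL text, so a new call site cannot skip it.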

Build is green

Patch application report for D5842 (id=20923)

Could not rebase; Attempt merge onto 075b0d6cd6...

Updating 075b0d6..aa3b89d
Fast-forward
 swh/provenance/__init__.py                         |  18 +-
 swh/provenance/postgresql/provenancedb.py          | 392 +++++++++++++++++++++
 swh/provenance/postgresql/provenancedb_base.py     | 325 -----------------
 .../postgresql/provenancedb_with_path.py           | 157 ---------
 .../postgresql/provenancedb_without_path.py        | 140 --------
 swh/provenance/provenance.py                       |   3 +-
 swh/provenance/sql/15-flavor.sql                   |  21 --
 swh/provenance/sql/30-schema.sql                   |  25 +-
 swh/provenance/sql/60-indexes.sql                  |   7 -
 swh/provenance/tests/conftest.py                   |   4 +-
 swh/provenance/tests/test_cli.py                   |  33 +-
 11 files changed, 410 insertions(+), 715 deletions(-)
 create mode 100644 swh/provenance/postgresql/provenancedb.py
 delete mode 100644 swh/provenance/postgresql/provenancedb_base.py
 delete mode 100644 swh/provenance/postgresql/provenancedb_with_path.py
 delete mode 100644 swh/provenance/postgresql/provenancedb_without_path.py
 delete mode 100644 swh/provenance/sql/15-flavor.sql
Changes applied before test
commit aa3b89d7e98462d14d5ce3b13a79c90bb2398adf
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 10 10:26:38 2021 +0200

    Simplify the ProvenanceDB.insert_all() method
    
    factorize insertions in content, revision and directory tables.

commit 3e424af6b3d65daedf9f1923d2b214cb57676abe
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 10 09:13:59 2021 +0200

    Refactor the provenanceDB.insert_location() method
    
    simplify the code and reduce it to a couple of INSERT queries (one for
    locations, one for the dst_table).

commit 4c50588e85be58c0d17d0e55d3ebb0facc3ee173
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 9 14:55:54 2021 +0200

    Remove the without-path flavor of ProvenanceDB

commit 8aff35d251db39537a3a4bd14f98783dc06ebdc9
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 9 10:27:32 2021 +0200

    Refactor the cache handling in ProvenanceDB
    
    - use TypedDict structures to properly type the caches needed by the
      ProvenanceDB objects,
    - use only one cache plus a set of added (and eventually removed) ids of
      objects (within the cache) for revisisons, contents and directories.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/115/ for more details.

swh/provenance/postgresql/provenancedb.py
Lines 383–391 (on Diff #20895)

good thinking, thx

rebase and adapt to current master

douardda retitled this revision from Refactor the provenanceDB.insert_location() and insert_all() methods to Refactor the provenanceDB.insert_relation().
douardda edited the summary of this revision.

Build is green

Patch application report for D5842 (id=21022)

Could not rebase; Attempt merge onto 206399eb8a...

Updating 206399e..c4eaa2d
Fast-forward
 swh/provenance/postgresql/provenancedb_base.py     |  12 ++-
 .../postgresql/provenancedb_with_path.py           |  54 ++---------
 .../postgresql/provenancedb_without_path.py        |  43 ++-------
 swh/provenance/provenance.py                       | 106 +++++++++++----------
 4 files changed, 86 insertions(+), 129 deletions(-)
Changes applied before test
commit c4eaa2d6c8f8762d20921c97d0ca01b54d5d81fa
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Jun 14 16:01:24 2021 +0200

    Refactor the provenanceDB.insert_relation() methods
    
    simplify the code and reduce it to a couple of INSERT queries (one for
    locations if any, one for the destination relation table).

commit 29673033e93e423ccd39cca46731c57faa22b02a
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 9 10:27:32 2021 +0200

    Refactor the cache handling in ProvenanceDB
    
    - use TypedDict structures to properly type the caches needed by the
      ProvenanceDB objects,
    - use only one (sha1, date) cache per entity, plus a set of added ids of
      objects (within the cache) (i.e. for revisisons, contents and directories).

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/133/ for more details.
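
For readers unfamiliar with the cache layout mentioned in the commit message above (one (sha1, date) cache per entity plus a set of added ids), a minimal TypedDict-based sketch follows. All names (Entity, DatetimeCache, ProvenanceCache, new_cache, set_date) are hypothetical, and the real structures in swh/provenance may differ.

```python
from datetime import datetime, timezone
from typing import Dict, Literal, Optional, Set, TypedDict

Entity = Literal["content", "directory", "revision"]


class DatetimeCache(TypedDict):
    # sha1 -> date known for the object (None if not known yet)
    data: Dict[bytes, Optional[datetime]]
    # ids added or updated in the cache and not yet written to the database
    added: Set[bytes]


class ProvenanceCache(TypedDict):
    content: DatetimeCache
    directory: DatetimeCache
    revision: DatetimeCache


def new_cache() -> ProvenanceCache:
    """Create an empty cache with one (sha1, date) mapping per entity."""
    return ProvenanceCache(
        content=DatetimeCache(data={}, added=set()),
        directory=DatetimeCache(data={}, added=set()),
        revision=DatetimeCache(data={}, added=set()),
    )


def set_date(
    cache: ProvenanceCache, entity: Entity, sha1: bytes, date: datetime
) -> None:
    """Record a date and mark the id as pending insertion."""
    cache[entity]["data"][sha1] = date
    cache[entity]["added"].add(sha1)
```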

Build is green

Patch application report for D5842 (id=21035)

Could not rebase; Attempt merge onto 8c536b8d50...

Updating 8c536b8..e27c7d1
Fast-forward
 swh/provenance/postgresql/provenancedb_base.py     |  12 ++-
 .../postgresql/provenancedb_with_path.py           |  54 ++---------
 .../postgresql/provenancedb_without_path.py        |  43 ++-------
 swh/provenance/provenance.py                       | 106 +++++++++++----------
 4 files changed, 86 insertions(+), 129 deletions(-)
Changes applied before test
commit e27c7d11f2ce8e979c56cb49c75d0f0940993181
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Jun 14 16:01:24 2021 +0200

    Refactor the provenanceDB.insert_relation() methods
    
    simplify the code and reduce it to a couple of INSERT queries (one for
    locations if any, one for the destination relation table).

commit ac8dc036bcc608ba39d65929569f13ad694ebd90
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 9 10:27:32 2021 +0200

    Refactor the cache handling in ProvenanceDB
    
    - use TypedDict structures to properly type the caches needed by the
      ProvenanceDB objects,
    - use only one (sha1, date) cache per entity, plus a set of added ids of
      objects (within the cache) (i.e. for revisisons, contents and directories).

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/137/ for more details.

This revision is now accepted and ready to land. Jun 15 2021, 6:29 PM