Page MenuHomeSoftware Heritage

Use stored SQL functions for content_find_{all,one}() and merge Provenance*DB classes in a single ProvenanceDB
ClosedPublic

Authored by douardda on Jul 21 2021, 3:10 PM.

Details

Summary

instead of generated specific SQL queries in python. This allows to make Provenance*DB classes easily mergeable (following revision).

So now most of the specific logic is handled in sql functions or
templatized sql code (e.g. relation_add()), it makes no sense to keep
two classes.

Depends on D5843

Diff Detail

Repository
rDPROV Provenance database
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D6015 (id=21727)

Could not rebase; Attempt merge onto 1ae32c0a61...

Updating 1ae32c0..2745d9f
Fast-forward
 swh/provenance/__init__.py                         |  11 +-
 .../{provenancedb_base.py => provenancedb.py}      | 116 ++++---
 .../postgresql/provenancedb_with_path.py           |  75 -----
 .../postgresql/provenancedb_without_path.py        |  66 ----
 swh/provenance/sql/15-flavor.sql                   |   4 +-
 swh/provenance/sql/30-schema.sql                   |  16 +
 swh/provenance/sql/40-funcs.sql                    | 338 +++++++++++++++++++++
 swh/provenance/sql/60-indexes.sql                  |   9 +
 swh/provenance/tests/conftest.py                   |  22 +-
 swh/provenance/tests/test_provenance_db.py         |  22 +-
 10 files changed, 474 insertions(+), 205 deletions(-)
 rename swh/provenance/postgresql/{provenancedb_base.py => provenancedb.py} (77%)
 delete mode 100644 swh/provenance/postgresql/provenancedb_with_path.py
 delete mode 100644 swh/provenance/postgresql/provenancedb_without_path.py
 create mode 100644 swh/provenance/sql/40-funcs.sql
Changes applied before test
commit 2745d9f6a84038b7947ad2bd81e040835531b501
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 21 11:31:50 2021 +0200

    Merge Provenance*DB classes in a single ProvenanceDB
    
    now most of the specific logic is handled in sql functions or
    templatized sql code (e.g. relation_add()), it makes no sense to keep
    two classes.

commit 4cf268f005f4e4141e0988bff8fc4640322d6573
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 25 15:02:46 2021 +0200

    Use stored SQL functions for content_find_{all,one}()
    
    instead of generated specific SQL queries in python. This allows to make Provenance*DB classes easily mergeable (following revision).

commit 2f454a94987419e52f595bf5ff9ddf4a15b0cf9c
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 16 14:35:50 2021 +0200

    Add support for a denormalized version of the provenance DB
    
    in db schema, relation tables (xxx_in_yyy) are denormalized, meaning the
    yyy relation (and the location, if any) are stored as arrays.
    
    Denormalized schema is chosen at db creation time using one of the 2
    "-denormalized" flavors (aka "with-path-denormalized" or
    "without-path-denormalized").

Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/264/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/264/console

Harbormaster returned this revision to the author for changes because remote builds failed.Jul 21 2021, 3:32 PM
Harbormaster failed remote builds in B22695: Diff 21727!

Build has FAILED

Patch application report for D6015 (id=21734)

Could not rebase; Attempt merge onto 1ae32c0a61...

Updating 1ae32c0..3bc6ad6
Fast-forward
 swh/provenance/__init__.py                         |  11 +-
 .../{provenancedb_base.py => provenancedb.py}      | 116 ++++---
 .../postgresql/provenancedb_with_path.py           |  75 -----
 .../postgresql/provenancedb_without_path.py        |  66 ----
 swh/provenance/sql/15-flavor.sql                   |   4 +-
 swh/provenance/sql/30-schema.sql                   |  16 +
 swh/provenance/sql/40-funcs.sql                    | 338 +++++++++++++++++++++
 swh/provenance/sql/60-indexes.sql                  |   9 +
 swh/provenance/tests/conftest.py                   |  22 +-
 swh/provenance/tests/test_provenance_db.py         |  22 +-
 10 files changed, 474 insertions(+), 205 deletions(-)
 rename swh/provenance/postgresql/{provenancedb_base.py => provenancedb.py} (77%)
 delete mode 100644 swh/provenance/postgresql/provenancedb_with_path.py
 delete mode 100644 swh/provenance/postgresql/provenancedb_without_path.py
 create mode 100644 swh/provenance/sql/40-funcs.sql
Changes applied before test
commit 3bc6ad6dbbfe42044d6985f2b50c2788bc5924be
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 21 11:31:50 2021 +0200

    Merge Provenance*DB classes in a single ProvenanceDB
    
    now most of the specific logic is handled in sql functions or
    templatized sql code (e.g. relation_add()), it makes no sense to keep
    two classes.

commit e9c43cdc2847e92ed497e38a84bc142444de19e8
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 25 15:02:46 2021 +0200

    Use stored SQL functions for content_find_{all,one}()
    
    instead of generated specific SQL queries in python. This allows to make Provenance*DB classes easily mergeable (following revision).

commit 2f454a94987419e52f595bf5ff9ddf4a15b0cf9c
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 16 14:35:50 2021 +0200

    Add support for a denormalized version of the provenance DB
    
    in db schema, relation tables (xxx_in_yyy) are denormalized, meaning the
    yyy relation (and the location, if any) are stored as arrays.
    
    Denormalized schema is chosen at db creation time using one of the 2
    "-denormalized" flavors (aka "with-path-denormalized" or
    "without-path-denormalized").

Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/265/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/265/console

Harbormaster returned this revision to the author for changes because remote builds failed.Jul 22 2021, 9:45 AM
Harbormaster failed remote builds in B22702: Diff 21734!

more fixed in sql functions

Build is green

Patch application report for D6015 (id=21748)

Could not rebase; Attempt merge onto 1ae32c0a61...

Updating 1ae32c0..f74c039
Fast-forward
 swh/provenance/__init__.py                         |  11 +-
 .../{provenancedb_base.py => provenancedb.py}      | 116 ++++---
 .../postgresql/provenancedb_with_path.py           |  75 -----
 .../postgresql/provenancedb_without_path.py        |  66 ----
 swh/provenance/sql/15-flavor.sql                   |   4 +-
 swh/provenance/sql/30-schema.sql                   |  16 +
 swh/provenance/sql/40-funcs.sql                    | 338 +++++++++++++++++++++
 swh/provenance/sql/60-indexes.sql                  |   9 +
 swh/provenance/tests/conftest.py                   |  22 +-
 swh/provenance/tests/test_provenance_db.py         |  22 +-
 10 files changed, 474 insertions(+), 205 deletions(-)
 rename swh/provenance/postgresql/{provenancedb_base.py => provenancedb.py} (77%)
 delete mode 100644 swh/provenance/postgresql/provenancedb_with_path.py
 delete mode 100644 swh/provenance/postgresql/provenancedb_without_path.py
 create mode 100644 swh/provenance/sql/40-funcs.sql
Changes applied before test
commit f74c039e332cfbe3edac13b3b5720b813e4723b4
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 21 11:31:50 2021 +0200

    Merge Provenance*DB classes in a single ProvenanceDB
    
    now most of the specific logic is handled in sql functions or
    templatized sql code (e.g. relation_add()), it makes no sense to keep
    two classes.

commit fffdc1e7f22b94f684fc3fc15476e68fdc665f7d
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 25 15:02:46 2021 +0200

    Use stored SQL functions for content_find_{all,one}()
    
    instead of generated specific SQL queries in python. This allows to make Provenance*DB classes easily mergeable (following revision).

commit 2f454a94987419e52f595bf5ff9ddf4a15b0cf9c
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 16 14:35:50 2021 +0200

    Add support for a denormalized version of the provenance DB
    
    in db schema, relation tables (xxx_in_yyy) are denormalized, meaning the
    yyy relation (and the location, if any) are stored as arrays.
    
    Denormalized schema is chosen at db creation time using one of the 2
    "-denormalized" flavors (aka "with-path-denormalized" or
    "without-path-denormalized").

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/266/ for more details.

I would have loved to also replace the logic in relation_add() and _relation_get() by stored SQL functions, but it's above my poor SQL skills...

vlorentz added a subscriber: vlorentz.

much better!

This revision is now accepted and ready to land.Jul 26 2021, 2:33 PM
aeviso requested changes to this revision.Jul 27 2021, 12:43 PM
aeviso added a subscriber: aeviso.
aeviso added inline comments.
swh/provenance/postgresql/provenancedb.py
114

Capitalization on SQL strings: select -> SELECT, from -> FROM

122

Same as above

This revision now requires changes to proceed.Jul 27 2021, 12:43 PM
swh/provenance/postgresql/provenancedb.py
57–58

Why moving this up here? It was sorted in a way it was easy to find before

57–58

Also, I don't think this is working as before. This is used for inserting/selecting on relations and path field should be skipped for the without-path flavor (see relation_add and _relation_get methods).

swh/provenance/postgresql/provenancedb.py
57–58

Why moving this up here? It was sorted in a way it was easy to find before

To put it near other "feature flags" methods/properties (with-path and denormalized).

57–58

Also, I don't think this is working as before. This is used for inserting/selecting on relations and path field should be skipped for the without-path flavor (see relation_add and _relation_get methods).

old code was:

class ProvenanceWithoutPathDB(ProvenanceDBBase):
    def _relation_uses_location_table(self, relation: RelationType) -> bool:
        return False

class ProvenanceWithPathDB(ProvenanceDBBase):
    def _relation_uses_location_table(self, relation: RelationType) -> bool:
        src, *_ = relation.value.split("_")
        return src in ("content", "directory")

Looks exactly the same to me:

if with_path is False, returns False (as ProvenanceWithoutPathDB), otherwise, it's the exact same logic as ProvenanceWithPathDB.

Also note that the fact tests do pass just fine (without any modification in the code of the tests besides the test_provenance_flavor test is rather good clue that this code is still correct).

rebase and cpitalize sql queries

Build is green

Patch application report for D6015 (id=21796)

Could not rebase; Attempt merge onto 1ae32c0a61...

Updating 1ae32c0..6d9120c
Fast-forward
 swh/provenance/__init__.py                         |  11 +-
 .../{provenancedb_base.py => provenancedb.py}      | 116 ++++---
 .../postgresql/provenancedb_with_path.py           |  75 -----
 .../postgresql/provenancedb_without_path.py        |  66 ----
 swh/provenance/sql/15-flavor.sql                   |   4 +-
 swh/provenance/sql/30-schema.sql                   |  16 +
 swh/provenance/sql/40-funcs.sql                    | 338 +++++++++++++++++++++
 swh/provenance/sql/60-indexes.sql                  |   9 +
 swh/provenance/tests/conftest.py                   |  22 +-
 swh/provenance/tests/test_provenance_db.py         |  22 +-
 10 files changed, 474 insertions(+), 205 deletions(-)
 rename swh/provenance/postgresql/{provenancedb_base.py => provenancedb.py} (77%)
 delete mode 100644 swh/provenance/postgresql/provenancedb_with_path.py
 delete mode 100644 swh/provenance/postgresql/provenancedb_without_path.py
 create mode 100644 swh/provenance/sql/40-funcs.sql
Changes applied before test
commit 6d9120c593a87ed5e78f17c86aba3622df59b700
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 21 11:31:50 2021 +0200

    Merge Provenance*DB classes in a single ProvenanceDB
    
    now most of the specific logic is handled in sql functions or
    templatized sql code (e.g. relation_add()), it makes no sense to keep
    two classes.

commit 72b3c87c3c40776b1d78d2982461a55084e01c1c
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 25 15:02:46 2021 +0200

    Use stored SQL functions for content_find_{all,one}()
    
    instead of generated specific SQL queries in python. This allows to make Provenance*DB classes easily mergeable (following revision).

commit 0d3aaffcc4b38d2aca65f85fed4d9b95f119a47a
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 16 14:35:50 2021 +0200

    Add support for a denormalized version of the provenance DB
    
    in db schema, relation tables (xxx_in_yyy) are denormalized, meaning the
    yyy relation (and the location, if any) are stored as arrays.
    
    Denormalized schema is chosen at db creation time using one of the 2
    "-denormalized" flavors (aka "with-path-denormalized" or
    "without-path-denormalized").

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/269/ for more details.

aeviso requested changes to this revision.Jul 28 2021, 11:41 AM
aeviso added inline comments.
swh/provenance/postgresql/provenancedb.py
57–58

The problem is when the function returns True for the without-path flavor. Then relation_add will insert in location (which shouldn't be there for that flavor in the first place) and _relation_get will return the paths. So I guess there are two options here, either:
1- the without-path flavor is not doing what it should (ie. not storing paths); or
2- the tests are not properly designed.

In either case I still believe the logic is not the same as before, and should be revisited.

57–58

Also, I still believe the function should be in it's original place to make it easier to find ("private" methods are at the end of the class definition sorted in alphabetic order).

This revision now requires changes to proceed.Jul 28 2021, 11:41 AM
swh/provenance/postgresql/provenancedb.py
57–58

I dont't understand what you say. The propsoed code is:

def _relation_uses_location_table(self, relation: RelationType) -> bool:
    if self.with_path():
         src = relation.value.split("_")[0]
         return src in ("content", "directory")
    return False

so:

  • if self.with_path() is False (i.e. the without-path flavor), the the method returns False.
  • if self.with_path() is True (i.e. the with-path flavor), then the method returns True if the source of the relation is either content or directory, False otherwise.

What is wrong with this logic?

This

The problem is when the function returns True for the without-path flavor.

assertion of yours looks incorrect to me. It never returns True for the without-path flavor.

Or do I miss something?

move _relation_uses_location_table at the end of the class

because why not

Build is green

Patch application report for D6015 (id=21829)

Could not rebase; Attempt merge onto 509280132c...

Removing swh/provenance/postgresql/provenancedb_without_path.py
Removing swh/provenance/postgresql/provenancedb_with_path.py
Merge made by the 'recursive' strategy.
 swh/provenance/__init__.py                         |  11 +-
 .../{provenancedb_base.py => provenancedb.py}      | 114 ++++---
 .../postgresql/provenancedb_with_path.py           |  75 -----
 .../postgresql/provenancedb_without_path.py        |  66 ----
 swh/provenance/sql/15-flavor.sql                   |   4 +-
 swh/provenance/sql/30-schema.sql                   |  16 +
 swh/provenance/sql/40-funcs.sql                    | 338 +++++++++++++++++++++
 swh/provenance/sql/60-indexes.sql                  |   9 +
 swh/provenance/tests/conftest.py                   |  22 +-
 swh/provenance/tests/test_provenance_db.py         |  22 +-
 10 files changed, 473 insertions(+), 204 deletions(-)
 rename swh/provenance/postgresql/{provenancedb_base.py => provenancedb.py} (77%)
 delete mode 100644 swh/provenance/postgresql/provenancedb_with_path.py
 delete mode 100644 swh/provenance/postgresql/provenancedb_without_path.py
 create mode 100644 swh/provenance/sql/40-funcs.sql
Changes applied before test
commit bf9433fac70c57bc9d28c10ce8275245e9fa507e
Merge: 5092801 a2ce4a3
Author: Jenkins user <jenkins@localhost>
Date:   Wed Jul 28 12:40:59 2021 +0000

    Merge branch 'diff-target' into HEAD

commit a2ce4a30aaba0c1a9110a794d01a50c1aa3cd2c3
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 21 11:31:50 2021 +0200

    Merge Provenance*DB classes in a single ProvenanceDB
    
    now most of the specific logic is handled in sql functions or
    templatized sql code (e.g. relation_add()), it makes no sense to keep
    two classes.

commit 72b3c87c3c40776b1d78d2982461a55084e01c1c
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 25 15:02:46 2021 +0200

    Use stored SQL functions for content_find_{all,one}()
    
    instead of generated specific SQL queries in python. This allows to make Provenance*DB classes easily mergeable (following revision).

commit 0d3aaffcc4b38d2aca65f85fed4d9b95f119a47a
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 16 14:35:50 2021 +0200

    Add support for a denormalized version of the provenance DB
    
    in db schema, relation tables (xxx_in_yyy) are denormalized, meaning the
    yyy relation (and the location, if any) are stored as arrays.
    
    Denormalized schema is chosen at db creation time using one of the 2
    "-denormalized" flavors (aka "with-path-denormalized" or
    "without-path-denormalized").

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/273/ for more details.

Build is green

Patch application report for D6015 (id=21832)

Could not rebase; Attempt merge onto 509280132c...

Updating 5092801..f5e6c28
Fast-forward
 swh/provenance/__init__.py                         |  11 +-
 .../{provenancedb_base.py => provenancedb.py}      | 114 ++++---
 .../postgresql/provenancedb_with_path.py           |  75 -----
 .../postgresql/provenancedb_without_path.py        |  66 ----
 swh/provenance/sql/15-flavor.sql                   |   4 +-
 swh/provenance/sql/30-schema.sql                   |  16 +
 swh/provenance/sql/40-funcs.sql                    | 338 +++++++++++++++++++++
 swh/provenance/sql/60-indexes.sql                  |   9 +
 swh/provenance/tests/conftest.py                   |  22 +-
 swh/provenance/tests/test_provenance_db.py         |  22 +-
 10 files changed, 473 insertions(+), 204 deletions(-)
 rename swh/provenance/postgresql/{provenancedb_base.py => provenancedb.py} (77%)
 delete mode 100644 swh/provenance/postgresql/provenancedb_with_path.py
 delete mode 100644 swh/provenance/postgresql/provenancedb_without_path.py
 create mode 100644 swh/provenance/sql/40-funcs.sql
Changes applied before test
commit f5e6c283b08eb2f1efd6d598daca250b0e89dbcf
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 21 11:31:50 2021 +0200

    Merge Provenance*DB classes in a single ProvenanceDB
    
    now most of the specific logic is handled in sql functions or
    templatized sql code (e.g. relation_add()), it makes no sense to keep
    two classes.

commit fbc5499eb0e2e77d9f4650599778a8d8cd86f982
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 25 15:02:46 2021 +0200

    Use stored SQL functions for content_find_{all,one}()
    
    instead of generated specific SQL queries in python. This allows to make Provenance*DB classes easily mergeable (following revision).

commit 1c3d6426ebd2d1e4b00a50888b2b3eead5b8eab3
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 16 14:35:50 2021 +0200

    Add support for a denormalized version of the provenance DB
    
    in db schema, relation tables (xxx_in_yyy) are denormalized, meaning the
    yyy relation (and the location, if any) are stored as arrays.
    
    Denormalized schema is chosen at db creation time using one of the 2
    "-denormalized" flavors (aka "with-path-denormalized" or
    "without-path-denormalized").

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/276/ for more details.

aeviso added inline comments.
swh/provenance/postgresql/provenancedb.py
57–58

Right, I got confused by how the diff displayed the changes and I was thinking about the with-path implementation only

This revision is now accepted and ready to land.Jul 30 2021, 11:30 AM