Page MenuHomeSoftware Heritage

Add explicit flag for flattenned directories to `ProvenanceStorageInterface`
ClosedPublic

Authored by aeviso on Dec 1 2021, 3:53 PM.

Details

Summary

Both contents and directories should always have an associated date in
the storage. Flattening of a direcory is know explicitly acknowledged
by setting the newly added flag.

Diff Detail

Event Timeline

aeviso retitled this revision from Add explicit flag for flattenned directories to `ProvenanceStorageInterdace` to Add explicit flag for flattenned directories to `ProvenanceStorageInterface`.Dec 1 2021, 3:58 PM

Build is green

Patch application report for D6712 (id=24392)

Could not rebase; Attempt merge onto 6306b44896...

Updating 6306b44..08d780f
Fast-forward
 sql/upgrades/002.sql                               |  17 +++
 swh/provenance/api/serializers.py                  |   5 +-
 swh/provenance/api/server.py                       |  44 +++++--
 swh/provenance/interface.py                        |  42 ++++---
 swh/provenance/mongo/backend.py                    |  48 ++++----
 swh/provenance/postgresql/provenance.py            | 137 ++++++++++++---------
 swh/provenance/provenance.py                       |  55 +++++++--
 swh/provenance/sql/30-schema.sql                   |  71 +++++------
 swh/provenance/tests/test_conflict_resolution.py   |  43 ++++---
 swh/provenance/tests/test_provenance_storage.py    |  37 +++---
 .../tests/test_revision_content_layer.py           |   4 +-
 swh/provenance/tests/test_routing_keys.py          |  66 ++++++++++
 swh/provenance/tests/test_split_ranges.py          | 137 +++++++++++++++++++++
 13 files changed, 505 insertions(+), 201 deletions(-)
 create mode 100644 sql/upgrades/002.sql
 create mode 100644 swh/provenance/tests/test_routing_keys.py
 create mode 100644 swh/provenance/tests/test_split_ranges.py
Changes applied before test
commit 08d780fe95174257f6d69b5bd4a04700b3c83671
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 13:21:33 2021 +0100

    Add explicit flag for flattenned directories to `ProvenanceStorageInterdace`
    
    Both contents and directories should always have an associated date in
    the storage. Flattening of a direcory is know explicitly acknowledged
    by setting the newly added flag.

commit ea4a856c71d00bfd315890a317af38351d5f7ebb
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Nov 29 14:42:11 2021 +0100

    Add test for range splitting function

commit 247574bec84d471a6e78721d02b115b6017d2f9d
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Nov 29 14:41:28 2021 +0100

    Add test for routing key calculation

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/511/ for more details.

Build is green

Patch application report for D6712 (id=24395)

Could not rebase; Attempt merge onto 6306b44896...

Updating 6306b44..c0f0fcb
Fast-forward
 sql/upgrades/002.sql                               |  17 +++
 swh/provenance/api/serializers.py                  |   5 +-
 swh/provenance/api/server.py                       |  44 +++++--
 swh/provenance/interface.py                        |  42 ++++---
 swh/provenance/mongo/backend.py                    |  48 ++++----
 swh/provenance/postgresql/provenance.py            | 137 ++++++++++++---------
 swh/provenance/provenance.py                       |  55 +++++++--
 swh/provenance/sql/30-schema.sql                   |  71 +++++------
 swh/provenance/tests/test_conflict_resolution.py   |  43 ++++---
 swh/provenance/tests/test_provenance_storage.py    |  37 +++---
 .../tests/test_revision_content_layer.py           |   4 +-
 swh/provenance/tests/test_routing_keys.py          |  66 ++++++++++
 swh/provenance/tests/test_split_ranges.py          | 137 +++++++++++++++++++++
 13 files changed, 505 insertions(+), 201 deletions(-)
 create mode 100644 sql/upgrades/002.sql
 create mode 100644 swh/provenance/tests/test_routing_keys.py
 create mode 100644 swh/provenance/tests/test_split_ranges.py
Changes applied before test
commit c0f0fcbb41d331aab8ceb25e6729562501222f78
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 13:21:33 2021 +0100

    Add explicit flag for flattenned directories to `ProvenanceStorageInterface`
    
    Both contents and directories should always have an associated date in
    the storage. Flattening of a direcory is know explicitly acknowledged
    by setting the newly added flag.

commit ea4a856c71d00bfd315890a317af38351d5f7ebb
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Nov 29 14:42:11 2021 +0100

    Add test for range splitting function

commit 247574bec84d471a6e78721d02b115b6017d2f9d
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Nov 29 14:41:28 2021 +0100

    Add test for routing key calculation

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/514/ for more details.

aeviso requested review of this revision.Dec 1 2021, 4:10 PM

Build is green

Patch application report for D6712 (id=24416)

Could not rebase; Attempt merge onto 247574bec8...

Updating 247574b..7651358
Fast-forward
 sql/upgrades/002.sql                               |  17 +++
 swh/provenance/api/serializers.py                  |   5 +-
 swh/provenance/api/server.py                       |  44 +++++--
 swh/provenance/interface.py                        |  42 ++++---
 swh/provenance/mongo/backend.py                    |  48 ++++----
 swh/provenance/postgresql/provenance.py            | 137 ++++++++++++---------
 swh/provenance/provenance.py                       |  55 +++++++--
 swh/provenance/sql/30-schema.sql                   |  71 +++++------
 swh/provenance/tests/test_conflict_resolution.py   |  43 ++++---
 swh/provenance/tests/test_provenance_storage.py    |  37 +++---
 .../tests/test_revision_content_layer.py           |   4 +-
 swh/provenance/tests/test_split_ranges.py          | 137 +++++++++++++++++++++
 12 files changed, 439 insertions(+), 201 deletions(-)
 create mode 100644 sql/upgrades/002.sql
 create mode 100644 swh/provenance/tests/test_split_ranges.py
Changes applied before test
commit 765135807ee60342f0b9e62d584c5bd46fedb069
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 13:21:33 2021 +0100

    Add explicit flag for flattenned directories to `ProvenanceStorageInterface`
    
    Both contents and directories should always have an associated date in
    the storage. Flattening of a direcory is know explicitly acknowledged
    by setting the newly added flag.

commit dd1d7aa233a69fa87b16a361b12ae255605c6899
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Nov 29 14:42:11 2021 +0100

    Add test for range splitting function

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/519/ for more details.

douardda added a subscriber: douardda.

It would really nice to have a better explanation of what this flag is added for and why. Why do we add complexity in the code for this? I know there are good reasons for that, but I cannot see them just reading the code or the commit message.

Also there is typo in the actual commit message (know vs now).

Otherwise I'm fine with the code, I just want to understand why it's needed.

swh/provenance/provenance.py
318

maybe .get(sha1, False) is a bit better.

This revision now requires changes to proceed.Dec 10 2021, 3:48 PM
swh/provenance/provenance.py
318

Is not the same. The cache might return None because the associated value to the key is None, and not only due to the key not being defined. In either case we want to replace that none for a False

Build is green

Patch application report for D6712 (id=24717)

Rebasing onto dd1d7aa233...

Current branch diff-target is up to date.
Changes applied before test
commit 6642da0e4c457b3cee814b214c56bd0c90287414
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 13:21:33 2021 +0100

    Add explicit flag for flattenned directories to `ProvenanceStorageInterface`
    
    Both contents and directories should always have an associated date in the storage.
    Flattening of a directory is now explicitly acknowledged by setting the newly added
    flag. The idea is to allow to postpone the creation of flat models for directories
    in the isochrone frontier (the algorithm will be refactored in the commits to come).

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/534/ for more details.

swh/provenance/provenance.py
318

Ah I see indeed the cache in declared as Optional[bool] but I don't see how this can be the case (that we have a None value, that is). Actually I don't see where we fill this cache in this diff.

Oh! it looks this is only used in the following revision (which makes it confusing to me). Would have been easier to put the addition of this new cache entry in D6714 (if I understand all this correctly).

aeviso added inline comments.
swh/provenance/provenance.py
318

If the requested directory is not in the storage, then the cache will hold a None value to avoid querying the storage multiple times. That's why we use Optional[bool] here

swh/provenance/provenance.py
318

I don't get why we would cache a negative result; I mean this directory might very well be known by the provenance index next time we ask for it. I find it very weird to cache an information like "we don't know about this yet".

What do I miss here?

aeviso added inline comments.
swh/provenance/provenance.py
318

Because we want the information to be consistent during the current batch of elements to process. Decisions are take based on these queries and having a different results might lead to wrong decisions.

douardda added inline comments.
swh/provenance/provenance.py
318

Humm I really would have preferred not to merge both the "is known" and "is flattened" information in a single (boolean) flag, but meh. At least please document (in the cache declaration in ProvenanceCache) the reason for this value to be an Optional and that None actually means unknown.

This revision is now accepted and ready to land.Dec 10 2021, 5:23 PM
aeviso marked an inline comment as done.

rebase

Build is green

Patch application report for D6712 (id=24718)

Rebasing onto dd1d7aa233...

Current branch diff-target is up to date.
Changes applied before test
commit f4f48923e86ef0054642165bcb9ecf4387d70bb8
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 13:21:33 2021 +0100

    Add explicit flag for flattenned directories to `ProvenanceStorageInterface`
    
    Both contents and directories should always have an associated date in the storage.
    Flattening of a directory is now explicitly acknowledged by setting the newly added
    flag. The idea is to allow to postpone the creation of flat models for directories
    in the isochrone frontier (the algorithm will be refactored in the commits to come).

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/535/ for more details.