Both contents and directories should always have an associated date in
the storage. Flattening of a direcory is know explicitly acknowledged
by setting the newly added flag.
Details
- Reviewers
douardda - Group Reviewers
Reviewers - Commits
- rDPROVf4f48923e86e: Add explicit flag for flattenned directories to `ProvenanceStorageInterface`
Diff Detail
- Repository
- rDPROV Provenance database
- Branch
- master
- Lint
Lint Skipped - Unit
Unit Tests Skipped - Build Status
Buildable 25293 Build 39536: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 39535: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D6712 (id=24392)
Could not rebase; Attempt merge onto 6306b44896...
Updating 6306b44..08d780f Fast-forward sql/upgrades/002.sql | 17 +++ swh/provenance/api/serializers.py | 5 +- swh/provenance/api/server.py | 44 +++++-- swh/provenance/interface.py | 42 ++++--- swh/provenance/mongo/backend.py | 48 ++++---- swh/provenance/postgresql/provenance.py | 137 ++++++++++++--------- swh/provenance/provenance.py | 55 +++++++-- swh/provenance/sql/30-schema.sql | 71 +++++------ swh/provenance/tests/test_conflict_resolution.py | 43 ++++--- swh/provenance/tests/test_provenance_storage.py | 37 +++--- .../tests/test_revision_content_layer.py | 4 +- swh/provenance/tests/test_routing_keys.py | 66 ++++++++++ swh/provenance/tests/test_split_ranges.py | 137 +++++++++++++++++++++ 13 files changed, 505 insertions(+), 201 deletions(-) create mode 100644 sql/upgrades/002.sql create mode 100644 swh/provenance/tests/test_routing_keys.py create mode 100644 swh/provenance/tests/test_split_ranges.py
Changes applied before test
commit 08d780fe95174257f6d69b5bd4a04700b3c83671 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Dec 1 13:21:33 2021 +0100 Add explicit flag for flattenned directories to `ProvenanceStorageInterdace` Both contents and directories should always have an associated date in the storage. Flattening of a direcory is know explicitly acknowledged by setting the newly added flag. commit ea4a856c71d00bfd315890a317af38351d5f7ebb Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Nov 29 14:42:11 2021 +0100 Add test for range splitting function commit 247574bec84d471a6e78721d02b115b6017d2f9d Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Nov 29 14:41:28 2021 +0100 Add test for routing key calculation
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/511/ for more details.
Build is green
Patch application report for D6712 (id=24395)
Could not rebase; Attempt merge onto 6306b44896...
Updating 6306b44..c0f0fcb Fast-forward sql/upgrades/002.sql | 17 +++ swh/provenance/api/serializers.py | 5 +- swh/provenance/api/server.py | 44 +++++-- swh/provenance/interface.py | 42 ++++--- swh/provenance/mongo/backend.py | 48 ++++---- swh/provenance/postgresql/provenance.py | 137 ++++++++++++--------- swh/provenance/provenance.py | 55 +++++++-- swh/provenance/sql/30-schema.sql | 71 +++++------ swh/provenance/tests/test_conflict_resolution.py | 43 ++++--- swh/provenance/tests/test_provenance_storage.py | 37 +++--- .../tests/test_revision_content_layer.py | 4 +- swh/provenance/tests/test_routing_keys.py | 66 ++++++++++ swh/provenance/tests/test_split_ranges.py | 137 +++++++++++++++++++++ 13 files changed, 505 insertions(+), 201 deletions(-) create mode 100644 sql/upgrades/002.sql create mode 100644 swh/provenance/tests/test_routing_keys.py create mode 100644 swh/provenance/tests/test_split_ranges.py
Changes applied before test
commit c0f0fcbb41d331aab8ceb25e6729562501222f78 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Dec 1 13:21:33 2021 +0100 Add explicit flag for flattenned directories to `ProvenanceStorageInterface` Both contents and directories should always have an associated date in the storage. Flattening of a direcory is know explicitly acknowledged by setting the newly added flag. commit ea4a856c71d00bfd315890a317af38351d5f7ebb Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Nov 29 14:42:11 2021 +0100 Add test for range splitting function commit 247574bec84d471a6e78721d02b115b6017d2f9d Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Nov 29 14:41:28 2021 +0100 Add test for routing key calculation
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/514/ for more details.
Build is green
Patch application report for D6712 (id=24416)
Could not rebase; Attempt merge onto 247574bec8...
Updating 247574b..7651358 Fast-forward sql/upgrades/002.sql | 17 +++ swh/provenance/api/serializers.py | 5 +- swh/provenance/api/server.py | 44 +++++-- swh/provenance/interface.py | 42 ++++--- swh/provenance/mongo/backend.py | 48 ++++---- swh/provenance/postgresql/provenance.py | 137 ++++++++++++--------- swh/provenance/provenance.py | 55 +++++++-- swh/provenance/sql/30-schema.sql | 71 +++++------ swh/provenance/tests/test_conflict_resolution.py | 43 ++++--- swh/provenance/tests/test_provenance_storage.py | 37 +++--- .../tests/test_revision_content_layer.py | 4 +- swh/provenance/tests/test_split_ranges.py | 137 +++++++++++++++++++++ 12 files changed, 439 insertions(+), 201 deletions(-) create mode 100644 sql/upgrades/002.sql create mode 100644 swh/provenance/tests/test_split_ranges.py
Changes applied before test
commit 765135807ee60342f0b9e62d584c5bd46fedb069 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Dec 1 13:21:33 2021 +0100 Add explicit flag for flattenned directories to `ProvenanceStorageInterface` Both contents and directories should always have an associated date in the storage. Flattening of a direcory is know explicitly acknowledged by setting the newly added flag. commit dd1d7aa233a69fa87b16a361b12ae255605c6899 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Nov 29 14:42:11 2021 +0100 Add test for range splitting function
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/519/ for more details.
It would really nice to have a better explanation of what this flag is added for and why. Why do we add complexity in the code for this? I know there are good reasons for that, but I cannot see them just reading the code or the commit message.
Also there is typo in the actual commit message (know vs now).
Otherwise I'm fine with the code, I just want to understand why it's needed.
swh/provenance/provenance.py | ||
---|---|---|
319 | maybe .get(sha1, False) is a bit better. |
swh/provenance/provenance.py | ||
---|---|---|
319 | Is not the same. The cache might return None because the associated value to the key is None, and not only due to the key not being defined. In either case we want to replace that none for a False |
Build is green
Patch application report for D6712 (id=24717)
Rebasing onto dd1d7aa233...
Current branch diff-target is up to date.
Changes applied before test
commit 6642da0e4c457b3cee814b214c56bd0c90287414 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Dec 1 13:21:33 2021 +0100 Add explicit flag for flattenned directories to `ProvenanceStorageInterface` Both contents and directories should always have an associated date in the storage. Flattening of a directory is now explicitly acknowledged by setting the newly added flag. The idea is to allow to postpone the creation of flat models for directories in the isochrone frontier (the algorithm will be refactored in the commits to come).
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/534/ for more details.
swh/provenance/provenance.py | ||
---|---|---|
319 | Ah I see indeed the cache in declared as Optional[bool] but I don't see how this can be the case (that we have a None value, that is). Actually I don't see where we fill this cache in this diff. Oh! it looks this is only used in the following revision (which makes it confusing to me). Would have been easier to put the addition of this new cache entry in D6714 (if I understand all this correctly). |
swh/provenance/provenance.py | ||
---|---|---|
319 | If the requested directory is not in the storage, then the cache will hold a None value to avoid querying the storage multiple times. That's why we use Optional[bool] here |
swh/provenance/provenance.py | ||
---|---|---|
319 | I don't get why we would cache a negative result; I mean this directory might very well be known by the provenance index next time we ask for it. I find it very weird to cache an information like "we don't know about this yet". What do I miss here? |
swh/provenance/provenance.py | ||
---|---|---|
319 | Because we want the information to be consistent during the current batch of elements to process. Decisions are take based on these queries and having a different results might lead to wrong decisions. |
swh/provenance/provenance.py | ||
---|---|---|
319 | Humm I really would have preferred not to merge both the "is known" and "is flattened" information in a single (boolean) flag, but meh. At least please document (in the cache declaration in ProvenanceCache) the reason for this value to be an Optional and that None actually means unknown. |
Build is green
Patch application report for D6712 (id=24718)
Rebasing onto dd1d7aa233...
Current branch diff-target is up to date.
Changes applied before test
commit f4f48923e86ef0054642165bcb9ecf4387d70bb8 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Dec 1 13:21:33 2021 +0100 Add explicit flag for flattenned directories to `ProvenanceStorageInterface` Both contents and directories should always have an associated date in the storage. Flattening of a directory is now explicitly acknowledged by setting the newly added flag. The idea is to allow to postpone the creation of flat models for directories in the isochrone frontier (the algorithm will be refactored in the commits to come).
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/535/ for more details.