Add support to flatten directories in the isochrone frontiers separately
Closed, Public

Authored by aeviso on Dec 1 2021, 3:55 PM.

Details

Summary

Building on the previous commit, this adds a new entry point to the module
that iterates over a list of directories already identified as isochrone
frontiers in the provenance model, but for whose content no flat model has
been created yet. The iteration produces those flat models.
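
The iteration described above can be pictured with a small sketch. It is not the code from this diff: the in-memory `ToyProvenance` class, the `flatten` helper, the `flatten_frontiers` function, and the dictionary-based archive are all hypothetical stand-ins. Only the method name `directory_already_flattened` comes from the review discussion, and skipping unknown directories is an assumption made for the example.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical toy archive: directory id -> {entry name: child id}.
# A child id that is itself a key of the archive is a sub-directory;
# any other child id is treated as a content (file).
Tree = Dict[str, Dict[str, str]]


class ToyProvenance:
    """Hypothetical in-memory stand-in for the provenance index."""

    def __init__(self) -> None:
        # Known directories only: directory id -> flattened flag.
        self.flat: Dict[str, bool] = {}
        # directory id -> content paths of its flat model.
        self.flat_model: Dict[str, List[str]] = {}

    def directory_already_flattened(self, directory: str) -> Optional[bool]:
        # None means the directory is unknown to the index.
        return self.flat.get(directory)


def flatten(archive: Tree, directory: str, prefix: str = "") -> List[str]:
    """Collect the path of every content reachable from `directory`."""
    paths: List[str] = []
    for name, child in sorted(archive[directory].items()):
        path = f"{prefix}{name}"
        if child in archive:  # sub-directory: recurse with extended prefix
            paths.extend(flatten(archive, child, path + "/"))
        else:  # content: record its path relative to the frontier
            paths.append(path)
    return paths


def flatten_frontiers(
    provenance: ToyProvenance, archive: Tree, directories: Iterable[str]
) -> List[str]:
    """Build flat models for known, not yet flattened directories."""
    done: List[str] = []
    for directory in directories:
        # Assumption: unknown (None) and already flattened (True) are skipped.
        if provenance.directory_already_flattened(directory) is not False:
            continue
        provenance.flat_model[directory] = flatten(archive, directory)
        provenance.flat[directory] = True
        done.append(directory)
    return done
```

For example, with `archive = {"d1": {"a": "c1", "sub": "d2"}, "d2": {"b": "c2"}}` and `d1` recorded as known but not flattened, calling `flatten_frontiers(provenance, archive, ["d1"])` stores the flat model `["a", "sub/b"]` for `d1` and sets its flag.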

Depends on D6713.

Diff Detail

Repository
rDPROV Provenance database
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6714 (id=24394)

Could not rebase; Attempt merge onto 6306b44896...

Updating 6306b44..ed20f92
Fast-forward
 sql/upgrades/002.sql                               |  17 +++
 swh/provenance/api/serializers.py                  |   5 +-
 swh/provenance/api/server.py                       |  44 +++++--
 swh/provenance/cli.py                              |  37 ++++++
 swh/provenance/directory.py                        |  86 +++++++++++++
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/interface.py                        |  54 +++++---
 swh/provenance/mongo/backend.py                    |  48 ++++----
 swh/provenance/origin.py                           |   2 +-
 swh/provenance/postgresql/provenance.py            | 137 ++++++++++++---------
 swh/provenance/provenance.py                       |  70 +++++++++--
 swh/provenance/revision.py                         |  34 +----
 swh/provenance/sql/30-schema.sql                   |  71 +++++------
 swh/provenance/tests/test_cli.py                   |   1 +
 swh/provenance/tests/test_conflict_resolution.py   |  43 ++++---
 swh/provenance/tests/test_directory_flatten.py     |  72 +++++++++++
 swh/provenance/tests/test_directory_iterator.py    |  29 +++++
 swh/provenance/tests/test_history_graph.py         |   2 +-
 swh/provenance/tests/test_isochrone_graph.py       |   2 +-
 swh/provenance/tests/test_provenance_storage.py    |  39 +++---
 .../tests/test_revision_content_layer.py           |   7 +-
 swh/provenance/tests/test_routing_keys.py          |  66 ++++++++++
 swh/provenance/tests/test_split_ranges.py          | 137 +++++++++++++++++++++
 23 files changed, 769 insertions(+), 238 deletions(-)
 create mode 100644 sql/upgrades/002.sql
 create mode 100644 swh/provenance/directory.py
 create mode 100644 swh/provenance/tests/test_directory_flatten.py
 create mode 100644 swh/provenance/tests/test_directory_iterator.py
 create mode 100644 swh/provenance/tests/test_routing_keys.py
 create mode 100644 swh/provenance/tests/test_split_ranges.py
Changes applied before test
commit ed20f92d90c35baee37bb35745a562c8d1f8affc
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:45:12 2021 +0100

    Add support to flatten directories in the isochrone frontiers separately
    
    Building on the previous commit, a new entry point is added to the module
    allowing to iterate over a list of directories that are already identified
    as isochrone frontiers in the provenance model, but no flat models for
    their content has been created yet. This iteration produces such flat
    models.

commit ae96d01f62bc0f4319af6c79a1f1989aa795cc1b
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:00:28 2021 +0100

    Unify parameter order between provenance and archive objects across the module

commit 08d780fe95174257f6d69b5bd4a04700b3c83671
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 13:21:33 2021 +0100

    Add explicit flag for flattened directories to `ProvenanceStorageInterface`
    
    Both contents and directories should always have an associated date in
    the storage. Flattening of a directory is now explicitly acknowledged
    by setting the newly added flag.

commit ea4a856c71d00bfd315890a317af38351d5f7ebb
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Nov 29 14:42:11 2021 +0100

    Add test for range splitting function

commit 247574bec84d471a6e78721d02b115b6017d2f9d
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Nov 29 14:41:28 2021 +0100

    Add test for routing key calculation

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/513/ for more details.

Build is green

Patch application report for D6714 (id=24397)

Could not rebase; Attempt merge onto 6306b44896...

Updating 6306b44..7ffb899
Fast-forward
 sql/upgrades/002.sql                               |  17 +++
 swh/provenance/api/serializers.py                  |   5 +-
 swh/provenance/api/server.py                       |  44 +++++--
 swh/provenance/cli.py                              |  37 ++++++
 swh/provenance/directory.py                        |  86 +++++++++++++
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/interface.py                        |  54 +++++---
 swh/provenance/mongo/backend.py                    |  48 ++++----
 swh/provenance/origin.py                           |   2 +-
 swh/provenance/postgresql/provenance.py            | 137 ++++++++++++---------
 swh/provenance/provenance.py                       |  70 +++++++++--
 swh/provenance/revision.py                         |  34 +----
 swh/provenance/sql/30-schema.sql                   |  71 +++++------
 swh/provenance/tests/test_cli.py                   |   1 +
 swh/provenance/tests/test_conflict_resolution.py   |  43 ++++---
 swh/provenance/tests/test_directory_flatten.py     |  72 +++++++++++
 swh/provenance/tests/test_directory_iterator.py    |  29 +++++
 swh/provenance/tests/test_history_graph.py         |   2 +-
 swh/provenance/tests/test_isochrone_graph.py       |   2 +-
 swh/provenance/tests/test_provenance_storage.py    |  39 +++---
 .../tests/test_revision_content_layer.py           |   7 +-
 swh/provenance/tests/test_routing_keys.py          |  66 ++++++++++
 swh/provenance/tests/test_split_ranges.py          | 137 +++++++++++++++++++++
 23 files changed, 769 insertions(+), 238 deletions(-)
 create mode 100644 sql/upgrades/002.sql
 create mode 100644 swh/provenance/directory.py
 create mode 100644 swh/provenance/tests/test_directory_flatten.py
 create mode 100644 swh/provenance/tests/test_directory_iterator.py
 create mode 100644 swh/provenance/tests/test_routing_keys.py
 create mode 100644 swh/provenance/tests/test_split_ranges.py
Changes applied before test
commit 7ffb8994ca685fbc76387dee90d75a35b90d9cde
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:45:12 2021 +0100

    Add support to flatten directories in the isochrone frontiers separately
    
    Building on the previous commit, a new entry point is added to the module
    allowing to iterate over a list of directories that are already identified
    as isochrone frontiers in the provenance model, but no flat models for
    their content has been created yet. This iteration produces such flat
    models.

commit d3345b815ecca205ac425b5e20869938cecf6182
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:00:28 2021 +0100

    Unify parameter order between provenance and archive objects across the module

commit c0f0fcbb41d331aab8ceb25e6729562501222f78
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 13:21:33 2021 +0100

    Add explicit flag for flattened directories to `ProvenanceStorageInterface`
    
    Both contents and directories should always have an associated date in
    the storage. Flattening of a directory is now explicitly acknowledged
    by setting the newly added flag.

commit ea4a856c71d00bfd315890a317af38351d5f7ebb
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Nov 29 14:42:11 2021 +0100

    Add test for range splitting function

commit 247574bec84d471a6e78721d02b115b6017d2f9d
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Nov 29 14:41:28 2021 +0100

    Add test for routing key calculation

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/516/ for more details.

aeviso requested review of this revision. Dec 1 2021, 4:12 PM

Build is green

Patch application report for D6714 (id=24418)

Could not rebase; Attempt merge onto 247574bec8...

Updating 247574b..5448b6e
Fast-forward
 sql/upgrades/002.sql                               |  17 +++
 swh/provenance/api/serializers.py                  |   5 +-
 swh/provenance/api/server.py                       |  44 +++++--
 swh/provenance/cli.py                              |  37 ++++++
 swh/provenance/directory.py                        |  86 +++++++++++++
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/interface.py                        |  54 +++++---
 swh/provenance/mongo/backend.py                    |  48 ++++----
 swh/provenance/origin.py                           |   2 +-
 swh/provenance/postgresql/provenance.py            | 137 ++++++++++++---------
 swh/provenance/provenance.py                       |  70 +++++++++--
 swh/provenance/revision.py                         |  34 +----
 swh/provenance/sql/30-schema.sql                   |  71 +++++------
 swh/provenance/tests/test_cli.py                   |   1 +
 swh/provenance/tests/test_conflict_resolution.py   |  43 ++++---
 swh/provenance/tests/test_directory_flatten.py     |  72 +++++++++++
 swh/provenance/tests/test_directory_iterator.py    |  29 +++++
 swh/provenance/tests/test_history_graph.py         |   2 +-
 swh/provenance/tests/test_isochrone_graph.py       |   2 +-
 swh/provenance/tests/test_provenance_storage.py    |  39 +++---
 .../tests/test_revision_content_layer.py           |   7 +-
 swh/provenance/tests/test_split_ranges.py          | 137 +++++++++++++++++++++
 22 files changed, 703 insertions(+), 238 deletions(-)
 create mode 100644 sql/upgrades/002.sql
 create mode 100644 swh/provenance/directory.py
 create mode 100644 swh/provenance/tests/test_directory_flatten.py
 create mode 100644 swh/provenance/tests/test_directory_iterator.py
 create mode 100644 swh/provenance/tests/test_split_ranges.py
Changes applied before test
commit 5448b6ee5bc799c73cfe49d67c97768dadfbb8cc
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:45:12 2021 +0100

    Add support to flatten directories in the isochrone frontiers separately
    
    Building on the previous commit, a new entry point is added to the module
    allowing to iterate over a list of directories that are already identified
    as isochrone frontiers in the provenance model, but no flat models for
    their content has been created yet. This iteration produces such flat
    models.

commit 812df71d99daacb25d1df73522cb754b0842af83
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:00:28 2021 +0100

    Unify parameter order between provenance and archive objects across the module

commit 765135807ee60342f0b9e62d584c5bd46fedb069
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 13:21:33 2021 +0100

    Add explicit flag for flattened directories to `ProvenanceStorageInterface`
    
    Both contents and directories should always have an associated date in
    the storage. Flattening of a directory is now explicitly acknowledged
    by setting the newly added flag.

commit dd1d7aa233a69fa87b16a361b12ae255605c6899
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Nov 29 14:42:11 2021 +0100

    Add test for range splitting function

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/521/ for more details.

douardda added inline comments.
swh/provenance/cli.py
152

What does 'process' mean here?

swh/provenance/interface.py
303

Not very fond of the True/False/None API here; it's confusing. I think I'd prefer to raise a NotFound kind of exception. @olasd, what do you think?

aeviso added inline comments.
swh/provenance/cli.py
152

It stands for 'analyze' or 'compute'.

swh/provenance/interface.py
303

I'd rather not use exceptions to represent a possible result. None here simply means that the directory is unknown to the provenance index.

swh/provenance/interface.py
303

The method is named directory_already_flattened, so it is expected to return True or False, and if the directory is not known, then False is the expected result: the directory is unknown to the provenance index, so no, it's not flattened yet.

This is related to the discussion about the Optional[bool] of the cache in D6712, right?

aeviso added inline comments.
swh/provenance/interface.py
303

No, it's the other way around, actually. I needed a method to query whether the directory is already flattened or unknown, and then I picked a name. So the method is not expected to return just True or False (see the documentation below).

Regarding the discussion in D6712, that's related to having consistent information during the processing of a batch of elements.
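
To make the two positions in this thread concrete, here is a minimal sketch that contrasts a tri-state `Optional[bool]` result with an exception-based variant. It is an illustration under stated assumptions, not the interface from this diff: the `INDEX` dictionary, the `DirectoryNotFound` exception, and the `_or_raise` and `describe` helpers are hypothetical names introduced only for this example.

```python
from typing import Dict, Optional


class DirectoryNotFound(Exception):
    """Hypothetical exception for the alternative suggested in the review."""


# Hypothetical toy index: directory id -> flattened flag.
# Directories that are absent are unknown to the index.
INDEX: Dict[str, bool] = {"d1": True, "d2": False}


def directory_already_flattened(directory: str) -> Optional[bool]:
    """Tri-state variant: True, False, or None when the directory is unknown."""
    return INDEX.get(directory)


def directory_already_flattened_or_raise(directory: str) -> bool:
    """Exception variant: always a bool, unknown directories raise."""
    flat = INDEX.get(directory)
    if flat is None:
        raise DirectoryNotFound(directory)
    return flat


def describe(directory: str) -> str:
    """Callers of the tri-state variant must test `is None` explicitly."""
    flat = directory_already_flattened(directory)
    if flat is None:
        return "unknown"
    return "flattened" if flat else "not flattened"
```

One consequence of the tri-state form is that a plain truthiness check such as `if not directory_already_flattened(d)` treats unknown and not-yet-flattened directories alike, so callers that care about the difference need the explicit `is None` test shown in `describe`.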

This revision is now accepted and ready to land. Dec 10 2021, 5:24 PM
aeviso marked an inline comment as done.

rebase

Build is green

Patch application report for D6714 (id=24720)

Could not rebase; Attempt merge onto dd1d7aa233...

Updating dd1d7aa..0f2025f
Fast-forward
 sql/upgrades/002.sql                               |  17 +++
 swh/provenance/api/serializers.py                  |   5 +-
 swh/provenance/api/server.py                       |  44 +++++--
 swh/provenance/cli.py                              |  37 ++++++
 swh/provenance/directory.py                        |  86 +++++++++++++
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/interface.py                        |  54 +++++---
 swh/provenance/mongo/backend.py                    |  48 ++++----
 swh/provenance/origin.py                           |   2 +-
 swh/provenance/postgresql/provenance.py            | 137 ++++++++++++---------
 swh/provenance/provenance.py                       |  72 +++++++++--
 swh/provenance/revision.py                         |  34 +----
 swh/provenance/sql/30-schema.sql                   |  71 +++++------
 swh/provenance/tests/test_cli.py                   |   1 +
 swh/provenance/tests/test_conflict_resolution.py   |  43 ++++---
 swh/provenance/tests/test_directory_flatten.py     |  72 +++++++++++
 swh/provenance/tests/test_directory_iterator.py    |  29 +++++
 swh/provenance/tests/test_history_graph.py         |   2 +-
 swh/provenance/tests/test_isochrone_graph.py       |   2 +-
 swh/provenance/tests/test_provenance_storage.py    |  39 +++---
 .../tests/test_revision_content_layer.py           |   7 +-
 21 files changed, 567 insertions(+), 239 deletions(-)
 create mode 100644 sql/upgrades/002.sql
 create mode 100644 swh/provenance/directory.py
 create mode 100644 swh/provenance/tests/test_directory_flatten.py
 create mode 100644 swh/provenance/tests/test_directory_iterator.py
Changes applied before test
commit 0f2025f6ef454616537103fa720479987cba1278
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:45:12 2021 +0100

    Add support to flatten directories in the isochrone frontiers separately
    
    Building on the previous commit, a new entry point is added to the module
    allowing to iterate over a list of directories that are already identified
    as isochrone frontiers in the provenance model, but no flat models for
    their content has been created yet. This iteration produces such flat
    models.

commit 052e25da505c77da90d1c54ce0ade775117422e4
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:00:28 2021 +0100

    Unify parameter order between provenance and archive objects across the module

commit f4f48923e86ef0054642165bcb9ecf4387d70bb8
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 13:21:33 2021 +0100

    Add explicit flag for flattened directories to `ProvenanceStorageInterface`
    
    Both contents and directories should always have an associated date in the storage.
    Flattening of a directory is now explicitly acknowledged by setting the newly added
    flag. The idea is to allow to postpone the creation of flat models for directories
    in the isochrone frontier (the algorithm will be refactored in the commits to come).

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/537/ for more details.