Add new flag to skip directory flattening while processing revisions
Closed, Public

Authored by aeviso on Dec 1 2021, 4:56 PM.

Event Timeline

Build is green

Patch application report for D6717 (id=24402)

Could not rebase; Attempt merge onto 6306b44896...

Updating 6306b44..4520b1d
Fast-forward
 sql/upgrades/002.sql                               |  17 +++
 swh/provenance/api/serializers.py                  |   5 +-
 swh/provenance/api/server.py                       |  44 +++++--
 swh/provenance/cli.py                              |  40 ++++++
 swh/provenance/directory.py                        |  86 +++++++++++++
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/interface.py                        |  54 +++++---
 swh/provenance/mongo/backend.py                    |  48 ++++----
 swh/provenance/origin.py                           |   2 +-
 swh/provenance/postgresql/provenance.py            | 137 ++++++++++++---------
 swh/provenance/provenance.py                       |  70 +++++++++--
 swh/provenance/revision.py                         |  40 ++----
 swh/provenance/sql/30-schema.sql                   |  71 +++++------
 swh/provenance/tests/test_cli.py                   |   1 +
 swh/provenance/tests/test_conflict_resolution.py   |  43 ++++---
 swh/provenance/tests/test_directory_flatten.py     |  72 +++++++++++
 swh/provenance/tests/test_directory_iterator.py    |  29 +++++
 swh/provenance/tests/test_history_graph.py         |   2 +-
 swh/provenance/tests/test_isochrone_graph.py       |   2 +-
 swh/provenance/tests/test_provenance_storage.py    |  39 +++---
 .../tests/test_revision_content_layer.py           |  49 ++++++--
 swh/provenance/tests/test_routing_keys.py          |  66 ++++++++++
 swh/provenance/tests/test_split_ranges.py          | 137 +++++++++++++++++++++
 23 files changed, 811 insertions(+), 247 deletions(-)
 create mode 100644 sql/upgrades/002.sql
 create mode 100644 swh/provenance/directory.py
 create mode 100644 swh/provenance/tests/test_directory_flatten.py
 create mode 100644 swh/provenance/tests/test_directory_iterator.py
 create mode 100644 swh/provenance/tests/test_routing_keys.py
 create mode 100644 swh/provenance/tests/test_split_ranges.py
Changes applied before test
commit 4520b1d65ca0e8f3377a1253bc0e1378b7072adb
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 16:53:46 2021 +0100

    Add new flag to skip directory flattening while processing revisions

commit 7ffb8994ca685fbc76387dee90d75a35b90d9cde
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:45:12 2021 +0100

    Add support to flatten directories in the isochrone frontiers separately
    
    Building on the previous commit, a new entry point is added to the module
    that iterates over a list of directories already identified as isochrone
    frontiers in the provenance model but for which no flat model of their
    content has been created yet. This iteration produces such flat models.

commit d3345b815ecca205ac425b5e20869938cecf6182
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:00:28 2021 +0100

    Unify parameter order between provenance and archive objects across the module

commit c0f0fcbb41d331aab8ceb25e6729562501222f78
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 13:21:33 2021 +0100

    Add explicit flag for flattened directories to `ProvenanceStorageInterface`
    
    Both contents and directories should always have an associated date in
    the storage. Flattening of a directory is now explicitly acknowledged
    by setting the newly added flag.

commit ea4a856c71d00bfd315890a317af38351d5f7ebb
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Nov 29 14:42:11 2021 +0100

    Add test for range splitting function

commit 247574bec84d471a6e78721d02b115b6017d2f9d
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Nov 29 14:41:28 2021 +0100

    Add test for routing key calculation

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/517/ for more details.

aeviso requested review of this revision. Dec 1 2021, 5:02 PM

Build is green

Patch application report for D6717 (id=24419)

Could not rebase; Attempt merge onto 247574bec8...

Updating 247574b..2620572
Fast-forward
 sql/upgrades/002.sql                               |  17 +++
 swh/provenance/api/serializers.py                  |   5 +-
 swh/provenance/api/server.py                       |  44 +++++--
 swh/provenance/cli.py                              |  40 ++++++
 swh/provenance/directory.py                        |  86 +++++++++++++
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/interface.py                        |  54 +++++---
 swh/provenance/mongo/backend.py                    |  48 ++++----
 swh/provenance/origin.py                           |   2 +-
 swh/provenance/postgresql/provenance.py            | 137 ++++++++++++---------
 swh/provenance/provenance.py                       |  70 +++++++++--
 swh/provenance/revision.py                         |  40 ++----
 swh/provenance/sql/30-schema.sql                   |  71 +++++------
 swh/provenance/tests/test_cli.py                   |   1 +
 swh/provenance/tests/test_conflict_resolution.py   |  43 ++++---
 swh/provenance/tests/test_directory_flatten.py     |  72 +++++++++++
 swh/provenance/tests/test_directory_iterator.py    |  29 +++++
 swh/provenance/tests/test_history_graph.py         |   2 +-
 swh/provenance/tests/test_isochrone_graph.py       |   2 +-
 swh/provenance/tests/test_provenance_storage.py    |  39 +++---
 .../tests/test_revision_content_layer.py           |  49 ++++++--
 swh/provenance/tests/test_split_ranges.py          | 137 +++++++++++++++++++++
 22 files changed, 745 insertions(+), 247 deletions(-)
 create mode 100644 sql/upgrades/002.sql
 create mode 100644 swh/provenance/directory.py
 create mode 100644 swh/provenance/tests/test_directory_flatten.py
 create mode 100644 swh/provenance/tests/test_directory_iterator.py
 create mode 100644 swh/provenance/tests/test_split_ranges.py
Changes applied before test
commit 262057262e85c5284ed2f4f03c53a7e9c77fa7a7
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 16:53:46 2021 +0100

    Add new flag to skip directory flattening while processing revisions

commit 5448b6ee5bc799c73cfe49d67c97768dadfbb8cc
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:45:12 2021 +0100

    Add support to flatten directories in the isochrone frontiers separately
    
    Building on the previous commit, a new entry point is added to the module
    that iterates over a list of directories already identified as isochrone
    frontiers in the provenance model but for which no flat model of their
    content has been created yet. This iteration produces such flat models.

commit 812df71d99daacb25d1df73522cb754b0842af83
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:00:28 2021 +0100

    Unify parameter order between provenance and archive objects across the module

commit 765135807ee60342f0b9e62d584c5bd46fedb069
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 13:21:33 2021 +0100

    Add explicit flag for flattened directories to `ProvenanceStorageInterface`
    
    Both contents and directories should always have an associated date in
    the storage. Flattening of a directory is now explicitly acknowledged
    by setting the newly added flag.

commit dd1d7aa233a69fa87b16a361b12ae255605c6899
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Nov 29 14:42:11 2021 +0100

    Add test for range splitting function

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/522/ for more details.

olasd added inline comments.
swh/provenance/tests/test_revision_content_layer.py
218

Before doing directory_add, could you check that the directories are indeed not flattened yet?

aeviso added inline comments.
swh/provenance/tests/test_revision_content_layer.py
218

Sure, I'll update the diff in a minute
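
The check requested above could look roughly like the following. This is only a sketch under stated assumptions: `DirectoryRecord`, `FakeStorage`, `flatten`, and `check_flatten_roundtrip` are hypothetical stand-ins written for illustration, not the actual swh.provenance storage API or the test in this diff.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List


@dataclass
class DirectoryRecord:
    """Hypothetical record: a directory always has a date, plus a flat flag."""

    date: datetime
    flat: bool = False


class FakeStorage:
    """Hypothetical in-memory stand-in for a provenance storage."""

    def __init__(self) -> None:
        self.directories: Dict[bytes, DirectoryRecord] = {}

    def directory_get(self, ids: Iterable[bytes]) -> Dict[bytes, DirectoryRecord]:
        # Return only the directories that are known to the storage.
        return {i: self.directories[i] for i in ids if i in self.directories}


def flatten(storage: FakeStorage, ids: Iterable[bytes]) -> None:
    """Placeholder for the real flattening step: it only flips the flag."""
    for i in ids:
        storage.directories[i].flat = True


def check_flatten_roundtrip(storage: FakeStorage, ids: List[bytes]) -> None:
    # Precondition asked for in the review: nothing is flattened yet.
    assert all(not rec.flat for rec in storage.directory_get(ids).values())
    flatten(storage, ids)
    # Postcondition: every directory is now marked as flat.
    assert all(rec.flat for rec in storage.directory_get(ids).values())
```

Asserting the precondition first guards against a test that passes only because the directories were already flattened by an earlier step.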

aeviso marked an inline comment as done.

Assert directories are indeed not flattened before processing them

Build is green

Patch application report for D6717 (id=24464)

Could not rebase; Attempt merge onto dd1d7aa233...

Updating dd1d7aa..7b4b3f2
Fast-forward
 sql/upgrades/002.sql                               |  17 +++
 swh/provenance/api/serializers.py                  |   5 +-
 swh/provenance/api/server.py                       |  44 +++++--
 swh/provenance/cli.py                              |  40 ++++++
 swh/provenance/directory.py                        |  86 +++++++++++++
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/interface.py                        |  54 +++++---
 swh/provenance/mongo/backend.py                    |  48 ++++----
 swh/provenance/origin.py                           |   2 +-
 swh/provenance/postgresql/provenance.py            | 137 ++++++++++++---------
 swh/provenance/provenance.py                       |  70 +++++++++--
 swh/provenance/revision.py                         |  40 ++----
 swh/provenance/sql/30-schema.sql                   |  71 +++++------
 swh/provenance/tests/test_cli.py                   |   1 +
 swh/provenance/tests/test_conflict_resolution.py   |  43 ++++---
 swh/provenance/tests/test_directory_flatten.py     |  72 +++++++++++
 swh/provenance/tests/test_directory_iterator.py    |  29 +++++
 swh/provenance/tests/test_history_graph.py         |   2 +-
 swh/provenance/tests/test_isochrone_graph.py       |   2 +-
 swh/provenance/tests/test_provenance_storage.py    |  39 +++---
 .../tests/test_revision_content_layer.py           |  51 ++++++--
 21 files changed, 610 insertions(+), 247 deletions(-)
 create mode 100644 sql/upgrades/002.sql
 create mode 100644 swh/provenance/directory.py
 create mode 100644 swh/provenance/tests/test_directory_flatten.py
 create mode 100644 swh/provenance/tests/test_directory_iterator.py
Changes applied before test
commit 7b4b3f24b274b64840ee1f050926a113b860137f
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 16:53:46 2021 +0100

    Add new flag to skip directory flattening while processing revisions

commit 5448b6ee5bc799c73cfe49d67c97768dadfbb8cc
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:45:12 2021 +0100

    Add support to flatten directories in the isochrone frontiers separately
    
    Building on the previous commit, a new entry point is added to the module
    that iterates over a list of directories already identified as isochrone
    frontiers in the provenance model but for which no flat model of their
    content has been created yet. This iteration produces such flat models.

commit 812df71d99daacb25d1df73522cb754b0842af83
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:00:28 2021 +0100

    Unify parameter order between provenance and archive objects across the module

commit 765135807ee60342f0b9e62d584c5bd46fedb069
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 13:21:33 2021 +0100

    Add explicit flag for flattened directories to `ProvenanceStorageInterface`
    
    Both contents and directories should always have an associated date in
    the storage. Flattening of a directory is now explicitly acknowledged
    by setting the newly added flag.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/526/ for more details.

douardda added a subscriber: douardda.
douardda added inline comments.
swh/provenance/cli.py
181

It would be very helpful to add the `help` argument to these click options.
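
For reference, a minimal sketch of click options carrying `help` strings. The command name `iter_revisions_demo`, the option names, and the help texts are assumptions made for illustration; they are not copied from `swh/provenance/cli.py`.

```python
import click


@click.command()
@click.option(
    "--flatten/--no-flatten",
    default=True,
    show_default=True,
    help="Create flat models for directories in the isochrone frontier.",
)
@click.option(
    "--limit",
    type=int,
    default=None,
    help="Maximum number of revisions to process (default: no limit).",
)
def iter_revisions_demo(flatten: bool, limit):
    """Process revisions (hypothetical command, for illustration only)."""
    click.echo(f"flatten={flatten} limit={limit}")
```

With `help` set, each option is described in the output of `--help`, and `show_default=True` also prints the default value of the boolean flag.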

This revision is now accepted and ready to land. Dec 10 2021, 3:16 PM

Add help strings for CLI options

Build is green

Patch application report for D6717 (id=24716)

Could not rebase; Attempt merge onto dd1d7aa233...

Updating dd1d7aa..ae5b696
Fast-forward
 sql/upgrades/002.sql                               |  17 +++
 swh/provenance/api/serializers.py                  |   5 +-
 swh/provenance/api/server.py                       |  44 +++++--
 swh/provenance/cli.py                              | 111 +++++++++++++++--
 swh/provenance/directory.py                        |  86 +++++++++++++
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/interface.py                        |  54 +++++---
 swh/provenance/mongo/backend.py                    |  48 ++++----
 swh/provenance/origin.py                           |   2 +-
 swh/provenance/postgresql/provenance.py            | 137 ++++++++++++---------
 swh/provenance/provenance.py                       |  70 +++++++++--
 swh/provenance/revision.py                         |  40 ++----
 swh/provenance/sql/30-schema.sql                   |  71 +++++------
 swh/provenance/tests/test_cli.py                   |   1 +
 swh/provenance/tests/test_conflict_resolution.py   |  43 ++++---
 swh/provenance/tests/test_directory_flatten.py     |  72 +++++++++++
 swh/provenance/tests/test_directory_iterator.py    |  29 +++++
 swh/provenance/tests/test_history_graph.py         |   2 +-
 swh/provenance/tests/test_isochrone_graph.py       |   2 +-
 swh/provenance/tests/test_provenance_storage.py    |  39 +++---
 .../tests/test_revision_content_layer.py           |  51 ++++++--
 21 files changed, 674 insertions(+), 254 deletions(-)
 create mode 100644 sql/upgrades/002.sql
 create mode 100644 swh/provenance/directory.py
 create mode 100644 swh/provenance/tests/test_directory_flatten.py
 create mode 100644 swh/provenance/tests/test_directory_iterator.py
Changes applied before test
commit ae5b6969dfdced39129ba6636d0a3e8879798953
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 16:53:46 2021 +0100

    Add new flag to skip directory flattening while processing revisions

commit 5448b6ee5bc799c73cfe49d67c97768dadfbb8cc
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:45:12 2021 +0100

    Add support to flatten directories in the isochrone frontiers separately
    
    Building on the previous commit, a new entry point is added to the module
    that iterates over a list of directories already identified as isochrone
    frontiers in the provenance model but for which no flat model of their
    content has been created yet. This iteration produces such flat models.

commit 812df71d99daacb25d1df73522cb754b0842af83
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:00:28 2021 +0100

    Unify parameter order between provenance and archive objects across the module

commit 765135807ee60342f0b9e62d584c5bd46fedb069
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 13:21:33 2021 +0100

    Add explicit flag for flattened directories to `ProvenanceStorageInterface`
    
    Both contents and directories should always have an associated date in
    the storage. Flattening of a directory is now explicitly acknowledged
    by setting the newly added flag.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/533/ for more details.

Build is green

Patch application report for D6717 (id=24721)

Could not rebase; Attempt merge onto dd1d7aa233...

Updating dd1d7aa..5a86c23
Fast-forward
 sql/upgrades/002.sql                               |  17 +++
 swh/provenance/api/serializers.py                  |   5 +-
 swh/provenance/api/server.py                       |  44 +++++--
 swh/provenance/cli.py                              | 111 +++++++++++++++--
 swh/provenance/directory.py                        |  86 +++++++++++++
 swh/provenance/graph.py                            |   4 +-
 swh/provenance/interface.py                        |  54 +++++---
 swh/provenance/mongo/backend.py                    |  48 ++++----
 swh/provenance/origin.py                           |   2 +-
 swh/provenance/postgresql/provenance.py            | 137 ++++++++++++---------
 swh/provenance/provenance.py                       |  72 +++++++++--
 swh/provenance/revision.py                         |  40 ++----
 swh/provenance/sql/30-schema.sql                   |  71 +++++------
 swh/provenance/tests/test_cli.py                   |   1 +
 swh/provenance/tests/test_conflict_resolution.py   |  43 ++++---
 swh/provenance/tests/test_directory_flatten.py     |  72 +++++++++++
 swh/provenance/tests/test_directory_iterator.py    |  29 +++++
 swh/provenance/tests/test_history_graph.py         |   2 +-
 swh/provenance/tests/test_isochrone_graph.py       |   2 +-
 swh/provenance/tests/test_provenance_storage.py    |  39 +++---
 .../tests/test_revision_content_layer.py           |  51 ++++++--
 21 files changed, 675 insertions(+), 255 deletions(-)
 create mode 100644 sql/upgrades/002.sql
 create mode 100644 swh/provenance/directory.py
 create mode 100644 swh/provenance/tests/test_directory_flatten.py
 create mode 100644 swh/provenance/tests/test_directory_iterator.py
Changes applied before test
commit 5a86c235de7b8c1b74aed370a600ade36c3412f6
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 16:53:46 2021 +0100

    Add new flag to skip directory flattening while processing revisions

commit 0f2025f6ef454616537103fa720479987cba1278
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:45:12 2021 +0100

    Add support to flatten directories in the isochrone frontiers separately
    
    Building on the previous commit, a new entry point is added to the module
    that iterates over a list of directories already identified as isochrone
    frontiers in the provenance model but for which no flat model of their
    content has been created yet. This iteration produces such flat models.

commit 052e25da505c77da90d1c54ce0ade775117422e4
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 15:00:28 2021 +0100

    Unify parameter order between provenance and archive objects across the module

commit f4f48923e86ef0054642165bcb9ecf4387d70bb8
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Dec 1 13:21:33 2021 +0100

    Add explicit flag for flattened directories to `ProvenanceStorageInterface`
    
    Both contents and directories should always have an associated date in the storage.
    Flattening of a directory is now explicitly acknowledged by setting the newly added
    flag. The idea is to allow postponing the creation of flat models for directories
    in the isochrone frontier (the algorithm will be refactored in the commits to come).

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/538/ for more details.
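
The workflow described in the commit messages above (skip flattening while processing revisions, then flatten the pending frontier directories in a separate pass) can be sketched as follows. This is a toy model under stated assumptions: the dictionary of flags and the functions `process_revision` and `flatten_pending` are hypothetical and do not reproduce the swh.provenance implementation.

```python
from typing import Dict, Iterable, List


def process_revision(
    flat_by_dir: Dict[str, bool], frontier: Iterable[str], flatten: bool = True
) -> None:
    """Record frontier directories; mark them flat only if `flatten` is set."""
    for directory in frontier:
        # A directory that is already flat stays flat.
        flat_by_dir[directory] = flat_by_dir.get(directory, False) or flatten


def flatten_pending(flat_by_dir: Dict[str, bool]) -> List[str]:
    """Flatten every known frontier directory that has no flat model yet."""
    pending = sorted(d for d, flat in flat_by_dir.items() if not flat)
    for directory in pending:
        flat_by_dir[directory] = True  # stand-in for the real flattening work
    return pending
```

Keeping an explicit flag per directory is what lets the second pass find exactly the directories whose flat model was postponed.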