
Reorganize code
Closed, Public

Authored by aeviso on Jun 10 2021, 12:47 PM.

Details

Reviewers
douardda
Group Reviewers
Reviewers
Commits
rDPROV206399eb8ae7: Reorganize code
Summary

Moved isochrone graph logic to its own file graph.py.
Origin-revision layer's algorithm is now in origin.py, while
revision-content layer's logic was moved to revision.py.

Diff Detail

Repository
rDPROV Provenance database
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5849 (id=20911)

Could not rebase; Attempt merge onto 6cdd424eba...

Updating 6cdd424..9f02582
Fast-forward
 swh/provenance/__init__.py                         |  16 +-
 swh/provenance/cli.py                              |  18 +-
 swh/provenance/graph.py                            | 214 ++++++++
 swh/provenance/model.py                            |  69 ++-
 swh/provenance/origin.py                           | 182 ++++---
 swh/provenance/postgresql/provenancedb_base.py     | 341 +++---------
 .../postgresql/provenancedb_with_path.py           | 155 +++---
 .../postgresql/provenancedb_without_path.py        | 104 ++--
 swh/provenance/provenance.py                       | 593 ++++++---------------
 swh/provenance/revision.py                         | 240 ++++++++-
 swh/provenance/tests/conftest.py                   |   6 +-
 .../tests/data/graphs_cmdbts2_lower_1.yaml         | 476 +++++++++++++++++
 .../tests/data/graphs_cmdbts2_lower_2.yaml         | 476 +++++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_1.yaml         | 444 +++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_2.yaml         | 436 +++++++++++++++
 .../tests/data/graphs_out-of-order_lower_1.yaml    | 223 ++++++++
 .../tests/data/synthetic_out-of-order_lower_1.txt  |   2 +-
 swh/provenance/tests/test_conftest.py              |   2 +-
 swh/provenance/tests/test_isochrone_graph.py       | 105 ++++
 swh/provenance/tests/test_origin_iterator.py       |  43 +-
 swh/provenance/tests/test_provenance_db.py         |  18 +-
 swh/provenance/tests/test_provenance_heuristics.py |  42 +-
 swh/provenance/tests/test_revision_iterator.py     |   6 +-
 23 files changed, 3200 insertions(+), 1011 deletions(-)
 create mode 100644 swh/provenance/graph.py
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_out-of-order_lower_1.yaml
 create mode 100644 swh/provenance/tests/test_isochrone_graph.py
Changes applied before test
commit 9f025823ec8196aea264f567d0df584a74edbda2
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 9 16:56:03 2021 +0200

    Reorganize code
    
    Moved isochrone graph logic to its own file graph.py.
    Origin-revision layer's algorithm is now in origin.py, while
    revision-content layer's logic was moved to revision.py.

commit f5000961116c3ab720c682155d27e678eaf3ff73
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 9 16:27:51 2021 +0200

    Split Provenance backend in two layers
    
    First layer (temporarily called `ProvenanceBackend`) is responsible for
    handling read/write caches and it should ideally be db agnostic (not
    yet though).
    Second layer is responsible for all db interaction. In revisions to come
    it will be further refactored into several workers to guarantee no
    collisions when writing to the DB.

commit add73300f8054eeca73f816867a14ae1d8420190
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 9 12:41:54 2021 +0200

    Refactor insertion methods in the Provenance backend

commit 2a8e113d2407e1d11df7d0d2f4116967c92d7e57
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 9 12:24:31 2021 +0200

    Simplify cache usage in the Provenance backend

commit a5b7bd73c0ec5fc7cf2b2c7e93c00b40d147ca84
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 9 11:42:15 2021 +0200

    Remove `directory_invalidate_in_isochrone_frontier` method from provenance interface
    
    It was meant to be used in a multi-thread scenario which is not possible
    due to Python's lack of actual parallelism. This way the
    `build_isochrone_graph` function is guaranteed not to modify the DB (it
    performs only reads now). Also the isochrone graph test was updated to
    use `revision_add` with a new flag to avoid commits, hence emulating the
    batch processing behaviour.

commit b24bc279c19e346a77d233fa7d24f148f52c5d89
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 18:08:25 2021 +0200

    Improve out-of-order revision processing
    
    Added a flag to the `IsochroneNode` to identify invalidated frontiers
    and force their update later when processing the graph. This should
    guarantee the same results when processing revisions one by one vs. in
    batches (in terms of db rows).

commit 1146a9b9203557195da47df2b76ba1603aa4ca31
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 16:20:21 2021 +0200

    Refine maxdate calculation

commit 18063809ccc0b4f7cbfcf00fc95b26ba297c99ab
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 16:17:49 2021 +0200

    Fix issue when processing revisions in batch
    
    If any revision in the batch was invalidating a frontier, the commit of
    the complete batch failed. This is now fixed.

commit 52de7a0c11057ec80743807350f4a625efab11ba
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 11:08:10 2021 +0200

    Add isochrone graph tests for the remaining heuristics

commit a5e8234b9f43ce02144ff9ff37a2caa00ebf608a
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 7 17:09:43 2021 +0200

    Add test for isochrone graph topology
    
    The expected isochrone graphs for each revision in the test should be
    provided as a dictionary in an associated yaml file.
    Currently only heuristic lower with depth=1 is being tested.
    
    Also, model classes DirectoryEntry, FileEntry and IsochroneNode were
    modified so that they can be compared by equality and hashed.

commit 59c0f1bf49617824feae7ad08ce1b5f46b7a70cd
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 7 11:45:25 2021 +0200

    Add equality check functions to model classes

commit 4ebab8d2ce933637c85bf456a796b6da8d12b513
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 4 15:01:38 2021 +0200

    Refactor OriginEntry to include info about visit date and snapshot
    
    Revisions reachable from an OriginEntry are now queried separately and returned in an iterable.
    Also `origin_add` function was updated accordingly, and CLI command now uses a CSVOriginIterator
    similar to that previously developed for revisions. Updated tests as well to ensure nothing was
    broken during the refactoring.

commit 6ea9313800b86e996783f0bf5e37cc8c34f3627e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 4 14:54:56 2021 +0200

    Remove archive parameter from RevisionEntry

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/112/ for more details.

Build is green

Patch application report for D5849 (id=20942)

Could not rebase; Attempt merge onto 075b0d6cd6...

Updating 075b0d6..c7d1840
Fast-forward
 swh/provenance/__init__.py                         |  16 +-
 swh/provenance/cli.py                              |  18 +-
 swh/provenance/graph.py                            | 214 ++++++++
 swh/provenance/model.py                            |  76 ++-
 swh/provenance/origin.py                           | 184 ++++---
 swh/provenance/postgresql/provenancedb_base.py     | 352 ++++--------
 .../postgresql/provenancedb_with_path.py           | 155 +++---
 .../postgresql/provenancedb_without_path.py        | 104 ++--
 swh/provenance/provenance.py                       | 593 ++++++---------------
 swh/provenance/revision.py                         | 237 +++++++-
 swh/provenance/tests/conftest.py                   |   6 +-
 .../tests/data/graphs_cmdbts2_lower_1.yaml         | 476 +++++++++++++++++
 .../tests/data/graphs_cmdbts2_lower_2.yaml         | 476 +++++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_1.yaml         | 444 +++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_2.yaml         | 436 +++++++++++++++
 .../tests/data/graphs_out-of-order_lower_1.yaml    | 223 ++++++++
 .../tests/data/synthetic_out-of-order_lower_1.txt  |   6 +-
 swh/provenance/tests/test_conftest.py              |   2 +-
 swh/provenance/tests/test_isochrone_graph.py       | 107 ++++
 swh/provenance/tests/test_origin_iterator.py       |  43 +-
 swh/provenance/tests/test_provenance_db.py         |  16 +-
 swh/provenance/tests/test_provenance_heuristics.py |  42 +-
 swh/provenance/tests/test_revision_iterator.py     |   4 +-
 23 files changed, 3220 insertions(+), 1010 deletions(-)
 create mode 100644 swh/provenance/graph.py
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_out-of-order_lower_1.yaml
 create mode 100644 swh/provenance/tests/test_isochrone_graph.py
Changes applied before test
commit c7d184075a3ac13310cf2823ca580fce9457d7e1
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jun 10 21:56:57 2021 +0200

    Reorganize code
    
    Moved isochrone graph logic to its own file graph.py. Origin-revision layer's algorithm
    is now in origin.py, while revision-content layer's logic was moved to revision.py.

commit 6a8d34145b7b113d8ca62cf134d50ab69c491ec7
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 9 16:27:51 2021 +0200

    Split Provenance backend in two layers
    
    First layer (temporarily called `ProvenanceBackend`) is responsible for
    handling read/write caches and it should ideally be db agnostic (not
    yet though).
    Second layer is responsible for all db interaction. In revisions to come
    it will be further refactored into several workers to guarantee no
    collisions when writing to the DB.

commit 4a8964d25ff8490b8bf33d8480f6db1b97a0af22
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 9 12:41:54 2021 +0200

    Refactor insertion methods in the Provenance backend

commit 4296febd8fbe3b0c8dc5a3650cbbd4ecf29713cf
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 9 12:24:31 2021 +0200

    Simplify cache usage in the Provenance backend

commit af41748ef54dedf87f8304bb457b028b2de6369f
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 9 11:42:15 2021 +0200

    Remove `directory_invalidate_in_isochrone_frontier` method from provenance interface
    
    It was meant to be used in a multi-thread scenario which is not possible
    due to Python's lack of actual parallelism. This way the
    `build_isochrone_graph` function is guaranteed not to modify the DB (it
    performs only reads now). Also the isochrone graph test was updated to
    use `revision_add` with a new flag to avoid commits, hence emulating the
    batch processing behaviour.

commit c20aeb432e831e412c13033c4e7a3d0ee6553e82
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 18:08:25 2021 +0200

    Improve out-of-order revision processing
    
    Added a flag to the `IsochroneNode` to identify invalidated frontiers
    and force their update later when processing the graph. This should
    guarantee the same results when processing revisions one by one vs. in
    batches (in terms of db rows).

commit 65226455d522f5156ed8d7e37d2b7546d0d010f1
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 16:20:21 2021 +0200

    Refine maxdate calculation

commit d4ab6857f6a74e181316bf90db008b51d4b81085
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 16:17:49 2021 +0200

    Fix issue when processing revisions in batch
    
    If any revision in the batch was invalidating a frontier, the commit of
    the complete batch failed. This is now fixed.

commit d14247403019bd34e1e430c71e074574c89e3e57
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 11:08:10 2021 +0200

    Add isochrone graph tests for the remaining heuristics

commit 594e5a83b38ceb99a46520e9d835b14074caed70
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 7 17:09:43 2021 +0200

    Add test for isochrone graph topology
    
    The expected isochrone graphs for each revision in the test should be
    provided as a dictionary in an associated yaml file.
    Currently only heuristic lower with depth=1 is being tested.
    
    Also, model classes DirectoryEntry, FileEntry and IsochroneNode were
    modified so that they can be compared by equality and hashed.

commit 244b08b4b51c8f0891301e4495f05ba8368e156c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 7 11:45:25 2021 +0200

    Add equality check functions to model classes

commit 5a9fb987c9aa169095185b1559a87bce536776b7
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 4 15:01:38 2021 +0200

    Refactor OriginEntry to include info about visit date and snapshot
    
    Revisions reachable from an OriginEntry are now queried separately and returned in an iterable.
    Also `origin_add` function was updated accordingly, and CLI command now uses a CSVOriginIterator
    similar to that previously developed for revisions. Updated tests as well to ensure nothing was
    broken during the refactoring.

commit fa4942ddff353c4d1d46c7f61ec570c9a28bc648
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 4 14:54:56 2021 +0200

    Remove archive parameter from RevisionEntry

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/121/ for more details.

This revision is now accepted and ready to land. Jun 11 2021, 12:39 PM

Build is green

Patch application report for D5849 (id=20972)

Could not rebase; Attempt merge onto 075b0d6cd6...

Updating 075b0d6..206399e
Fast-forward
 swh/provenance/__init__.py                         |  16 +-
 swh/provenance/cli.py                              |  18 +-
 swh/provenance/graph.py                            | 223 ++++++++
 swh/provenance/model.py                            |  76 ++-
 swh/provenance/origin.py                           | 183 ++++---
 swh/provenance/postgresql/provenancedb_base.py     | 352 ++++--------
 .../postgresql/provenancedb_with_path.py           | 155 +++---
 .../postgresql/provenancedb_without_path.py        | 104 ++--
 swh/provenance/provenance.py                       | 593 ++++++---------------
 swh/provenance/revision.py                         | 237 +++++++-
 swh/provenance/tests/conftest.py                   |   6 +-
 .../tests/data/graphs_cmdbts2_lower_1.yaml         | 401 ++++++++++++++
 .../tests/data/graphs_cmdbts2_lower_2.yaml         | 401 ++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_1.yaml         | 371 +++++++++++++
 .../tests/data/graphs_cmdbts2_upper_2.yaml         | 365 +++++++++++++
 .../tests/data/graphs_out-of-order_lower_1.yaml    | 185 +++++++
 .../tests/data/synthetic_out-of-order_lower_1.txt  |   6 +-
 swh/provenance/tests/test_conftest.py              |   2 +-
 swh/provenance/tests/test_isochrone_graph.py       | 101 ++++
 swh/provenance/tests/test_origin_iterator.py       |  43 +-
 swh/provenance/tests/test_provenance_db.py         |  16 +-
 swh/provenance/tests/test_provenance_heuristics.py |  51 +-
 swh/provenance/tests/test_revision_iterator.py     |   4 +-
 23 files changed, 2895 insertions(+), 1014 deletions(-)
 create mode 100644 swh/provenance/graph.py
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_out-of-order_lower_1.yaml
 create mode 100644 swh/provenance/tests/test_isochrone_graph.py
Changes applied before test
commit 206399eb8ae79e350c6c47af50589fec953d7d98
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 11 18:01:14 2021 +0200

    Reorganize code
    
    Moved isochrone graph logic to its own file graph.py. Origin-revision layer's algorithm
    is now in origin.py, while revision-content layer's logic was moved to revision.py.

commit c4b1f31640b1263e8afb7c4c71a8ca3d984b3fd2
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 11 17:52:00 2021 +0200

    Split Provenance backend in two layers
    
    First layer (temporarily called `ProvenanceBackend`) is responsible for
    handling read/write caches and it should ideally be db agnostic (not
    yet though).
    Second layer is responsible for all db interaction. In revisions to come
    it will be further refactored into several workers to guarantee no
    collisions when writing to the DB.
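
To make the two-layer split described in this commit message concrete, here is a minimal sketch of a cache layer sitting in front of a storage layer. Every name in it (`InMemoryStorage`, `CachingBackend`, `get_date`, `set_date`, `flush`) is an assumption made for illustration and is not taken from the swh.provenance code; the storage is a plain dictionary rather than a PostgreSQL connection.

```python
from typing import Dict, Optional


class InMemoryStorage:
    """Hypothetical second layer: the only object that touches the 'DB'."""

    def __init__(self) -> None:
        self.rows: Dict[str, int] = {}
        self.reads = 0  # counts round trips, to show the effect of the cache

    def get_date(self, sha1: str) -> Optional[int]:
        self.reads += 1
        return self.rows.get(sha1)

    def set_dates(self, dates: Dict[str, int]) -> None:
        self.rows.update(dates)


class CachingBackend:
    """Hypothetical first layer: read/write caches, no direct DB access."""

    def __init__(self, storage: InMemoryStorage) -> None:
        self.storage = storage
        self.read_cache: Dict[str, Optional[int]] = {}
        self.write_cache: Dict[str, int] = {}

    def get_date(self, sha1: str) -> Optional[int]:
        # Pending writes take precedence over values read earlier.
        if sha1 in self.write_cache:
            return self.write_cache[sha1]
        if sha1 not in self.read_cache:
            self.read_cache[sha1] = self.storage.get_date(sha1)
        return self.read_cache[sha1]

    def set_date(self, sha1: str, date: int) -> None:
        # Only staged here; the storage layer is untouched until flush().
        self.write_cache[sha1] = date

    def flush(self) -> None:
        # Hand all pending writes to the storage layer in one call.
        self.storage.set_dates(self.write_cache)
        self.read_cache.update(self.write_cache)
        self.write_cache.clear()
```

In this sketch the first layer only depends on the two storage methods, which is one way to read the "db agnostic" goal stated above; the actual interface in the repository may differ.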

commit f1a9fe8182a3a6a8a47d6093197ee6b800fce95b
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 9 12:41:54 2021 +0200

    Refactor insertion methods in the Provenance backend

commit 3f99025d6d45287ba7ce97db39eef3f9c5acb78c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 9 12:24:31 2021 +0200

    Simplify cache usage in the Provenance backend

commit d1b476b27ac4e7f355468a0514f6a9850dbf1143
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 16:17:49 2021 +0200

    Improve out-of-order revision processing
    
    Fix issue when processing revisions in batch
    
    If any revision in the batch was invalidating a frontier, the commit of
    the complete batch failed. This is now fixed.
    
    Refine maxdate calculation
    
    Added a flag to the IsochroneNode to identify invalidated frontiers
    and force their update later when processing the graph. This should
    guarantee the same results when processing revisions one by one vs. in
    batches (in terms of db rows).
    
    Remove directory_invalidate_in_isochrone_frontier method from provenance interface
    
    It was meant to be used in a multi-thread scenario which is not possible
    due to Python's lack of actual parallelism. This way the
    build_isochrone_graph function is guaranteed not to modify the DB (it
    performs only reads now). Also the isochrone graph test was updated to
    use revision_add with a new flag to avoid commits, hence emulating the
    batch processing behaviour.
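
The idea of calling `revision_add` with a flag that skips the commit, so that a test can emulate batch processing, can be sketched as follows. This is an illustration under stated assumptions only: the flag name `commit`, the helper `revision_add_sketch`, and the `FakeProvenance` class are hypothetical and do not reproduce the signature used in the repository.

```python
from typing import Dict, Iterable, Tuple


class FakeProvenance:
    """Hypothetical in-memory backend with an explicit commit step."""

    def __init__(self) -> None:
        self.pending: Dict[str, int] = {}
        self.committed: Dict[str, int] = {}

    def add_revision(self, rev_id: str, date: int) -> None:
        # Writes are staged; nothing is visible in `committed` yet.
        self.pending[rev_id] = date

    def commit(self) -> None:
        # Make all staged writes visible at once.
        self.committed.update(self.pending)
        self.pending.clear()


def revision_add_sketch(
    provenance: FakeProvenance,
    revisions: Iterable[Tuple[str, int]],
    commit: bool = True,  # assumed flag name, for illustration only
) -> None:
    for rev_id, date in revisions:
        provenance.add_revision(rev_id, date)
    if commit:
        provenance.commit()
```

A test can then call the function several times with `commit=False` and commit once at the end, which mimics a batch without changing the per-revision code path.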

commit 30bff867e97f37849d960fdc284513844fae2a34
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 11:08:10 2021 +0200

    Add isochrone graph tests for the remaining heuristics

commit c2843ae5ba47bfb03d0fa10ce45ad274061097df
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 7 17:09:43 2021 +0200

    Add test for isochrone graph topology
    
    The expected isochrone graphs for each revision in the test should be
    provided as a dictionary in an associated yaml file.
    Currently only heuristic lower with depth=1 is being tested.
    
    Also, model classes DirectoryEntry, FileEntry and IsochroneNode were
    modified so that they can be compared by equality and hashed.
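
As a rough illustration of what "compared by equality and hashed" buys a topology test, the sketch below uses a frozen dataclass, which is one common way to get both behaviours in Python. The `Node` class and its fields are hypothetical stand-ins, not the real `IsochroneNode`, and the commit may well implement `__eq__` and `__hash__` by hand instead.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a model class; not the real IsochroneNode."""

    name: bytes
    # A tuple (not a list) keeps the whole structure hashable.
    children: Tuple["Node", ...] = ()


# Two graphs built independently compare equal when their structure matches,
# so an expected graph loaded from a data file can be checked with `==`.
expected = Node(b"root", (Node(b"a"), Node(b"b")))
computed = Node(b"root", (Node(b"a"), Node(b"b")))
same_graph = expected == computed

# Hashing lets nodes be stored in sets or used as dictionary keys.
unique_nodes = {expected, computed}
```

With `frozen=True` and the default `eq=True`, the dataclass decorator generates both `__eq__` and a field-based `__hash__`, so structurally identical graphs collapse to a single set element.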

commit 1dd14205ba60d02e14f2c352113871c1025b8e7f
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 7 11:45:25 2021 +0200

    Add equality check functions to model classes

commit 9aaaedb3ebc981555276e99616a0c4fc837b78e9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 4 15:01:38 2021 +0200

    Refactor OriginEntry to include info about visit date and snapshot
    
    Revisions reachable from an OriginEntry are now queried separately and returned in an iterable.
    Also `origin_add` function was updated accordingly, and CLI command now uses a CSVOriginIterator
    similar to that previously developed for revisions. Updated tests as well to ensure nothing was
    broken during the refactoring.

commit fa4942ddff353c4d1d46c7f61ec570c9a28bc648
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 4 14:54:56 2021 +0200

    Remove archive parameter from RevisionEntry

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/128/ for more details.

This revision was automatically updated to reflect the committed changes.