Moved isochrone graph logic to its own file graph.py.
Origin-revision layer's algorithm is now in origin.py, while
revision-content layer's logic was moved to revision.py.
Details
- Reviewers: douardda
- Group Reviewers: Reviewers
- Commits: rDPROV206399eb8ae7: Reorganize code
Diff Detail
- Repository: rDPROV Provenance database
- Branch: master
- Lint: Lint Skipped
- Unit: Unit Tests Skipped
- Build Status: Buildable 21893
  - Build 34044: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
  - Build 34043: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D5849 (id=20911)
Could not rebase; Attempt merge onto 6cdd424eba...
Updating 6cdd424..9f02582
Fast-forward
 swh/provenance/__init__.py | 16 +-
 swh/provenance/cli.py | 18 +-
 swh/provenance/graph.py | 214 ++++++++
 swh/provenance/model.py | 69 ++-
 swh/provenance/origin.py | 182 ++++---
 swh/provenance/postgresql/provenancedb_base.py | 341 +++---------
 .../postgresql/provenancedb_with_path.py | 155 +++---
 .../postgresql/provenancedb_without_path.py | 104 ++--
 swh/provenance/provenance.py | 593 ++++++---------------
 swh/provenance/revision.py | 240 ++++++++-
 swh/provenance/tests/conftest.py | 6 +-
 .../tests/data/graphs_cmdbts2_lower_1.yaml | 476 +++++++++++++++++
 .../tests/data/graphs_cmdbts2_lower_2.yaml | 476 +++++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_1.yaml | 444 +++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_2.yaml | 436 +++++++++++++++
 .../tests/data/graphs_out-of-order_lower_1.yaml | 223 ++++++++
 .../tests/data/synthetic_out-of-order_lower_1.txt | 2 +-
 swh/provenance/tests/test_conftest.py | 2 +-
 swh/provenance/tests/test_isochrone_graph.py | 105 ++++
 swh/provenance/tests/test_origin_iterator.py | 43 +-
 swh/provenance/tests/test_provenance_db.py | 18 +-
 swh/provenance/tests/test_provenance_heuristics.py | 42 +-
 swh/provenance/tests/test_revision_iterator.py | 6 +-
 23 files changed, 3200 insertions(+), 1011 deletions(-)
 create mode 100644 swh/provenance/graph.py
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_out-of-order_lower_1.yaml
 create mode 100644 swh/provenance/tests/test_isochrone_graph.py
Changes applied before test
commit 9f025823ec8196aea264f567d0df584a74edbda2
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 16:56:03 2021 +0200

    Reorganize code

    Moved isochrone graph logic to its own file graph.py. Origin-revision layer's algorithm is now in origin.py, while revision-content layer's logic was moved to revision.py.

commit f5000961116c3ab720c682155d27e678eaf3ff73
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 16:27:51 2021 +0200

    Split Provenance backend in two layers

    First layer (temporarily called `ProvenanceBackend`) is responsible for handling read/write caches and it should ideally be db agnostic (not yet though). Second layer is responsible for all db interaction. In revisions to come it will be further refactored into several workers to guarantee no collisions when writing to the DB.

commit add73300f8054eeca73f816867a14ae1d8420190
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 12:41:54 2021 +0200

    Refactor insertion methods in the Provenance backend

commit 2a8e113d2407e1d11df7d0d2f4116967c92d7e57
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 12:24:31 2021 +0200

    Simplify cache usage in the Provenance backend

commit a5b7bd73c0ec5fc7cf2b2c7e93c00b40d147ca84
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 11:42:15 2021 +0200

    Remove `directory_invalidate_in_isochrone_frontier` method from provenance interface

    It was meant to be used in a multi-threaded scenario, which is not possible due to Python's lack of actual parallelism. This way the `build_isochrone_graph` function is guaranteed not to modify the DB (it performs only reads now). Also the isochrone graph test was updated to use `revision_add` with a new flag to avoid commits, hence emulating the batch processing behaviour.

commit b24bc279c19e346a77d233fa7d24f148f52c5d89
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 18:08:25 2021 +0200

    Improve out-of-order revision processing

    Added a flag to the `IsochroneNode` to identify invalidated frontiers and force their update later when processing the graph. This should guarantee the same results when processing revisions one by one vs. in batches (in terms of db rows).

commit 1146a9b9203557195da47df2b76ba1603aa4ca31
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 16:20:21 2021 +0200

    Refine maxdate calculation

commit 18063809ccc0b4f7cbfcf00fc95b26ba297c99ab
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 16:17:49 2021 +0200

    Fix issue when processing revision in batch

    If any revision in the batch was invalidating a frontier, the commit of the complete batch failed. This is now fixed.

commit 52de7a0c11057ec80743807350f4a625efab11ba
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 11:08:10 2021 +0200

    Add isochrone graph tests for the remaining heuristics

commit a5e8234b9f43ce02144ff9ff37a2caa00ebf608a
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Mon Jun 7 17:09:43 2021 +0200

    Add test for isochrone graph topology

    The expected isochrone graphs for each revision in the test should be provided as a dictionary in an associated yaml file. Currently only heuristic lower with depth=1 is being tested. Also, model classes DirectoryEntry, FileEntry and IsochroneNode were modified so that they can be compared by equality and hashed.

commit 59c0f1bf49617824feae7ad08ce1b5f46b7a70cd
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Mon Jun 7 11:45:25 2021 +0200

    Add equality check functions to model classes

commit 4ebab8d2ce933637c85bf456a796b6da8d12b513
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Fri Jun 4 15:01:38 2021 +0200

    Refactor OriginEntry to include info about visit date and snapshot

    Revisions reachable from an OriginEntry are now queried separately and returned in an iterable. Also `origin_add` function was updated accordingly, and CLI command now uses a CSVOriginIterator similar to that previously developed for revisions. Updated tests as well to ensure nothing was broken during the refactoring.

commit 6ea9313800b86e996783f0bf5e37cc8c34f3627e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Fri Jun 4 14:54:56 2021 +0200

    Remove archive parameter from RevisionEntry
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/112/ for more details.
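For readers of the log above: the "Add test for isochrone graph topology" commit says the model classes were changed so they can be compared by equality and hashed. A minimal sketch of that idea follows; the class name and its `id` and `name` fields are hypothetical and are not taken from the swh.provenance code.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a model class; the field names are assumptions,
# not the actual swh.provenance definitions.
@dataclass(frozen=True)
class DirectoryEntrySketch:
    id: bytes
    name: bytes


a = DirectoryEntrySketch(id=b"\x01", name=b"src")
b = DirectoryEntrySketch(id=b"\x01", name=b"src")
c = DirectoryEntrySketch(id=b"\x02", name=b"src")

# Entries with equal field values compare equal and collapse to a single
# element when stored in a set, which is what a topology test relies on
# when it compares an expected graph against a computed one.
unique_entries = {a, b, c}
```

A frozen dataclass gets `__eq__` and `__hash__` generated from its fields, so expected and computed nodes can be compared directly or used as dictionary keys.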
Build is green
Patch application report for D5849 (id=20942)
Could not rebase; Attempt merge onto 075b0d6cd6...
Updating 075b0d6..c7d1840
Fast-forward
 swh/provenance/__init__.py | 16 +-
 swh/provenance/cli.py | 18 +-
 swh/provenance/graph.py | 214 ++++++++
 swh/provenance/model.py | 76 ++-
 swh/provenance/origin.py | 184 ++++---
 swh/provenance/postgresql/provenancedb_base.py | 352 ++++--------
 .../postgresql/provenancedb_with_path.py | 155 +++---
 .../postgresql/provenancedb_without_path.py | 104 ++--
 swh/provenance/provenance.py | 593 ++++++---------------
 swh/provenance/revision.py | 237 +++++++-
 swh/provenance/tests/conftest.py | 6 +-
 .../tests/data/graphs_cmdbts2_lower_1.yaml | 476 +++++++++++++++++
 .../tests/data/graphs_cmdbts2_lower_2.yaml | 476 +++++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_1.yaml | 444 +++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_2.yaml | 436 +++++++++++++++
 .../tests/data/graphs_out-of-order_lower_1.yaml | 223 ++++++++
 .../tests/data/synthetic_out-of-order_lower_1.txt | 6 +-
 swh/provenance/tests/test_conftest.py | 2 +-
 swh/provenance/tests/test_isochrone_graph.py | 107 ++++
 swh/provenance/tests/test_origin_iterator.py | 43 +-
 swh/provenance/tests/test_provenance_db.py | 16 +-
 swh/provenance/tests/test_provenance_heuristics.py | 42 +-
 swh/provenance/tests/test_revision_iterator.py | 4 +-
 23 files changed, 3220 insertions(+), 1010 deletions(-)
 create mode 100644 swh/provenance/graph.py
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_out-of-order_lower_1.yaml
 create mode 100644 swh/provenance/tests/test_isochrone_graph.py
Changes applied before test
commit c7d184075a3ac13310cf2823ca580fce9457d7e1
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Thu Jun 10 21:56:57 2021 +0200

    Reorganize code

    Moved isochrone graph logic to its own file graph.py. Origin-revision layer's algorithm is now in origin.py, while revision-content layer's logic was moved to revision.py.

commit 6a8d34145b7b113d8ca62cf134d50ab69c491ec7
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 16:27:51 2021 +0200

    Split Provenance backend in two layers

    First layer (temporarily called `ProvenanceBackend`) is responsible for handling read/write caches and it should ideally be db agnostic (not yet though). Second layer is responsible for all db interaction. In revisions to come it will be further refactored into several workers to guarantee no collisions when writing to the DB.

commit 4a8964d25ff8490b8bf33d8480f6db1b97a0af22
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 12:41:54 2021 +0200

    Refactor insertion methods in the Provenance backend

commit 4296febd8fbe3b0c8dc5a3650cbbd4ecf29713cf
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 12:24:31 2021 +0200

    Simplify cache usage in the Provenance backend

commit af41748ef54dedf87f8304bb457b028b2de6369f
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 11:42:15 2021 +0200

    Remove `directory_invalidate_in_isochrone_frontier` method from provenance interface

    It was meant to be used in a multi-threaded scenario, which is not possible due to Python's lack of actual parallelism. This way the `build_isochrone_graph` function is guaranteed not to modify the DB (it performs only reads now). Also the isochrone graph test was updated to use `revision_add` with a new flag to avoid commits, hence emulating the batch processing behaviour.

commit c20aeb432e831e412c13033c4e7a3d0ee6553e82
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 18:08:25 2021 +0200

    Improve out-of-order revision processing

    Added a flag to the `IsochroneNode` to identify invalidated frontiers and force their update later when processing the graph. This should guarantee the same results when processing revisions one by one vs. in batches (in terms of db rows).

commit 65226455d522f5156ed8d7e37d2b7546d0d010f1
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 16:20:21 2021 +0200

    Refine maxdate calculation

commit d4ab6857f6a74e181316bf90db008b51d4b81085
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 16:17:49 2021 +0200

    Fix issue when processing revision in batch

    If any revision in the batch was invalidating a frontier, the commit of the complete batch failed. This is now fixed.

commit d14247403019bd34e1e430c71e074574c89e3e57
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 11:08:10 2021 +0200

    Add isochrone graph tests for the remaining heuristics

commit 594e5a83b38ceb99a46520e9d835b14074caed70
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Mon Jun 7 17:09:43 2021 +0200

    Add test for isochrone graph topology

    The expected isochrone graphs for each revision in the test should be provided as a dictionary in an associated yaml file. Currently only heuristic lower with depth=1 is being tested. Also, model classes DirectoryEntry, FileEntry and IsochroneNode were modified so that they can be compared by equality and hashed.

commit 244b08b4b51c8f0891301e4495f05ba8368e156c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Mon Jun 7 11:45:25 2021 +0200

    Add equality check functions to model classes

commit 5a9fb987c9aa169095185b1559a87bce536776b7
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Fri Jun 4 15:01:38 2021 +0200

    Refactor OriginEntry to include info about visit date and snapshot

    Revisions reachable from an OriginEntry are now queried separately and returned in an iterable. Also `origin_add` function was updated accordingly, and CLI command now uses a CSVOriginIterator similar to that previously developed for revisions. Updated tests as well to ensure nothing was broken during the refactoring.

commit fa4942ddff353c4d1d46c7f61ec570c9a28bc648
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Fri Jun 4 14:54:56 2021 +0200

    Remove archive parameter from RevisionEntry
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/121/ for more details.
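The "Split Provenance backend in two layers" commit in the log above describes a first layer that handles read/write caches and a second layer that owns all db interaction. A rough sketch of that split is given here under stated assumptions: `InMemoryStore`, `CachingBackend`, and their methods are hypothetical names, and a dictionary stands in for the database.

```python
from typing import Dict, Optional


class InMemoryStore:
    """Hypothetical second layer: the only place that touches the 'database'."""

    def __init__(self) -> None:
        self.rows: Dict[bytes, int] = {}
        self.reads = 0

    def get_date(self, sha1: bytes) -> Optional[int]:
        self.reads += 1
        return self.rows.get(sha1)

    def put_dates(self, dates: Dict[bytes, int]) -> None:
        self.rows.update(dates)


class CachingBackend:
    """Hypothetical first layer: read/write caches in front of the store."""

    def __init__(self, store: InMemoryStore) -> None:
        self.store = store
        self.read_cache: Dict[bytes, Optional[int]] = {}
        self.write_cache: Dict[bytes, int] = {}

    def get_date(self, sha1: bytes) -> Optional[int]:
        # Pending writes win over anything previously read.
        if sha1 in self.write_cache:
            return self.write_cache[sha1]
        if sha1 not in self.read_cache:
            self.read_cache[sha1] = self.store.get_date(sha1)
        return self.read_cache[sha1]

    def set_date(self, sha1: bytes, date: int) -> None:
        # Nothing reaches the store until commit() is called.
        self.write_cache[sha1] = date

    def commit(self) -> None:
        self.store.put_dates(self.write_cache)
        self.read_cache.update(self.write_cache)
        self.write_cache.clear()


store = InMemoryStore()
backend = CachingBackend(store)
backend.set_date(b"a", 10)
pending = backend.get_date(b"a")    # served from the write cache
rows_before_commit = dict(store.rows)
backend.commit()
```

Keeping every write in the cache until `commit()` is one way to batch several revisions into a single flush; whether the real backend does exactly this is not stated in the log.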
Build is green
Patch application report for D5849 (id=20972)
Could not rebase; Attempt merge onto 075b0d6cd6...
Updating 075b0d6..206399e
Fast-forward
 swh/provenance/__init__.py | 16 +-
 swh/provenance/cli.py | 18 +-
 swh/provenance/graph.py | 223 ++++++++
 swh/provenance/model.py | 76 ++-
 swh/provenance/origin.py | 183 ++++---
 swh/provenance/postgresql/provenancedb_base.py | 352 ++++--------
 .../postgresql/provenancedb_with_path.py | 155 +++---
 .../postgresql/provenancedb_without_path.py | 104 ++--
 swh/provenance/provenance.py | 593 ++++++---------------
 swh/provenance/revision.py | 237 +++++++-
 swh/provenance/tests/conftest.py | 6 +-
 .../tests/data/graphs_cmdbts2_lower_1.yaml | 401 ++++++++++++++
 .../tests/data/graphs_cmdbts2_lower_2.yaml | 401 ++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_1.yaml | 371 +++++++++++++
 .../tests/data/graphs_cmdbts2_upper_2.yaml | 365 +++++++++++++
 .../tests/data/graphs_out-of-order_lower_1.yaml | 185 +++++++
 .../tests/data/synthetic_out-of-order_lower_1.txt | 6 +-
 swh/provenance/tests/test_conftest.py | 2 +-
 swh/provenance/tests/test_isochrone_graph.py | 101 ++++
 swh/provenance/tests/test_origin_iterator.py | 43 +-
 swh/provenance/tests/test_provenance_db.py | 16 +-
 swh/provenance/tests/test_provenance_heuristics.py | 51 +-
 swh/provenance/tests/test_revision_iterator.py | 4 +-
 23 files changed, 2895 insertions(+), 1014 deletions(-)
 create mode 100644 swh/provenance/graph.py
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_out-of-order_lower_1.yaml
 create mode 100644 swh/provenance/tests/test_isochrone_graph.py
Changes applied before test
commit 206399eb8ae79e350c6c47af50589fec953d7d98
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Fri Jun 11 18:01:14 2021 +0200

    Reorganize code

    Moved isochrone graph logic to its own file graph.py. Origin-revision layer's algorithm is now in origin.py, while revision-content layer's logic was moved to revision.py.

commit c4b1f31640b1263e8afb7c4c71a8ca3d984b3fd2
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Fri Jun 11 17:52:00 2021 +0200

    Split Provenance backend in two layers

    First layer (temporarily called `ProvenanceBackend`) is responsible for handling read/write caches and it should ideally be db agnostic (not yet though). Second layer is responsible for all db interaction. In revisions to come it will be further refactored into several workers to guarantee no collisions when writing to the DB.

commit f1a9fe8182a3a6a8a47d6093197ee6b800fce95b
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 12:41:54 2021 +0200

    Refactor insertion methods in the Provenance backend

commit 3f99025d6d45287ba7ce97db39eef3f9c5acb78c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 12:24:31 2021 +0200

    Simplify cache usage in the Provenance backend

commit d1b476b27ac4e7f355468a0514f6a9850dbf1143
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 16:17:49 2021 +0200

    Improve out-of-order revision processing

    Fix issue when processing revision in batch

    If any revision in the batch was invalidating a frontier, the commit of the complete batch failed. This is now fixed.

    Refine maxdate calculation

    Added a flag to the IsochroneNode to identify invalidated frontiers and force their update later when processing the graph. This should guarantee the same results when processing revisions one by one vs. in batches (in terms of db rows).

    Remove directory_invalidate_in_isochrone_frontier method from provenance interface

    It was meant to be used in a multi-threaded scenario, which is not possible due to Python's lack of actual parallelism. This way the build_isochrone_graph function is guaranteed not to modify the DB (it performs only reads now). Also the isochrone graph test was updated to use revision_add with a new flag to avoid commits, hence emulating the batch processing behaviour.

commit 30bff867e97f37849d960fdc284513844fae2a34
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 11:08:10 2021 +0200

    Add isochrone graph tests for the remaining heuristics

commit c2843ae5ba47bfb03d0fa10ce45ad274061097df
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Mon Jun 7 17:09:43 2021 +0200

    Add test for isochrone graph topology

    The expected isochrone graphs for each revision in the test should be provided as a dictionary in an associated yaml file. Currently only heuristic lower with depth=1 is being tested. Also, model classes DirectoryEntry, FileEntry and IsochroneNode were modified so that they can be compared by equality and hashed.

commit 1dd14205ba60d02e14f2c352113871c1025b8e7f
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Mon Jun 7 11:45:25 2021 +0200

    Add equality check functions to model classes

commit 9aaaedb3ebc981555276e99616a0c4fc837b78e9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Fri Jun 4 15:01:38 2021 +0200

    Refactor OriginEntry to include info about visit date and snapshot

    Revisions reachable from an OriginEntry are now queried separately and returned in an iterable. Also `origin_add` function was updated accordingly, and CLI command now uses a CSVOriginIterator similar to that previously developed for revisions. Updated tests as well to ensure nothing was broken during the refactoring.

commit fa4942ddff353c4d1d46c7f61ec570c9a28bc648
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Fri Jun 4 14:54:56 2021 +0200

    Remove archive parameter from RevisionEntry
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/128/ for more details.
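The "Improve out-of-order revision processing" commit in the log above mentions a flag on the IsochroneNode that marks invalidated frontiers so they can be updated later, with graph construction doing reads only. The sketch below illustrates that mark-now, write-later pattern. `NodeSketch`, its fields, and the rule "a stored date later than the revision date is stale" are all assumptions made for illustration and are not confirmed by the log.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class NodeSketch:
    """Hypothetical stand-in for IsochroneNode; all fields are assumptions."""

    name: str
    dbdate: Optional[int] = None  # date already stored for this directory, if any
    invalidated: bool = False

    def visit(self, revision_date: int) -> None:
        # Assumed rule: a stored date later than the revision being processed
        # means the revision arrived out of order, so the stored date is stale.
        if self.dbdate is not None and self.dbdate > revision_date:
            self.dbdate = None
            self.invalidated = True


def pending_updates(nodes: Iterable[NodeSketch], revision_date: int) -> Dict[str, int]:
    # Visiting only marks nodes; the dates to rewrite are collected afterwards,
    # so nothing is written while the graph is being built.
    return {n.name: revision_date for n in nodes if n.invalidated}


nodes = [NodeSketch("a", dbdate=5), NodeSketch("b", dbdate=20), NodeSketch("c")]
for node in nodes:
    node.visit(revision_date=10)
updates = pending_updates(nodes, revision_date=10)
```

Because the flag travels with the node, the same set of updates can be computed whether revisions are flushed one at a time or at the end of a batch.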