Moved isochrone graph logic to its own file graph.py.
Origin-revision layer's algorithm is now in origin.py, while
revision-content layer's logic was moved to revision.py.
Details
- Reviewers: douardda
- Group Reviewers: Reviewers
- Commits: rDPROV206399eb8ae7: Reorganize code
Diff Detail
- Repository: rDPROV Provenance database
- Branch: master
- Lint: Lint Skipped
- Unit: Unit Tests Skipped
- Build Status: Buildable 21952
  Build 34146: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
  Build 34145: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D5849 (id=20911)
Could not rebase; Attempt merge onto 6cdd424eba...
Updating 6cdd424..9f02582
Fast-forward
 swh/provenance/__init__.py | 16 +-
 swh/provenance/cli.py | 18 +-
 swh/provenance/graph.py | 214 ++++++++
 swh/provenance/model.py | 69 ++-
 swh/provenance/origin.py | 182 ++++---
 swh/provenance/postgresql/provenancedb_base.py | 341 +++---------
 .../postgresql/provenancedb_with_path.py | 155 +++---
 .../postgresql/provenancedb_without_path.py | 104 ++--
 swh/provenance/provenance.py | 593 ++++---------------
 swh/provenance/revision.py | 240 ++++++++-
 swh/provenance/tests/conftest.py | 6 +-
 .../tests/data/graphs_cmdbts2_lower_1.yaml | 476 +++++++++++++++++
 .../tests/data/graphs_cmdbts2_lower_2.yaml | 476 +++++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_1.yaml | 444 +++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_2.yaml | 436 +++++++++++++++
 .../tests/data/graphs_out-of-order_lower_1.yaml | 223 ++++++++
 .../tests/data/synthetic_out-of-order_lower_1.txt | 2 +-
 swh/provenance/tests/test_conftest.py | 2 +-
 swh/provenance/tests/test_isochrone_graph.py | 105 ++++
 swh/provenance/tests/test_origin_iterator.py | 43 +-
 swh/provenance/tests/test_provenance_db.py | 18 +-
 swh/provenance/tests/test_provenance_heuristics.py | 42 +-
 swh/provenance/tests/test_revision_iterator.py | 6 +-
 23 files changed, 3200 insertions(+), 1011 deletions(-)
 create mode 100644 swh/provenance/graph.py
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_out-of-order_lower_1.yaml
 create mode 100644 swh/provenance/tests/test_isochrone_graph.py
Changes applied before test
commit 9f025823ec8196aea264f567d0df584a74edbda2
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 16:56:03 2021 +0200
Reorganize code
Moved isochrone graph logic to its own file graph.py.
Origin-revision layer's algorithm is now in origin.py, while
revision-content layer's logic was moved to revision.py.
commit f5000961116c3ab720c682155d27e678eaf3ff73
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 16:27:51 2021 +0200
Split Provenance backend in two layers
First layer (temporarily called `ProvenanceBackend`) is responsible for
handling read/write caches and should ideally be DB-agnostic (not
yet though).
Second layer is responsible for all DB interaction. In revisions to come
it will be further refactored into several workers to guarantee no
collisions when writing to the DB.
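The two-layer split described in this commit message can be pictured with a minimal sketch. Everything below is an assumption made for illustration: the class names `InMemoryStorage` and `CachingBackend`, their methods, and the use of a plain dict in place of PostgreSQL are hypothetical and are not taken from swh-provenance.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


class InMemoryStorage:
    """Hypothetical second layer: the only object that touches the 'DB'."""

    def __init__(self) -> None:
        self.rows: Dict[bytes, datetime] = {}
        self.reads = 0  # counts round-trips, to make caching visible

    def get_date(self, sha1: bytes) -> Optional[datetime]:
        self.reads += 1
        return self.rows.get(sha1)

    def put_dates(self, dates: Dict[bytes, datetime]) -> None:
        self.rows.update(dates)


class CachingBackend:
    """Hypothetical first layer: read/write caches, no storage details."""

    def __init__(self, storage: InMemoryStorage) -> None:
        self.storage = storage
        self.read_cache: Dict[bytes, Optional[datetime]] = {}
        self.write_cache: Dict[bytes, datetime] = {}

    def get_date(self, sha1: bytes) -> Optional[datetime]:
        # Only the first lookup of a given id reaches the storage layer.
        if sha1 not in self.read_cache:
            self.read_cache[sha1] = self.storage.get_date(sha1)
        return self.read_cache[sha1]

    def set_date(self, sha1: bytes, date: datetime) -> None:
        # Writes are buffered until commit() is called.
        self.read_cache[sha1] = date
        self.write_cache[sha1] = date

    def commit(self) -> None:
        # Flush all pending writes in one call, then reset the buffer.
        self.storage.put_dates(self.write_cache)
        self.write_cache = {}
```

In this sketch nothing reaches the storage object until `commit()` is called, which is one way a caller could process several revisions before writing.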
commit add73300f8054eeca73f816867a14ae1d8420190
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 12:41:54 2021 +0200
Refactor insertion methods in the Provenance backend
commit 2a8e113d2407e1d11df7d0d2f4116967c92d7e57
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 12:24:31 2021 +0200
Simplify cache usage in the Provenance backend
commit a5b7bd73c0ec5fc7cf2b2c7e93c00b40d147ca84
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 11:42:15 2021 +0200
Remove `directory_invalidate_in_isochrone_frontier` method from provenance interface
It was meant to be used in a multi-thread scenario which is not possible
due to Python's lack of actual parallelism. This way the
`build_isochrone_graph` function is guaranteed not to modify the DB (it
performs only reads now). Also the isochrone graph test was updated to
use `revision_add` with a new flag to avoid commits, hence emulating the
batch processing behaviour.
commit b24bc279c19e346a77d233fa7d24f148f52c5d89
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 18:08:25 2021 +0200
Improve out-of-order revision processing
Added a flag to the `IsochroneNode` to identify invalidated frontiers
and force their update later when processing the graph. This should
guarantee the same results when processing revisions one-by-one vs. in
batches (in terms of db rows).
commit 1146a9b9203557195da47df2b76ba1603aa4ca31
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 16:20:21 2021 +0200
Refine maxdate calculation
commit 18063809ccc0b4f7cbfcf00fc95b26ba297c99ab
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 16:17:49 2021 +0200
Fix issue when processing revision in batch
If any revision in the batch was invalidating a frontier, the commit of
the complete batch failed. This is now fixed.
commit 52de7a0c11057ec80743807350f4a625efab11ba
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 11:08:10 2021 +0200
Add isochrone graph tests for the remaining heuristics
commit a5e8234b9f43ce02144ff9ff37a2caa00ebf608a
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Mon Jun 7 17:09:43 2021 +0200
Add test for isochrone graph topology
The expected isochrone graphs for each revision in the test should be
provided as a dictionary in an associated yaml file.
Currently only heuristic lower with depth=1 is being tested.
Also, model classes DirectoryEntry, FileEntry and IsochroneNode were
modified so that they can be compared by equality and hashed.
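To illustrate why equality and hashing matter when comparing an expected graph against a computed one, here is a small sketch. `SketchNode` is a hypothetical stand-in, not the actual `IsochroneNode`; storing children in a `frozenset` is an assumption chosen so that child order does not affect the comparison.

```python
from typing import FrozenSet, Iterable


class SketchNode:
    """Hypothetical tree node that is comparable by value and hashable."""

    def __init__(self, name: bytes, children: Iterable["SketchNode"] = ()) -> None:
        self.name = name
        # frozenset is itself hashable and ignores insertion order.
        self.children: FrozenSet["SketchNode"] = frozenset(children)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SketchNode):
            return NotImplemented
        return (self.name, self.children) == (other.name, other.children)

    def __hash__(self) -> int:
        # Must agree with __eq__: equal nodes produce equal hashes.
        return hash((self.name, self.children))


expected = SketchNode(b"root", [SketchNode(b"x"), SketchNode(b"y")])
computed = SketchNode(b"root", [SketchNode(b"y"), SketchNode(b"x")])
same_tree = expected == computed
```

With both methods defined, two independently built trees compare equal when their contents match, so a test can assert on whole graphs rather than walking them field by field.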
commit 59c0f1bf49617824feae7ad08ce1b5f46b7a70cd
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Mon Jun 7 11:45:25 2021 +0200
Add equality check functions to model classes
commit 4ebab8d2ce933637c85bf456a796b6da8d12b513
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Fri Jun 4 15:01:38 2021 +0200
Refactor OriginEntry to include info about visit date and snapshot
Revisions reachable from an OriginEntry are now queried separately and returned in an iterable.
Also `origin_add` function was updated accordingly, and CLI command now uses a CSVOriginIterator
similar to that previously developed for revisions. Updated tests as well to ensure nothing was
broken during the refactoring.
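As a rough picture of a CSV-driven origin iterator, the sketch below yields one record per CSV row. The column layout (URL, ISO 8601 visit date, hex snapshot id) and the names `OriginRow` and `iter_origins` are assumptions for illustration only; the real `CSVOriginIterator` may read a different format.

```python
import csv
import io
from datetime import datetime
from typing import Iterator, NamedTuple, TextIO


class OriginRow(NamedTuple):
    """Hypothetical record: origin URL, visit date and snapshot id."""

    url: str
    date: datetime
    snapshot: bytes


def iter_origins(stream: TextIO) -> Iterator[OriginRow]:
    """Lazily parse rows of the assumed form: url,iso-date,hex-snapshot."""
    for row in csv.reader(stream):
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        url, date, snapshot = row
        yield OriginRow(url, datetime.fromisoformat(date), bytes.fromhex(snapshot))


sample = io.StringIO(
    "https://example.org/repo,2021-06-04T15:01:38+02:00," + "ab" * 20 + "\n"
)
rows = list(iter_origins(sample))
```

Because the function is a generator, a large input file is consumed row by row instead of being loaded into memory at once.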
commit 6ea9313800b86e996783f0bf5e37cc8c34f3627e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Fri Jun 4 14:54:56 2021 +0200
Remove archive parameter from RevisionEntry
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/112/ for more details.
Build is green
Patch application report for D5849 (id=20942)
Could not rebase; Attempt merge onto 075b0d6cd6...
Updating 075b0d6..c7d1840
Fast-forward
 swh/provenance/__init__.py | 16 +-
 swh/provenance/cli.py | 18 +-
 swh/provenance/graph.py | 214 ++++++++
 swh/provenance/model.py | 76 ++-
 swh/provenance/origin.py | 184 ++++---
 swh/provenance/postgresql/provenancedb_base.py | 352 ++++------
 .../postgresql/provenancedb_with_path.py | 155 +++---
 .../postgresql/provenancedb_without_path.py | 104 ++--
 swh/provenance/provenance.py | 593 ++++---------------
 swh/provenance/revision.py | 237 +++++++-
 swh/provenance/tests/conftest.py | 6 +-
 .../tests/data/graphs_cmdbts2_lower_1.yaml | 476 +++++++++++++++++
 .../tests/data/graphs_cmdbts2_lower_2.yaml | 476 +++++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_1.yaml | 444 +++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_2.yaml | 436 +++++++++++++++
 .../tests/data/graphs_out-of-order_lower_1.yaml | 223 ++++++++
 .../tests/data/synthetic_out-of-order_lower_1.txt | 6 +-
 swh/provenance/tests/test_conftest.py | 2 +-
 swh/provenance/tests/test_isochrone_graph.py | 107 ++++
 swh/provenance/tests/test_origin_iterator.py | 43 +-
 swh/provenance/tests/test_provenance_db.py | 16 +-
 swh/provenance/tests/test_provenance_heuristics.py | 42 +-
 swh/provenance/tests/test_revision_iterator.py | 4 +-
 23 files changed, 3220 insertions(+), 1010 deletions(-)
 create mode 100644 swh/provenance/graph.py
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_out-of-order_lower_1.yaml
 create mode 100644 swh/provenance/tests/test_isochrone_graph.py
Changes applied before test
commit c7d184075a3ac13310cf2823ca580fce9457d7e1
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Thu Jun 10 21:56:57 2021 +0200
Reorganize code
Moved isochrone graph logic to its own file graph.py. Origin-revision layer's algorithm
is now in origin.py, while revision-content layer's logic was moved to revision.py.
commit 6a8d34145b7b113d8ca62cf134d50ab69c491ec7
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 16:27:51 2021 +0200
Split Provenance backend in two layers
First layer (temporarily called `ProvenanceBackend`) is responsible for
handling read/write caches and should ideally be DB-agnostic (not
yet though).
Second layer is responsible for all DB interaction. In revisions to come
it will be further refactored into several workers to guarantee no
collisions when writing to the DB.
commit 4a8964d25ff8490b8bf33d8480f6db1b97a0af22
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 12:41:54 2021 +0200
Refactor insertion methods in the Provenance backend
commit 4296febd8fbe3b0c8dc5a3650cbbd4ecf29713cf
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 12:24:31 2021 +0200
Simplify cache usage in the Provenance backend
commit af41748ef54dedf87f8304bb457b028b2de6369f
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 11:42:15 2021 +0200
Remove `directory_invalidate_in_isochrone_frontier` method from provenance interface
It was meant to be used in a multi-thread scenario which is not possible
due to Python's lack of actual parallelism. This way the
`build_isochrone_graph` function is guaranteed not to modify the DB (it
performs only reads now). Also the isochrone graph test was updated to
use `revision_add` with a new flag to avoid commits, hence emulating the
batch processing behaviour.
commit c20aeb432e831e412c13033c4e7a3d0ee6553e82
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 18:08:25 2021 +0200
Improve out-of-order revision processing
Added a flag to the `IsochroneNode` to identify invalidated frontiers
and force their update later when processing the graph. This should
guarantee the same results when processing revisions one-by-one vs. in
batches (in terms of db rows).
commit 65226455d522f5156ed8d7e37d2b7546d0d010f1
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 16:20:21 2021 +0200
Refine maxdate calculation
commit d4ab6857f6a74e181316bf90db008b51d4b81085
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 16:17:49 2021 +0200
Fix issue when processing revision in batch
If any revision in the batch was invalidating a frontier, the commit of
the complete batch failed. This is now fixed.
commit d14247403019bd34e1e430c71e074574c89e3e57
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 11:08:10 2021 +0200
Add isochrone graph tests for the remaining heuristics
commit 594e5a83b38ceb99a46520e9d835b14074caed70
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Mon Jun 7 17:09:43 2021 +0200
Add test for isochrone graph topology
The expected isochrone graphs for each revision in the test should be
provided as a dictionary in an associated yaml file.
Currently only heuristic lower with depth=1 is being tested.
Also, model classes DirectoryEntry, FileEntry and IsochroneNode were
modified so that they can be compared by equality and hashed.
commit 244b08b4b51c8f0891301e4495f05ba8368e156c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Mon Jun 7 11:45:25 2021 +0200
Add equality check functions to model classes
commit 5a9fb987c9aa169095185b1559a87bce536776b7
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Fri Jun 4 15:01:38 2021 +0200
Refactor OriginEntry to include info about visit date and snapshot
Revisions reachable from an OriginEntry are now queried separately and returned in an iterable.
Also `origin_add` function was updated accordingly, and CLI command now uses a CSVOriginIterator
similar to that previously developed for revisions. Updated tests as well to ensure nothing was
broken during the refactoring.
commit fa4942ddff353c4d1d46c7f61ec570c9a28bc648
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Fri Jun 4 14:54:56 2021 +0200
Remove archive parameter from RevisionEntry
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/121/ for more details.
Build is green
Patch application report for D5849 (id=20972)
Could not rebase; Attempt merge onto 075b0d6cd6...
Updating 075b0d6..206399e
Fast-forward
 swh/provenance/__init__.py | 16 +-
 swh/provenance/cli.py | 18 +-
 swh/provenance/graph.py | 223 ++++++++
 swh/provenance/model.py | 76 ++-
 swh/provenance/origin.py | 183 ++++---
 swh/provenance/postgresql/provenancedb_base.py | 352 ++++------
 .../postgresql/provenancedb_with_path.py | 155 +++---
 .../postgresql/provenancedb_without_path.py | 104 ++--
 swh/provenance/provenance.py | 593 ++++---------------
 swh/provenance/revision.py | 237 +++++++-
 swh/provenance/tests/conftest.py | 6 +-
 .../tests/data/graphs_cmdbts2_lower_1.yaml | 401 ++++++++++++++
 .../tests/data/graphs_cmdbts2_lower_2.yaml | 401 ++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_1.yaml | 371 +++++++++++++
 .../tests/data/graphs_cmdbts2_upper_2.yaml | 365 +++++++++++++
 .../tests/data/graphs_out-of-order_lower_1.yaml | 185 +++++++
 .../tests/data/synthetic_out-of-order_lower_1.txt | 6 +-
 swh/provenance/tests/test_conftest.py | 2 +-
 swh/provenance/tests/test_isochrone_graph.py | 101 ++++
 swh/provenance/tests/test_origin_iterator.py | 43 +-
 swh/provenance/tests/test_provenance_db.py | 16 +-
 swh/provenance/tests/test_provenance_heuristics.py | 51 +-
 swh/provenance/tests/test_revision_iterator.py | 4 +-
 23 files changed, 2895 insertions(+), 1014 deletions(-)
 create mode 100644 swh/provenance/graph.py
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_out-of-order_lower_1.yaml
 create mode 100644 swh/provenance/tests/test_isochrone_graph.py
Changes applied before test
commit 206399eb8ae79e350c6c47af50589fec953d7d98
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Fri Jun 11 18:01:14 2021 +0200
Reorganize code
Moved isochrone graph logic to its own file graph.py. Origin-revision layer's algorithm
is now in origin.py, while revision-content layer's logic was moved to revision.py.
commit c4b1f31640b1263e8afb7c4c71a8ca3d984b3fd2
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Fri Jun 11 17:52:00 2021 +0200
Split Provenance backend in two layers
First layer (temporarily called `ProvenanceBackend`) is responsible for
handling read/write caches and should ideally be DB-agnostic (not
yet though).
Second layer is responsible for all DB interaction. In revisions to come
it will be further refactored into several workers to guarantee no
collisions when writing to the DB.
commit f1a9fe8182a3a6a8a47d6093197ee6b800fce95b
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 12:41:54 2021 +0200
Refactor insertion methods in the Provenance backend
commit 3f99025d6d45287ba7ce97db39eef3f9c5acb78c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Wed Jun 9 12:24:31 2021 +0200
Simplify cache usage in the Provenance backend
commit d1b476b27ac4e7f355468a0514f6a9850dbf1143
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 16:17:49 2021 +0200
Improve out-of-order revision processing
Fix issue when processing revision in batch
If any revision in the batch was invalidating a frontier, the commit of
the complete batch failed. This is now fixed.
Refine maxdate calculation
Added a flag to the IsochroneNode to identify invalidated frontiers
and force their update later when processing the graph. This should
guarantee the same results when processing revisions one-by-one vs. in
batches (in terms of db rows).
Remove directory_invalidate_in_isochrone_frontier method from provenance interface
It was meant to be used in a multi-thread scenario which is not possible
due to Python's lack of actual parallelism. This way the
build_isochrone_graph function is guaranteed not to modify the DB (it
performs only reads now). Also the isochrone graph test was updated to
use revision_add with a new flag to avoid commits, hence emulating the
batch processing behaviour.
commit 30bff867e97f37849d960fdc284513844fae2a34
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Tue Jun 8 11:08:10 2021 +0200
Add isochrone graph tests for the remaining heuristics
commit c2843ae5ba47bfb03d0fa10ce45ad274061097df
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Mon Jun 7 17:09:43 2021 +0200
Add test for isochrone graph topology
The expected isochrone graphs for each revision in the test should be
provided as a dictionary in an associated yaml file.
Currently only heuristic lower with depth=1 is being tested.
Also, model classes DirectoryEntry, FileEntry and IsochroneNode were
modified so that they can be compared by equality and hashed.
commit 1dd14205ba60d02e14f2c352113871c1025b8e7f
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Mon Jun 7 11:45:25 2021 +0200
Add equality check functions to model classes
commit 9aaaedb3ebc981555276e99616a0c4fc837b78e9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Fri Jun 4 15:01:38 2021 +0200
Refactor OriginEntry to include info about visit date and snapshot
Revisions reachable from an OriginEntry are now queried separately and returned in an iterable.
Also `origin_add` function was updated accordingly, and CLI command now uses a CSVOriginIterator
similar to that previously developed for revisions. Updated tests as well to ensure nothing was
broken during the refactoring.
commit fa4942ddff353c4d1d46c7f61ec570c9a28bc648
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date: Fri Jun 4 14:54:56 2021 +0200
Remove archive parameter from RevisionEntry
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/128/ for more details.