Improve out-of-order revision processing
Closed, Public

Authored by aeviso on Jun 10 2021, 12:22 PM.

Details

Summary

Fix issue when processing revision in batch

If any revision in the batch was invalidating a frontier, the commit of
the complete batch failed. This is now fixed.

Refine maxdate calculation

Improve out-of-order revision processing

Added a flag to the IsochroneNode to identify invalidated frontiers
and force their update later when processing the graph. This should
guarantee the same results when processing revisions one-by-one vs. in
batches (in terms of db rows).

Remove directory_invalidate_in_isochrone_frontier method from provenance interface

It was meant to be used in a multi-threaded scenario, which is not practical
because Python threads do not run in parallel (the CPython global interpreter
lock). With the method removed, the build_isochrone_graph function is
guaranteed not to modify the DB (it now performs only reads). The isochrone
graph test was also updated to use revision_add with a new flag that skips
commits, emulating the batch processing behaviour.

Depends on D5845

Diff Detail

Repository
rDPROV Provenance database
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5847 (id=20906)

Could not rebase; Attempt merge onto 6cdd424eba...

Updating 6cdd424..a5b7bd7
Fast-forward
 swh/provenance/cli.py                              |  12 +-
 swh/provenance/model.py                            |  69 ++-
 swh/provenance/origin.py                           | 107 ++---
 swh/provenance/postgresql/provenancedb_base.py     |  13 +-
 swh/provenance/provenance.py                       | 130 ++++--
 swh/provenance/revision.py                         |  24 +-
 .../tests/data/graphs_cmdbts2_lower_1.yaml         | 476 +++++++++++++++++++++
 .../tests/data/graphs_cmdbts2_lower_2.yaml         | 476 +++++++++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_1.yaml         | 444 +++++++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_2.yaml         | 436 +++++++++++++++++++
 .../tests/data/graphs_out-of-order_lower_1.yaml    | 223 ++++++++++
 .../tests/data/synthetic_out-of-order_lower_1.txt  |   2 +-
 swh/provenance/tests/test_isochrone_graph.py       | 104 +++++
 swh/provenance/tests/test_origin_iterator.py       |  43 +-
 swh/provenance/tests/test_provenance_db.py         |  12 +-
 swh/provenance/tests/test_provenance_heuristics.py |   6 +-
 swh/provenance/tests/test_revision_iterator.py     |   6 +-
 17 files changed, 2407 insertions(+), 176 deletions(-)
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_out-of-order_lower_1.yaml
 create mode 100644 swh/provenance/tests/test_isochrone_graph.py
Changes applied before test
commit a5b7bd73c0ec5fc7cf2b2c7e93c00b40d147ca84
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 9 11:42:15 2021 +0200

    Remove `directory_invalidate_in_isochrone_frontier` method from provenance interface
    
    It was meant to be used in a multi-thread scenario which is not possible
    due to Python's lack of actual parallelism. This way the
    `build_isochrone_graph` function is guaranteed not to modify the DB (it
    performs only reads now). Also the isochrone graph test was updated to
    use `revision_add` with a new flag to avoid commits, hence emulating the
    batch processing behaviour.

commit b24bc279c19e346a77d233fa7d24f148f52c5d89
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 18:08:25 2021 +0200

    Improve out-of-order revision processing
    
    Added a flag to the `IsochroneNode` to identify invalidated frontiers
    and force its update later when processing the graph. This should
    guarantee the same results when processing revision one-by-one vs. in
    batches (in terms of db rows).

commit 1146a9b9203557195da47df2b76ba1603aa4ca31
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 16:20:21 2021 +0200

    Refine maxdate calculation

commit 18063809ccc0b4f7cbfcf00fc95b26ba297c99ab
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 16:17:49 2021 +0200

    Fix issue when processing revision in batch
    
    If any revision in the batch was invalidating a frontier, the commit of
    the complete batch failed. This is now fixed.

commit 52de7a0c11057ec80743807350f4a625efab11ba
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 11:08:10 2021 +0200

    Add isochrone graph tests for the remaining heuristics

commit a5e8234b9f43ce02144ff9ff37a2caa00ebf608a
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 7 17:09:43 2021 +0200

    Add test for isochrone graph topology
    
    The expected isochrone graphs for each revision in the test should be
    provided as a dictionary in an associated yaml file.
    Currently only heuristic lower with depth=1 is being tested.
    
    Also, model classes DirectoryEntry, FileEntry and IsochroneNode were
    modified so that they can be compared by equality and hashed.

commit 59c0f1bf49617824feae7ad08ce1b5f46b7a70cd
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 7 11:45:25 2021 +0200

    Add equality check functions to model classes

commit 4ebab8d2ce933637c85bf456a796b6da8d12b513
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 4 15:01:38 2021 +0200

    Refactor OriginEntry to include info about visit date and snapshot
    
    Revisions reachable from an OriginEntry are now queried separately and returned in an iterable.
    Also `origin_add` function was updated accordingly, and CLI command now uses a CSVOriginIterator
    similar to that previously developed for revisions. Updated tests as well to ensure nothing was
    broken during the refactoring.

commit 6ea9313800b86e996783f0bf5e37cc8c34f3627e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 4 14:54:56 2021 +0200

    Remove archive parameter from RevisionEntry

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/110/ for more details.

douardda added inline comments.
swh/provenance/tests/data/synthetic_out-of-order_lower_1.txt
36

please remove the warning comment above this line if this is now fixed.

swh/provenance/tests/test_isochrone_graph.py
38–40

better use d.get("known", False) here (same for "invalid" below)

swh/provenance/tests/test_provenance_heuristics.py
224

please remove all the comments there now that this issue is fixed.

Actually, I would have much preferred this diff to be 4 diffs also ;-)

This revision now requires changes to proceed. Jun 10 2021, 3:04 PM
swh/provenance/tests/test_provenance_heuristics.py
224

Sure, I kept it as a reminder (to myself) that we should properly test that batch vs. single revision processing actually yield the same result. I guess one way is to run this heuristic test with both kinds of processing. I just didn't want to duplicate the whole function; I'm sure there is a better way to do that.

Build is green

Patch application report for D5847 (id=20940)

Could not rebase; Attempt merge onto 075b0d6cd6...

Updating 075b0d6..af41748
Fast-forward
 swh/provenance/cli.py                              |  12 +-
 swh/provenance/model.py                            |  76 +++-
 swh/provenance/origin.py                           | 107 ++---
 swh/provenance/postgresql/provenancedb_base.py     |  13 +-
 swh/provenance/provenance.py                       | 131 ++++--
 swh/provenance/revision.py                         |  24 +-
 .../tests/data/graphs_cmdbts2_lower_1.yaml         | 476 +++++++++++++++++++++
 .../tests/data/graphs_cmdbts2_lower_2.yaml         | 476 +++++++++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_1.yaml         | 444 +++++++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_2.yaml         | 436 +++++++++++++++++++
 .../tests/data/graphs_out-of-order_lower_1.yaml    | 223 ++++++++++
 .../tests/data/synthetic_out-of-order_lower_1.txt  |   6 +-
 swh/provenance/tests/test_isochrone_graph.py       | 106 +++++
 swh/provenance/tests/test_origin_iterator.py       |  43 +-
 swh/provenance/tests/test_provenance_db.py         |  12 +-
 swh/provenance/tests/test_provenance_heuristics.py |   6 +-
 swh/provenance/tests/test_revision_iterator.py     |   4 +-
 17 files changed, 2415 insertions(+), 180 deletions(-)
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_out-of-order_lower_1.yaml
 create mode 100644 swh/provenance/tests/test_isochrone_graph.py
Changes applied before test
commit af41748ef54dedf87f8304bb457b028b2de6369f
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Wed Jun 9 11:42:15 2021 +0200

    Remove `directory_invalidate_in_isochrone_frontier` method from provenance interface
    
    It was meant to be used in a multi-thread scenario which is not possible
    due to Python's lack of actual parallelism. This way the
    `build_isochrone_graph` function is guaranteed not to modify the DB (it
    performs only reads now). Also the isochrone graph test was updated to
    use `revision_add` with a new flag to avoid commits, hence emulating the
    batch processing behaviour.

commit c20aeb432e831e412c13033c4e7a3d0ee6553e82
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 18:08:25 2021 +0200

    Improve out-of-order revision processing
    
    Added a flag to the `IsochroneNode` to identify invalidated frontiers
    and force its update later when processing the graph. This should
    guarantee the same results when processing revision one-by-one vs. in
    batches (in terms of db rows).

commit 65226455d522f5156ed8d7e37d2b7546d0d010f1
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 16:20:21 2021 +0200

    Refine maxdate calculation

commit d4ab6857f6a74e181316bf90db008b51d4b81085
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 16:17:49 2021 +0200

    Fix issue when processing revision in batch
    
    If any revision in the batch was invalidating a frontier, the commit of
    the complete batch failed. This is now fixed.

commit d14247403019bd34e1e430c71e074574c89e3e57
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 11:08:10 2021 +0200

    Add isochrone graph tests for the remaining heuristics

commit 594e5a83b38ceb99a46520e9d835b14074caed70
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 7 17:09:43 2021 +0200

    Add test for isochrone graph topology
    
    The expected isochrone graphs for each revision in the test should be
    provided as a dictionary in an associated yaml file.
    Currently only heuristic lower with depth=1 is being tested.
    
    Also, model classes DirectoryEntry, FileEntry and IsochroneNode were
    modified so that they can be compared by equality and hashed.

commit 244b08b4b51c8f0891301e4495f05ba8368e156c
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 7 11:45:25 2021 +0200

    Add equality check functions to model classes

commit 5a9fb987c9aa169095185b1559a87bce536776b7
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 4 15:01:38 2021 +0200

    Refactor OriginEntry to include info about visit date and snapshot
    
    Revisions reachable from an OriginEntry are now queried separately and returned in an iterable.
    Also `origin_add` function was updated accordingly, and CLI command now uses a CSVOriginIterator
    similar to that previously developed for revisions. Updated tests as well to ensure nothing was
    broken during the refactoring.

commit fa4942ddff353c4d1d46c7f61ec570c9a28bc648
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 4 14:54:56 2021 +0200

    Remove archive parameter from RevisionEntry

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/119/ for more details.

swh/provenance/tests/test_provenance_heuristics.py
224

Yes, just use a pytest.mark.parametrize (flag) argument for this and an if statement in the code to import the revisions either as a whole or in a loop.
Note: it's possible to stack pytest.mark.parametrize decorators, no need to do the combinatory logic in a single one.
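
A minimal sketch of the stacked-decorator approach, assuming a hypothetical `FakeProvenance` stand-in, placeholder revision ids and an illustrative `depth` parameter (none of these come from the real test suite); only `pytest.mark.parametrize` is actual pytest API:

```python
import pytest

REVISIONS = ["rev-1", "rev-2", "rev-3"]  # placeholder ids, not real data


class FakeProvenance:
    """Hypothetical stand-in that only records the revisions it is given."""

    def __init__(self, depth):
        self.depth = depth
        self.rows = set()

    def revision_add(self, revisions):
        # The real method does far more; this just stores the ids.
        self.rows.update(revisions)


# Each decorator contributes one axis; pytest generates the cross product.
@pytest.mark.parametrize("batch", (True, False))
@pytest.mark.parametrize("depth", (1, 2))
def test_revision_add_modes(batch, depth):
    provenance = FakeProvenance(depth)
    if batch:
        # Import all revisions in a single call.
        provenance.revision_add(REVISIONS)
    else:
        # Import the revisions one at a time.
        for revision in REVISIONS:
            provenance.revision_add([revision])
    assert provenance.rows == set(REVISIONS)
```

With two values on each axis, pytest would collect four test cases from this single function, without spelling out the combinations by hand.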

swh/provenance/tests/test_provenance_heuristics.py
224

I was actually thinking of doing a separate test for that, but you are right that just running this one with an extra parameter might be good enough. I'll do that; if we later figure out that this should be improved, we can do it anyway.

swh/provenance/provenance.py
245–253

I don't really understand the logic behind this commit flag. Because now we have 4 situations to test:

  1. one revision_add(revisions=[several revisions], commit=True)
  2. loop of revision_add(revisions=[one revision], commit=True)
  3. one revision_add(revisions=[several revisions], commit=False)
  4. loop of revision_add(revisions=[one revision], commit=False)

Note that 1 and 4 (assuming you do a commit after the loop) are not equivalent!

So I don't really see the point of this commit flag, or when one would want to set it to False.

swh/provenance/tests/test_isochrone_graph.py
49

better stack 2 @pytest.mark.parametrize() than exploding the combination logic here.

swh/provenance/tests/test_isochrone_graph.py
99

I don't understand, when is the "commit" actually done when batch is True?

swh/provenance/tests/test_isochrone_graph.py
38–40

Actually, rereading this, why is this .get() needed at all? The input dictionary should be complete, and I'd rather it fail if not.

This revision now requires changes to proceed. Jun 11 2021, 11:11 AM
aeviso added inline comments.
swh/provenance/provenance.py
245–253

It's just for testing purposes. When creating isochrone graphs we need to simulate the situation where we processed up to a certain revision and we want to create the isochrone graph of the following one as if they were being processed in the same batch.

swh/provenance/tests/test_isochrone_graph.py
38–40

I wanted to simplify the yaml files by allowing default values to be omitted. Ideally, null dates, false booleans or empty lists should not need to be written out, so that creating and maintaining these files is easier.

99

Never; all graphs are built as if the revisions were processed in the same batch. We don't need to commit to the db to test isochrone graph creation.

swh/provenance/provenance.py
245–253

Which is not what we are doing here. If you want to test isochrone graph generation with revisions processed in a batch, then you need a test that does exactly that: a loop that resets the provenance db, makes one revision_add() call with all the revisions up to revision X, and checks the resulting isochrone graph.

swh/provenance/tests/test_isochrone_graph.py
38–40

I think it's better to keep them explicit and complete.
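
The trade-off can be sketched with the "known" and "invalid" keys mentioned above. The helper names and the sample dictionary are hypothetical and are not taken from the test file:

```python
def flags_with_defaults(d):
    """Lenient: a missing key silently becomes False."""
    return d.get("known", False), d.get("invalid", False)


def flags_strict(d):
    """Strict: a missing key raises KeyError, exposing an incomplete fixture."""
    return d["known"], d["invalid"]


incomplete = {"known": True}  # "invalid" was left out of the fixture

print(flags_with_defaults(incomplete))
try:
    flags_strict(incomplete)
except KeyError as exc:
    print("incomplete fixture, missing key:", exc)
```

The lenient form keeps the yaml files shorter, while the strict form turns a forgotten field into an immediate test failure instead of a silent default.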

swh/provenance/provenance.py
245–253

The problem with that approach is that revision_add() up to revision X will perform the commit; then, when creating the isochrone graph for the following revision (the first non-processed one), the read/write caches will be empty (i.e. the following revision will be processed as if it belonged to a different batch).

swh/provenance/provenance.py
245–253

You are right, I forgot the "commit" does indeed clear the read cache also.

So both cases are not really correct (once again, doing one revision_add() on a batch of revisions is not the same execution path as calling revision_add() with only one revision at a time).

This shows something could be improved in this code (not sure how for now).

At least, instead of adding a commit flag as an argument of the revision_add method, just remove the commit part from this method entirely; it's the responsibility of the caller to decide when to flush the write cache to the DB (maybe add a commit() method in this class with the while loop).
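
A rough sketch of that separation, assuming a hypothetical `CachedProvenance` class with plain dictionaries standing in for the database and both caches. The real class, its cache layout and the while loop mentioned above are not reproduced here:

```python
class CachedProvenance:
    """Hypothetical sketch: revision_add() only fills caches, commit() flushes."""

    def __init__(self):
        self.db = {}           # stands in for the database rows
        self.read_cache = {}   # values already known during this batch
        self.write_cache = {}  # pending rows not yet flushed

    def revision_add(self, revisions):
        # No commit here: the caller decides when to flush.
        for rev_id, date in revisions:
            self.write_cache[rev_id] = date
            self.read_cache[rev_id] = date

    def commit(self):
        # Flush pending rows, then drop both caches, mirroring the
        # behaviour described in this thread.
        self.db.update(self.write_cache)
        self.write_cache.clear()
        self.read_cache.clear()


prov = CachedProvenance()
prov.revision_add([("rev-1", 10), ("rev-2", 20)])
assert prov.db == {}            # nothing written before commit()
assert "rev-1" in prov.read_cache
prov.commit()
assert prov.db == {"rev-1": 10, "rev-2": 20}
assert prov.read_cache == {}    # the next revision starts with empty caches
```

With this shape, a test can add revisions and inspect the state "as if in the same batch" simply by not calling `commit()`, without any extra flag on `revision_add`.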

swh/provenance/provenance.py
245–253

You are right, I forgot the "commit" does indeed clear the read cache also.

Which also raises the question: why is that so? Why not keep the (read) cache for as long as we can?
It would be necessary to improve the cache (using a proper LRU cache or similar), but it may be quite useful for performance. I mean, we already have these cached values; throwing them away seems odd.
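
For illustration, a bounded read cache with least-recently-used eviction could look like the sketch below. The `LRUCache` class and its `get`/`put` methods are hypothetical, built on the standard library's `collections.OrderedDict`, and say nothing about how the provenance code actually stores its cache:

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical bounded cache that evicts the least recently used key."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        # A hit marks the key as most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.maxsize:
            # Drop the oldest entry (front of the ordered dict).
            self._data.popitem(last=False)


cache = LRUCache(maxsize=2)
cache.put("dir-a", 10)
cache.put("dir-b", 20)
cache.get("dir-a")      # "dir-a" is now the most recently used
cache.put("dir-c", 30)  # evicts "dir-b"
assert cache.get("dir-b") is None
assert cache.get("dir-a") == 10
```

A cache like this could survive across commits, since its size is bounded regardless of how many revisions are processed. Whether stale entries would be a problem after an invalidated frontier is a separate question that this sketch does not address.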

Build is green

Patch application report for D5847 (id=20961)

Could not rebase; Attempt merge onto 075b0d6cd6...

Updating 075b0d6..3c8ff22
Fast-forward
 swh/provenance/cli.py                              |  12 +-
 swh/provenance/model.py                            |  76 +++-
 swh/provenance/origin.py                           | 106 ++----
 swh/provenance/postgresql/provenancedb_base.py     |  13 +-
 swh/provenance/provenance.py                       | 140 +++++--
 swh/provenance/revision.py                         |  24 +-
 .../tests/data/graphs_cmdbts2_lower_1.yaml         | 401 +++++++++++++++++++++
 .../tests/data/graphs_cmdbts2_lower_2.yaml         | 401 +++++++++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_1.yaml         | 371 +++++++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_2.yaml         | 365 +++++++++++++++++++
 .../tests/data/graphs_out-of-order_lower_1.yaml    | 185 ++++++++++
 .../tests/data/synthetic_out-of-order_lower_1.txt  |   6 +-
 swh/provenance/tests/test_isochrone_graph.py       | 100 +++++
 swh/provenance/tests/test_origin_iterator.py       |  43 ++-
 swh/provenance/tests/test_provenance_db.py         |  12 +-
 swh/provenance/tests/test_provenance_heuristics.py |  15 +-
 swh/provenance/tests/test_revision_iterator.py     |   4 +-
 17 files changed, 2090 insertions(+), 184 deletions(-)
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_out-of-order_lower_1.yaml
 create mode 100644 swh/provenance/tests/test_isochrone_graph.py
Changes applied before test
commit 3c8ff220ae4c3d5375fbd5e0981835a67f11f911
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 16:17:49 2021 +0200

    Fix issue when processing revision in batch
    
    If any revision in the batch was invalidating a frontier, the commit of
    the complete batch failed. This is now fixed.

commit 30bff867e97f37849d960fdc284513844fae2a34
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 11:08:10 2021 +0200

    Add isochrone graph tests for the remaining heuristics

commit c2843ae5ba47bfb03d0fa10ce45ad274061097df
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 7 17:09:43 2021 +0200

    Add test for isochrone graph topology
    
    The expected isochrone graphs for each revision in the test should be
    provided as a dictionary in an associated yaml file.
    Currently only heuristic lower with depth=1 is being tested.
    
    Also, model classes DirectoryEntry, FileEntry and IsochroneNode were
    modified so that they can be compared by equality and hashed.

commit 1dd14205ba60d02e14f2c352113871c1025b8e7f
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 7 11:45:25 2021 +0200

    Add equality check functions to model classes

commit 9aaaedb3ebc981555276e99616a0c4fc837b78e9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 4 15:01:38 2021 +0200

    Refactor OriginEntry to include info about visit date and snapshot
    
    Revisions reachable from an OriginEntry are now queried separately and returned in an iterable.
    Also `origin_add` function was updated accordingly, and CLI command now uses a CSVOriginIterator
    similar to that previously developed for revisions. Updated tests as well to ensure nothing was
    broken during the refactoring.

commit fa4942ddff353c4d1d46c7f61ec570c9a28bc648
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 4 14:54:56 2021 +0200

    Remove archive parameter from RevisionEntry

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/124/ for more details.

Build is green

Patch application report for D5847 (id=20970)

Could not rebase; Attempt merge onto 075b0d6cd6...

Updating 075b0d6..d1b476b
Fast-forward
 swh/provenance/cli.py                              |  12 +-
 swh/provenance/model.py                            |  76 +++-
 swh/provenance/origin.py                           | 106 ++----
 swh/provenance/postgresql/provenancedb_base.py     |  13 +-
 swh/provenance/provenance.py                       | 140 +++++--
 swh/provenance/revision.py                         |  24 +-
 .../tests/data/graphs_cmdbts2_lower_1.yaml         | 401 +++++++++++++++++++++
 .../tests/data/graphs_cmdbts2_lower_2.yaml         | 401 +++++++++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_1.yaml         | 371 +++++++++++++++++++
 .../tests/data/graphs_cmdbts2_upper_2.yaml         | 365 +++++++++++++++++++
 .../tests/data/graphs_out-of-order_lower_1.yaml    | 185 ++++++++++
 .../tests/data/synthetic_out-of-order_lower_1.txt  |   6 +-
 swh/provenance/tests/test_isochrone_graph.py       | 100 +++++
 swh/provenance/tests/test_origin_iterator.py       |  43 ++-
 swh/provenance/tests/test_provenance_db.py         |  12 +-
 swh/provenance/tests/test_provenance_heuristics.py |  15 +-
 swh/provenance/tests/test_revision_iterator.py     |   4 +-
 17 files changed, 2090 insertions(+), 184 deletions(-)
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_lower_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_1.yaml
 create mode 100644 swh/provenance/tests/data/graphs_cmdbts2_upper_2.yaml
 create mode 100644 swh/provenance/tests/data/graphs_out-of-order_lower_1.yaml
 create mode 100644 swh/provenance/tests/test_isochrone_graph.py
Changes applied before test
commit d1b476b27ac4e7f355468a0514f6a9850dbf1143
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 16:17:49 2021 +0200

    Improve out-of-order revision processing
    
    Fix issue when processing revision in batch
    
    If any revision in the batch was invalidating a frontier, the commit of
    the complete batch failed. This is now fixed.
    
    Refine maxdate calculation
    
    Added a flag to the IsochroneNode to identify invalidated frontiers
    and force its update later when processing the graph. This should
    guarantee the same results when processing revision one-by-one vs. in
    batches (in terms of db rows).
    
    Remove directory_invalidate_in_isochrone_frontier method from provenance interface
    
    It was meant to be used in a multi-thread scenario which is not possible
    due to Python's lack of actual parallelism. This way the
    build_isochrone_graph function is guaranteed not to modify the DB (it
    performs only reads now). Also the isochrone graph test was updated to
    use revision_add with a new flag to avoid commits, hence emulating the
    batch processing behaviour.

commit 30bff867e97f37849d960fdc284513844fae2a34
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Tue Jun 8 11:08:10 2021 +0200

    Add isochrone graph tests for the remaining heuristics

commit c2843ae5ba47bfb03d0fa10ce45ad274061097df
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 7 17:09:43 2021 +0200

    Add test for isochrone graph topology
    
    The expected isochrone graphs for each revision in the test should be
    provided as a dictionary in an associated yaml file.
    Currently only heuristic lower with depth=1 is being tested.
    
    Also, model classes DirectoryEntry, FileEntry and IsochroneNode were
    modified so that they can be compared by equality and hashed.

commit 1dd14205ba60d02e14f2c352113871c1025b8e7f
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Mon Jun 7 11:45:25 2021 +0200

    Add equality check functions to model classes

commit 9aaaedb3ebc981555276e99616a0c4fc837b78e9
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 4 15:01:38 2021 +0200

    Refactor OriginEntry to include info about visit date and snapshot
    
    Revisions reachable from an OriginEntry are now queried separately and returned in an iterable.
    Also `origin_add` function was updated accordingly, and CLI command now uses a CSVOriginIterator
    similar to that previously developed for revisions. Updated tests as well to ensure nothing was
    broken during the refactoring.

commit fa4942ddff353c4d1d46c7f61ec570c9a28bc648
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Fri Jun 4 14:54:56 2021 +0200

    Remove archive parameter from RevisionEntry

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/126/ for more details.

This revision is now accepted and ready to land. Jun 14 2021, 10:43 AM
This revision was automatically updated to reflect the committed changes.