Software Heritage

Add a test_provenance_heuristics_content_find_all() test
Closed, Public

Authored by douardda on Jun 2 2021, 12:24 PM.

Details

Summary

Test that ProvenanceDB.content_find_all() behaves as expected for all test
datasets.
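A minimal, stdlib-only sketch of the data-driven shape such a test can take. Everything here is an assumption for illustration: the (blob, revision, path) occurrence tuples, the hypothetical `check_content_find_all()` helper and the per-dataset `(find_all, expected)` pairs are not the swh.provenance API. In a pytest suite, the outer loop would more naturally be a `pytest.mark.parametrize` over dataset names.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical shapes (not the swh.provenance API): an occurrence is a
# (blob, revision, path) tuple, and each dataset maps a blob to the
# occurrences content_find_all() is expected to return for it.
Occurrence = Tuple[str, str, str]
Expected = Dict[str, List[Occurrence]]
FindAll = Callable[[str], Iterable[Occurrence]]


def check_content_find_all(datasets: Dict[str, Tuple[FindAll, Expected]]) -> None:
    """Check content_find_all() for every blob of every dataset."""
    for name, (find_all, expected) in datasets.items():
        for blob, occurrences in expected.items():
            found = list(find_all(blob))
            # Order-insensitive comparison; the message locates the failure.
            assert sorted(found) == sorted(occurrences), (name, blob)


# Toy usage: a dict lookup stands in for a backend loaded with one dataset.
_backend = {"blob1": [("blob1", "rev1", "a/x"), ("blob1", "rev2", "x")]}
check_content_find_all(
    {"toy": (lambda blob: _backend.get(blob, []),
             {"blob1": [("blob1", "rev2", "x"), ("blob1", "rev1", "a/x")]})}
)
```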

Depends on D5811.

Event Timeline

Build is green

Patch application report for D5812 (id=20747)

Could not rebase; Attempt merge onto 5aa0314dd7...

Updating 5aa0314..8c22dbd
Fast-forward
 requirements-test.txt                              |   1 +
 swh/provenance/model.py                            |  21 +-
 swh/provenance/postgresql/provenancedb_base.py     |  12 +-
 .../postgresql/provenancedb_with_path.py           | 115 +++------
 swh/provenance/provenance.py                       | 201 ++++++++-------
 swh/provenance/tests/conftest.py                   |  38 ++-
 swh/provenance/tests/data/README.md                | 138 ++++++++++
 swh/provenance/tests/data/cmdbts2.msgpack          | Bin 0 -> 17734 bytes
 swh/provenance/tests/data/cmdbts2_repo.yaml        |  80 ++++++
 .../tests/data/generate_storage_from_git.py        | 115 +++++++++
 swh/provenance/tests/data/out-of-order.msgpack     | Bin 0 -> 5254 bytes
 swh/provenance/tests/data/out-of-order_repo.yaml   |  29 +++
 .../tests/data/synthetic_cmdbts2_lower_1.txt       |  91 +++++++
 .../tests/data/synthetic_cmdbts2_lower_2.txt       |  91 +++++++
 .../tests/data/synthetic_cmdbts2_upper_1.txt       |  91 +++++++
 .../tests/data/synthetic_cmdbts2_upper_2.txt       |  91 +++++++
 swh/provenance/tests/data/synthetic_lower_1.txt    |  91 -------
 swh/provenance/tests/data/synthetic_lower_2.txt    |  91 -------
 .../tests/data/synthetic_out-of-order_lower_1.txt  |  31 +++
 swh/provenance/tests/data/synthetic_upper_1.txt    |  92 -------
 swh/provenance/tests/data/synthetic_upper_2.txt    |  91 -------
 swh/provenance/tests/test_provenance_db.py         | 281 ++++++++++++---------
 swh/provenance/tests/test_provenance_db_storage.py |   2 +-
 swh/provenance/tests/test_provenance_heuristics.py | 247 ++++++++++++++++++
 24 files changed, 1366 insertions(+), 674 deletions(-)
 create mode 100644 swh/provenance/tests/data/README.md
 create mode 100644 swh/provenance/tests/data/cmdbts2.msgpack
 create mode 100644 swh/provenance/tests/data/cmdbts2_repo.yaml
 create mode 100644 swh/provenance/tests/data/generate_storage_from_git.py
 create mode 100644 swh/provenance/tests/data/out-of-order.msgpack
 create mode 100644 swh/provenance/tests/data/out-of-order_repo.yaml
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_lower_1.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_lower_2.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_upper_1.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_upper_2.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_lower_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_lower_2.txt
 create mode 100644 swh/provenance/tests/data/synthetic_out-of-order_lower_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_upper_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_upper_2.txt
 create mode 100644 swh/provenance/tests/test_provenance_heuristics.py
Changes applied before test
commit 8c22dbdd282e20512dafb1bc14b54b898f1a3db3
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 2 12:10:50 2021 +0200

    Add a test_provenance_heuristics_content_find_all() test
    
    test that ProvenanceDB.find_all() behaves as expected for all test
    datasets.

commit 4ba858811022b0865f379c04967a25b594b9bdd1
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Jun 1 16:34:57 2021 +0200

    Add a simple out-of-order dataset

commit e3d6e0b3c4708d31e9f26c0dc8b415f00bb219aa
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Jun 1 11:47:09 2021 +0200

    Remove test_provenance_heuristics_CMDBTS test
    
    since it's redundant with the cmdbts2 test, now generated from a simple
    yaml file rather than depending on the original CMDBTS git repo on
    github.
    
    The CMDBTS dataset (CMDBTS.msgpack) is kept for now since it's still
    used for other tests (e.g. test_provenance_db).

commit f55368813234fd693ad3bb98ce0ed83e53e0ce22
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon May 31 16:51:01 2021 +0200

    Add a new (git) dataset generation scaffolding for tests
    
    and use it to generate a 'cmdbts2' test case strictly equivalent
    to the CMDBTS repo.
    
    See the swh/provenance/tests/data/README.md file for more details.
    
    Note: this aims at making it easier to write more test cases than when
    depending on the CMDBTS git repo on github. For example, a new test case
    should come soon for situations like 'out-of-order' revisions.

commit 19f8bd8f5d7476339d8e7eabd0ba2a8aa800251f
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon May 31 16:45:48 2021 +0200

    Remove test_provenance_heuristics from the ArchiveStorage tests
    
    because it's not that meaningful.

commit 08d8dd6478836ff4ab1c00c67f553b6d705b5a9c
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue May 25 14:31:51 2021 +0200

    Simplify DB queries in ProvenanceWithPathDB.content_find_(first|all)
    
    the queries should be exactly the same as before (query plans are the
    same); just written (hopefully) in a bit more readable manner.

commit fd43523fd594e70ccd002827d379321f52c2b6da
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue May 25 14:59:16 2021 +0200

    Add a test for content_find_all()

commit d85f2b0ee48aefe03ad32311623e5390f43d7261
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 12 09:47:03 2021 +0200

    Refactor the isochrone graph computation
    
    attempt to simplify a bit this part of the code:
    
    - IsochroneNode are now only used for directories
    - FileEntry are stored in a new IsochroneNode.files attribute, so
    - IsochroneNode.children only stores IsochroneNode (thus DirectoryEntry)
      objects,
    - rename IsochroneNode.date as 'dbdate' and clarify its semantics

commit 31d833ec86bf041e100795e7796ce832d00450ef
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 19 16:16:53 2021 +0200

    Add 'ls_files()' and 'ls_dirs()' methods to the DirectoryEntry class
    
    to make it a bit easier to compute the isochrone graph (see following
    revisions).

commit 72644b98a218132c0b173f360c503438688ecebb
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 19 16:14:41 2021 +0200

    Add __str__ methods to RevisionEntry, DirectoryEntry and FileEntry
    
    to ease logging and debugging.

commit a71041fbaf3f0d7ec3ea944cbbf04286c57d8b7e
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 12 12:44:07 2021 +0200

    Improve a bit the code of ProvenanceDBBase

commit defcb388ffba0869edb1a126b6626710c396c2ac
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 12 10:30:23 2021 +0200

    Add a test for the build_isochrone_graph() function
    
    this test is far from ideal, since it's mostly the record of what happened
    during a "known good" session of revision insertions, but at least it
    should allow refactoring the code related to the isochrone graph computation
    with a bit more confidence...

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/48/ for more details.
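The build_isochrone_graph() test listed above is described as a record of a "known good" session that later runs are checked against. A minimal sketch of that record-and-compare (golden file) pattern, assuming the result can be expressed as JSON-compatible data; the `check_against_record()` helper and the file name are hypothetical, not taken from the patch.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def check_against_record(actual: Any, record: Path, update: bool = False) -> None:
    """Compare `actual` with a result recorded during a "known good" run.

    With update=True the current result is written as the new reference
    instead of being checked.
    """
    # Round-trip through JSON so tuples and lists compare the same way as
    # the stored reference (JSON has no tuple type).
    normalized = json.loads(json.dumps(actual, sort_keys=True))
    if update:
        record.write_text(json.dumps(normalized, indent=2, sort_keys=True))
        return
    expected = json.loads(record.read_text())
    assert normalized == expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ref = Path(tmp) / "isochrone_graph.json"
        graph = {"root": {"dbdate": None, "children": ["a", "b"]}}
        check_against_record(graph, ref, update=True)  # record from a trusted run
        check_against_record(graph, ref)  # later runs compare against it
```

The weakness the commit message admits still applies: such a test only detects changes in behaviour, not whether the recorded behaviour was correct in the first place.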

douardda added inline comments.
swh/provenance/tests/test_provenance_heuristics.py
213–214

@aeviso please note this. In my 'out-of-order' test, I could not insert all the revisions at once (it raises a KeyError when executing insert_location). Is this somewhat "expected"? (It looks related to my questions/doubts in https://forge.softwareheritage.org/D5811#147864 )
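For context on what the out-of-order dataset exercises, here is a minimal sketch that flags revisions dated earlier than one of their parents. That reading of "out-of-order" is an assumption (swh/provenance/tests/data/README.md is the reference), the revision model is hypothetical, and the sketch neither reproduces nor explains the KeyError raised by insert_location.

```python
from datetime import datetime, timezone
from typing import Dict, List, Set, Tuple

# Hypothetical revision model: revision id -> (date, parent ids).
Revisions = Dict[str, Tuple[datetime, List[str]]]


def out_of_order(revisions: Revisions) -> Set[str]:
    """Return the ids of revisions dated earlier than at least one parent."""
    return {
        rev_id
        for rev_id, (date, parents) in revisions.items()
        if any(p in revisions and revisions[p][0] > date for p in parents)
    }


def d(day: int) -> datetime:
    return datetime(2021, 1, day, tzinfo=timezone.utc)


history: Revisions = {
    "R1": (d(1), []),
    "R2": (d(3), ["R1"]),
    "R3": (d(2), ["R2"]),  # dated before its own parent
}
print(sorted(out_of_order(history)))  # ['R3']
# Processing revisions by date handles R3 before its parent R2.
print(sorted(history, key=lambda r: history[r][0]))  # ['R1', 'R3', 'R2']
```

If revisions are processed in date order (also an assumption here), R3 is handled before its parent R2, which is the kind of situation such a dataset would need to cover.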

Build is green

Patch application report for D5812 (id=20757)

Could not rebase; Attempt merge onto 49e47c3ea7...

Removing swh/provenance/tests/data/synthetic_upper_2.txt
Removing swh/provenance/tests/data/synthetic_upper_1.txt
Removing swh/provenance/tests/data/synthetic_lower_2.txt
Removing swh/provenance/tests/data/synthetic_lower_1.txt
Merge made by the 'recursive' strategy.
 requirements-test.txt                              |   1 +
 swh/provenance/model.py                            |  31 ++-
 swh/provenance/postgresql/provenancedb_base.py     |  12 +-
 .../postgresql/provenancedb_with_path.py           | 115 +++------
 swh/provenance/provenance.py                       | 220 +++++++++-------
 swh/provenance/tests/conftest.py                   |  38 ++-
 swh/provenance/tests/data/README.md                | 138 ++++++++++
 swh/provenance/tests/data/cmdbts2.msgpack          | Bin 0 -> 17734 bytes
 swh/provenance/tests/data/cmdbts2_repo.yaml        |  80 ++++++
 .../tests/data/generate_storage_from_git.py        | 115 +++++++++
 swh/provenance/tests/data/out-of-order.msgpack     | Bin 0 -> 5254 bytes
 swh/provenance/tests/data/out-of-order_repo.yaml   |  29 +++
 .../tests/data/synthetic_cmdbts2_lower_1.txt       |  91 +++++++
 .../tests/data/synthetic_cmdbts2_lower_2.txt       |  91 +++++++
 .../tests/data/synthetic_cmdbts2_upper_1.txt       |  91 +++++++
 .../tests/data/synthetic_cmdbts2_upper_2.txt       |  91 +++++++
 swh/provenance/tests/data/synthetic_lower_1.txt    |  91 -------
 swh/provenance/tests/data/synthetic_lower_2.txt    |  91 -------
 .../tests/data/synthetic_out-of-order_lower_1.txt  |  31 +++
 swh/provenance/tests/data/synthetic_upper_1.txt    |  92 -------
 swh/provenance/tests/data/synthetic_upper_2.txt    |  91 -------
 swh/provenance/tests/test_provenance_db.py         | 281 ++++++++++++---------
 swh/provenance/tests/test_provenance_db_storage.py |   2 +-
 swh/provenance/tests/test_provenance_heuristics.py | 247 ++++++++++++++++++
 24 files changed, 1387 insertions(+), 682 deletions(-)
 create mode 100644 swh/provenance/tests/data/README.md
 create mode 100644 swh/provenance/tests/data/cmdbts2.msgpack
 create mode 100644 swh/provenance/tests/data/cmdbts2_repo.yaml
 create mode 100644 swh/provenance/tests/data/generate_storage_from_git.py
 create mode 100644 swh/provenance/tests/data/out-of-order.msgpack
 create mode 100644 swh/provenance/tests/data/out-of-order_repo.yaml
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_lower_1.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_lower_2.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_upper_1.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_upper_2.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_lower_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_lower_2.txt
 create mode 100644 swh/provenance/tests/data/synthetic_out-of-order_lower_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_upper_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_upper_2.txt
 create mode 100644 swh/provenance/tests/test_provenance_heuristics.py
Changes applied before test
commit 6385192a00466fea8ff62dcf72b0267629e4bad6
Merge: 49e47c3 ecbd4de
Author: Jenkins user <jenkins@localhost>
Date:   Wed Jun 2 15:43:01 2021 +0000

    Merge branch 'diff-target' into HEAD

commit ecbd4de588dfe7134bdc73b21a8dffa42a2d302f
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 2 12:10:50 2021 +0200

    Add a test_provenance_heuristics_content_find_all() test
    
    test that ProvenanceDB.find_all() behaves as expected for all test
    datasets.

commit 21d48d1410d720d7e606b7ddc452aa019cd86c26
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Jun 1 16:34:57 2021 +0200

    Add a simple out-of-order dataset

commit 421b2c832a9d37ee6de8b29eebf7c1f65ed01d5a
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Jun 1 11:47:09 2021 +0200

    Remove test_provenance_heuristics_CMDBTS test
    
    since it's redundant with the cmdbts2 test, now generated from a simple
    yaml file rather than depending on the original CMDBTS git repo on
    github.
    
    The CMDBTS dataset (CMDBTS.msgpack) is kept for now since it's still
    used for other tests (e.g. test_provenance_db).

commit 10a166272c262f7725d041fc0b15219868eedacb
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon May 31 16:51:01 2021 +0200

    Add a new (git) dataset generation scaffolding for tests
    
    and use it to generate a 'cmdbts2' test case strictly equivalent
    to the CMDBTS repo.
    
    See the swh/provenance/tests/data/README.md file for more details.
    
    Note: this aims at making it easier to write more test cases than when
    depending on the CMDBTS git repo on github. For example, a new test case
    should come soon for situations like 'out-of-order' revisions.

commit ddc2ae0583db4b317c04d97386d18d2c17ae00d7
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon May 31 16:45:48 2021 +0200

    Remove test_provenance_heuristics from the ArchiveStorage tests
    
    because it's not that meaningful.

commit 16bab3c60c2a3a80782273f1aaff796826e7dc2c
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue May 25 14:31:51 2021 +0200

    Simplify DB queries in ProvenanceWithPathDB.content_find_(first|all)
    
    the queries should be exactly the same as before (query plans are the
    same); just written (hopefully) in a bit more readable manner.

commit 024cc9ce93e545782a980f8e81d5d09651b8231b
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue May 25 14:59:16 2021 +0200

    Add a test for content_find_all()

commit af15ad65f4a34e7703bfec80666102a6403cb505
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 12 09:47:03 2021 +0200

    Refactor the isochrone graph computation
    
    attempt to simplify a bit this part of the code:
    
    - IsochroneNode are now only used for directories
    - FileEntry are used directly from IsochroneNode.entry.files (no need
      for creating new FileEntry instances), so
    - IsochroneNode.children only stores IsochroneNode (thus DirectoryEntry)
      objects,
    - rename IsochroneNode.date as 'dbdate' and clarify its semantics,
    - attempt to document (comments) a bit more the algorithm and semantics
      of several attributes/variables used in there.

commit 1f49fdc967a2854d3a68dec34886b824fdf045f6
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 19 16:16:53 2021 +0200

    Replace 'DirectoryEntry.ls()' method by 'files' and 'dirs' properties
    
    and make the retrieval of children from the archive explicit in a
    dedicated retrieve_children() method.

commit 72644b98a218132c0b173f360c503438688ecebb
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 19 16:14:41 2021 +0200

    Add __str__ methods to RevisionEntry, DirectoryEntry and FileEntry
    
    to ease logging and debugging.

commit a71041fbaf3f0d7ec3ea944cbbf04286c57d8b7e
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 12 12:44:07 2021 +0200

    Improve a bit the code of ProvenanceDBBase

commit defcb388ffba0869edb1a126b6626710c396c2ac
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 12 10:30:23 2021 +0200

    Add a test for the build_isochrone_graph() function
    
    this test is far from ideal, since it's mostly the record of what happened
    during a "known good" session of revision insertions, but at least it
    should allow refactoring the code related to the isochrone graph computation
    with a bit more confidence...

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/56/ for more details.
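The data-structure side of the isochrone graph refactoring listed above can be sketched as follows: nodes exist only for directories, files are read straight from the wrapped entry, and children only hold other nodes. This is a minimal sketch with hypothetical stand-in classes; it does not implement the actual isochrone date computation, and the meaning given to `dbdate` (a date already known in the provenance DB, or None) is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


# Minimal stand-ins for the model classes (assumed shapes, not swh's code).
@dataclass
class FileEntry:
    id: bytes
    name: bytes


@dataclass
class DirectoryEntry:
    id: bytes
    name: bytes
    files: List[FileEntry] = field(default_factory=list)
    dirs: List["DirectoryEntry"] = field(default_factory=list)


@dataclass
class IsochroneNode:
    entry: DirectoryEntry  # only directories get a node
    # Assumed semantics: date already known in the provenance DB, if any.
    dbdate: Optional[datetime] = None
    children: List["IsochroneNode"] = field(default_factory=list)

    @property
    def files(self) -> List[FileEntry]:
        # Files come from the wrapped entry; no new FileEntry is created.
        return self.entry.files


def build_nodes(entry: DirectoryEntry, dbdates: Dict[bytes, datetime]) -> IsochroneNode:
    """Build the node tree for `entry`; children only ever hold IsochroneNodes."""
    node = IsochroneNode(entry, dbdate=dbdates.get(entry.id))
    node.children = [build_nodes(sub, dbdates) for sub in entry.dirs]
    return node
```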

Build is green

Patch application report for D5812 (id=20771)

Could not rebase; Attempt merge onto 49e47c3ea7...

Updating 49e47c3..020c33d
Fast-forward
 requirements-test.txt                              |   1 +
 swh/provenance/model.py                            |  31 ++-
 swh/provenance/postgresql/provenancedb_base.py     |  12 +-
 .../postgresql/provenancedb_with_path.py           | 115 +++------
 swh/provenance/provenance.py                       | 220 +++++++++-------
 swh/provenance/tests/conftest.py                   |  38 ++-
 swh/provenance/tests/data/README.md                | 138 ++++++++++
 swh/provenance/tests/data/cmdbts2.msgpack          | Bin 0 -> 17734 bytes
 swh/provenance/tests/data/cmdbts2_repo.yaml        |  80 ++++++
 .../tests/data/generate_storage_from_git.py        | 115 +++++++++
 swh/provenance/tests/data/out-of-order.msgpack     | Bin 0 -> 5254 bytes
 swh/provenance/tests/data/out-of-order_repo.yaml   |  29 +++
 .../tests/data/synthetic_cmdbts2_lower_1.txt       |  91 +++++++
 .../tests/data/synthetic_cmdbts2_lower_2.txt       |  91 +++++++
 .../tests/data/synthetic_cmdbts2_upper_1.txt       |  91 +++++++
 .../tests/data/synthetic_cmdbts2_upper_2.txt       |  91 +++++++
 swh/provenance/tests/data/synthetic_lower_1.txt    |  91 -------
 swh/provenance/tests/data/synthetic_lower_2.txt    |  91 -------
 .../tests/data/synthetic_out-of-order_lower_1.txt  |  31 +++
 swh/provenance/tests/data/synthetic_upper_1.txt    |  92 -------
 swh/provenance/tests/data/synthetic_upper_2.txt    |  91 -------
 swh/provenance/tests/test_provenance_db.py         | 281 ++++++++++++---------
 swh/provenance/tests/test_provenance_db_storage.py |   2 +-
 swh/provenance/tests/test_provenance_heuristics.py | 247 ++++++++++++++++++
 24 files changed, 1387 insertions(+), 682 deletions(-)
 create mode 100644 swh/provenance/tests/data/README.md
 create mode 100644 swh/provenance/tests/data/cmdbts2.msgpack
 create mode 100644 swh/provenance/tests/data/cmdbts2_repo.yaml
 create mode 100644 swh/provenance/tests/data/generate_storage_from_git.py
 create mode 100644 swh/provenance/tests/data/out-of-order.msgpack
 create mode 100644 swh/provenance/tests/data/out-of-order_repo.yaml
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_lower_1.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_lower_2.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_upper_1.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_upper_2.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_lower_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_lower_2.txt
 create mode 100644 swh/provenance/tests/data/synthetic_out-of-order_lower_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_upper_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_upper_2.txt
 create mode 100644 swh/provenance/tests/test_provenance_heuristics.py
Changes applied before test
commit 020c33d85d4de7202eb8af5d5dea0c6a74305434
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 2 12:10:50 2021 +0200

    Add a test_provenance_heuristics_content_find_all() test
    
    test that ProvenanceDB.find_all() behaves as expected for all test
    datasets.

commit 296229e3ffddb05c516a4208c25a6155f24314b4
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Jun 1 16:34:57 2021 +0200

    Add a simple out-of-order dataset

commit 46c7c9df7beaae63f4dc1089498c64c5658d3bf5
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Jun 1 11:47:09 2021 +0200

    Remove test_provenance_heuristics_CMDBTS test
    
    since it's redundant with the cmdbts2 test, now generated from a simple
    yaml file rather than depending on the original CMDBTS git repo on
    github.
    
    The CMDBTS dataset (CMDBTS.msgpack) is kept for now since it's still
    used for other tests (e.g. test_provenance_db).

commit b3279560fea0c3a84002516cd25d3c3ce86491c6
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon May 31 16:51:01 2021 +0200

    Add a new (git) dataset generation scaffolding for tests
    
    and use it to generate a 'cmdbts2' test case strictly equivalent
    to the CMDBTS repo.
    
    See the swh/provenance/tests/data/README.md file for more details.
    
    Note: this aims at making it easier to write more test cases than when
    depending on the CMDBTS git repo on github. For example, a new test case
    should come soon for situations like 'out-of-order' revisions.

commit 56f0ae8e12990006b1faec62bb8f61b9eed84955
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon May 31 16:45:48 2021 +0200

    Remove test_provenance_heuristics from the ArchiveStorage tests
    
    because it's not that meaningful.

commit de30f332f219e4edb299bb50a0b808a779c57d85
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue May 25 14:31:51 2021 +0200

    Simplify DB queries in ProvenanceWithPathDB.content_find_(first|all)
    
    the queries should be exactly the same as before (query plans are the
    same); just written (hopefully) in a bit more readable manner.

commit ee8e4b0b7ce6a85eac0665a916a37b2d63e3bb4d
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue May 25 14:59:16 2021 +0200

    Add a test for content_find_all()

commit 94598b3ce8c49eb6dfe5308b47b74271a7f9d625
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 12 09:47:03 2021 +0200

    Refactor the isochrone graph computation
    
    attempt to simplify a bit this part of the code:
    
    - IsochroneNode are now only used for directories
    - FileEntry are used directly from IsochroneNode.entry.files (no need
      for creating new FileEntry instances), so
    - IsochroneNode.children only stores IsochroneNode (thus DirectoryEntry)
      objects,
    - rename IsochroneNode.date as 'dbdate' and clarify its semantics,
    - attempt to document (comments) a bit more the algorithm and semantics
      of several attributes/variables used in there.

commit 9d110b93e9c39d65bf2986b148c4bf3467b0efa3
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 19 16:16:53 2021 +0200

    Replace 'DirectoryEntry.ls()' method by 'files' and 'dirs' properties
    
    and make the retrieval of children from the archive explicit in a
    dedicated retrieve_children() method.

commit fcfbb250e688a4ade6849522714832ec49238a8d
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 19 16:14:41 2021 +0200

    Add __str__ methods to RevisionEntry, DirectoryEntry and FileEntry
    
    to ease logging and debugging.

commit 1f823ac01491ee0f27eac685d32322f8558c26bc
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 12 12:44:07 2021 +0200

    Improve a bit the code of ProvenanceDBBase

commit cb623cb0e7dd9a2a568b6d2645e89c4d86ba0a66
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 12 10:30:23 2021 +0200

    Add a test for the build_isochrone_graph() function
    
    this test is far from ideal, since it's mostly the record of what happened
    during a "known good" session of revision insertions, but at least it
    should allow refactoring the code related to the isochrone graph computation
    with a bit more confidence...

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/64/ for more details.
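A minimal sketch of the `files` and `dirs` properties plus an explicit retrieve_children() step, as described in the "Replace 'DirectoryEntry.ls()' method" commit above. The archive is modelled as a hypothetical dict of directory listings, and raising RuntimeError when the properties are read before retrieval is a design choice of this sketch, not necessarily what the patch does.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical archive: directory id -> [(kind, name, target id)], where kind
# is "file" or "dir".
Listing = Dict[bytes, List[Tuple[str, bytes, bytes]]]


class FileEntry:
    def __init__(self, id: bytes, name: bytes) -> None:
        self.id = id
        self.name = name


class DirectoryEntry:
    def __init__(self, id: bytes, name: bytes) -> None:
        self.id = id
        self.name = name
        # None means "children not retrieved yet".
        self._files: Optional[List[FileEntry]] = None
        self._dirs: Optional[List["DirectoryEntry"]] = None

    def retrieve_children(self, archive: Listing) -> None:
        """Fetch this directory's children from the archive (only once)."""
        if self._files is not None:
            return
        files: List[FileEntry] = []
        dirs: List[DirectoryEntry] = []
        for kind, name, target in archive.get(self.id, []):
            if kind == "file":
                files.append(FileEntry(target, name))
            elif kind == "dir":
                dirs.append(DirectoryEntry(target, name))
        self._files, self._dirs = files, dirs

    @property
    def files(self) -> List[FileEntry]:
        if self._files is None:
            raise RuntimeError("retrieve_children() has not been called")
        return self._files

    @property
    def dirs(self) -> List["DirectoryEntry"]:
        if self._dirs is None:
            raise RuntimeError("retrieve_children() has not been called")
        return self._dirs
```

Making the archive access an explicit call keeps the properties cheap and side-effect free, so reading `files` or `dirs` never triggers a hidden storage query.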

vlorentz added inline comments.
swh/provenance/tests/test_provenance_heuristics.py
245–246

Why the length comparison?

If they don't have the same length, pytest will show a much better error on set(db_occurrences) == set(expected) by listing what items are missing.
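One way to keep both the readable pytest failure report and the duplicate check is to compare the sets first and the lengths second. A minimal sketch; `assert_same_occurrences()` is a hypothetical helper name, and the length check assumes `expected` itself contains no duplicates.

```python
from typing import Hashable, Iterable, List


def assert_same_occurrences(found: Iterable[Hashable], expected: List[Hashable]) -> None:
    rows = list(found)
    # Set comparison first: if it fails, pytest's assertion rewriting shows
    # which items are missing or unexpected.
    assert set(rows) == set(expected)
    # Equal sets can still hide duplicated rows in `found`; the length
    # comparison catches those (assuming `expected` has no duplicates).
    assert len(rows) == len(expected)
```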

Build has FAILED

Patch application report for D5812 (id=20789)

Could not rebase; Attempt merge onto 08344d3f76...

Updating 08344d3..a4da950
Fast-forward
 requirements-test.txt                              |   1 +
 swh/provenance/tests/conftest.py                   |  38 +++-
 swh/provenance/tests/data/README.md                | 138 ++++++++++++
 swh/provenance/tests/data/cmdbts2.msgpack          | Bin 0 -> 17734 bytes
 swh/provenance/tests/data/cmdbts2_repo.yaml        |  80 +++++++
 .../tests/data/generate_storage_from_git.py        | 115 ++++++++++
 swh/provenance/tests/data/out-of-order.msgpack     | Bin 0 -> 6653 bytes
 swh/provenance/tests/data/out-of-order_repo.yaml   |  35 +++
 .../tests/data/synthetic_cmdbts2_lower_1.txt       |  91 ++++++++
 .../tests/data/synthetic_cmdbts2_lower_2.txt       |  91 ++++++++
 .../tests/data/synthetic_cmdbts2_upper_1.txt       |  91 ++++++++
 .../tests/data/synthetic_cmdbts2_upper_2.txt       |  91 ++++++++
 swh/provenance/tests/data/synthetic_lower_1.txt    |  91 --------
 swh/provenance/tests/data/synthetic_lower_2.txt    |  91 --------
 .../tests/data/synthetic_out-of-order_lower_1.txt  |  38 ++++
 swh/provenance/tests/data/synthetic_upper_1.txt    |  92 --------
 swh/provenance/tests/data/synthetic_upper_2.txt    |  91 --------
 swh/provenance/tests/test_provenance_db.py         | 132 -----------
 swh/provenance/tests/test_provenance_db_storage.py |   2 +-
 swh/provenance/tests/test_provenance_heuristics.py | 247 +++++++++++++++++++++
 20 files changed, 1046 insertions(+), 509 deletions(-)
 create mode 100644 swh/provenance/tests/data/README.md
 create mode 100644 swh/provenance/tests/data/cmdbts2.msgpack
 create mode 100644 swh/provenance/tests/data/cmdbts2_repo.yaml
 create mode 100644 swh/provenance/tests/data/generate_storage_from_git.py
 create mode 100644 swh/provenance/tests/data/out-of-order.msgpack
 create mode 100644 swh/provenance/tests/data/out-of-order_repo.yaml
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_lower_1.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_lower_2.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_upper_1.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_upper_2.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_lower_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_lower_2.txt
 create mode 100644 swh/provenance/tests/data/synthetic_out-of-order_lower_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_upper_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_upper_2.txt
 create mode 100644 swh/provenance/tests/test_provenance_heuristics.py
Changes applied before test
commit a4da95075032acf4f88fac738b3ff5b46ceb94c5
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 2 12:10:50 2021 +0200

    Add a test_provenance_heuristics_content_find_all() test
    
    test that ProvenanceDB.find_all() behaves as expected for all test
    datasets.

commit 71d39de612ef5b156887dfca9bf491649e17bdde
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Jun 1 16:34:57 2021 +0200

    Add a simple out-of-order dataset

commit 9854a75c8f5426836c561bd9c1b9bad7c85494e0
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Jun 1 11:47:09 2021 +0200

    Remove test_provenance_heuristics_CMDBTS test
    
    since it's redundant with the cmdbts2 test, now generated from a simple
    yaml file rather than depending on the original CMDBTS git repo on
    github.
    
    The CMDBTS dataset (CMDBTS.msgpack) is kept for now since it's still
    used for other tests (e.g. test_provenance_db).

commit 4f7b0eadd10c55318f64688abfe391ead4bcc3af
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon May 31 16:51:01 2021 +0200

    Add a new (git) dataset generation scaffolding for tests
    
    and use it to generate a 'cmdbts2' test case strictly equivalent
    to the CMDBTS repo.
    
    See the swh/provenance/tests/data/README.md file for more details.
    
    Note: this aims at making it easier to write more test cases than when
    depending on the CMDBTS git repo on github. For example, a new test case
    should come soon for situations like 'out-of-order' revisions.

commit fd373add1762c515070d39dc1cc1b58c09d3e8e4
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon May 31 16:45:48 2021 +0200

    Remove test_provenance_heuristics from the ArchiveStorage tests
    
    because it's not that meaningful.

Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/71/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/71/console

swh/provenance/tests/test_provenance_heuristics.py
245–246

We compare the length of the lists, then compare the sets... We could also compare the sorted lists. Oh well...
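Comparing sorted lists, or `collections.Counter` objects, checks both the items and their multiplicities in a single assertion. A minimal sketch with hypothetical helper names; the sorted variant requires the rows to be mutually orderable, while the Counter variant only requires them to be hashable.

```python
from collections import Counter
from typing import Hashable, List


def assert_same_rows_sorted(found: List, expected: List) -> None:
    # One assertion covers missing, extra and duplicated rows; the rows must
    # be orderable (e.g. no None compared against bytes inside tuples).
    assert sorted(found) == sorted(expected)


def assert_same_rows_counter(found: List[Hashable], expected: List[Hashable]) -> None:
    # Same check without needing an ordering: compares items with their counts.
    assert Counter(found) == Counter(expected)
```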

Build is green

Patch application report for D5812 (id=20798)

Could not rebase; Attempt merge onto 08344d3f76...

Updating 08344d3..a9d5543
Fast-forward
 requirements-test.txt                              |   1 +
 swh/provenance/tests/conftest.py                   |  38 +++-
 swh/provenance/tests/data/README.md                | 138 ++++++++++++
 swh/provenance/tests/data/cmdbts2.msgpack          | Bin 0 -> 17734 bytes
 swh/provenance/tests/data/cmdbts2_repo.yaml        |  80 +++++++
 .../tests/data/generate_storage_from_git.py        | 115 ++++++++++
 swh/provenance/tests/data/out-of-order.msgpack     | Bin 0 -> 6653 bytes
 swh/provenance/tests/data/out-of-order_repo.yaml   |  35 +++
 .../tests/data/synthetic_cmdbts2_lower_1.txt       |  91 ++++++++
 .../tests/data/synthetic_cmdbts2_lower_2.txt       |  91 ++++++++
 .../tests/data/synthetic_cmdbts2_upper_1.txt       |  91 ++++++++
 .../tests/data/synthetic_cmdbts2_upper_2.txt       |  91 ++++++++
 swh/provenance/tests/data/synthetic_lower_1.txt    |  91 --------
 swh/provenance/tests/data/synthetic_lower_2.txt    |  91 --------
 .../tests/data/synthetic_out-of-order_lower_1.txt  |  38 ++++
 swh/provenance/tests/data/synthetic_upper_1.txt    |  92 --------
 swh/provenance/tests/data/synthetic_upper_2.txt    |  91 --------
 swh/provenance/tests/test_provenance_db.py         | 132 -----------
 swh/provenance/tests/test_provenance_db_storage.py |   2 +-
 swh/provenance/tests/test_provenance_heuristics.py | 247 +++++++++++++++++++++
 20 files changed, 1046 insertions(+), 509 deletions(-)
 create mode 100644 swh/provenance/tests/data/README.md
 create mode 100644 swh/provenance/tests/data/cmdbts2.msgpack
 create mode 100644 swh/provenance/tests/data/cmdbts2_repo.yaml
 create mode 100644 swh/provenance/tests/data/generate_storage_from_git.py
 create mode 100644 swh/provenance/tests/data/out-of-order.msgpack
 create mode 100644 swh/provenance/tests/data/out-of-order_repo.yaml
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_lower_1.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_lower_2.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_upper_1.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_upper_2.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_lower_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_lower_2.txt
 create mode 100644 swh/provenance/tests/data/synthetic_out-of-order_lower_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_upper_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_upper_2.txt
 create mode 100644 swh/provenance/tests/test_provenance_heuristics.py
Changes applied before test
commit a9d5543d6701f2cd79800611d2e4a79b3a0b3686
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 2 12:10:50 2021 +0200

    Add a test_provenance_heuristics_content_find_all() test
    
    test that ProvenanceDB.find_all() behaves as expected for all test
    datasets.

commit 242726f6980b6c98c7cd9942fd0b1e1ee21e034f
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Jun 1 16:34:57 2021 +0200

    Add a simple out-of-order dataset

commit 9854a75c8f5426836c561bd9c1b9bad7c85494e0
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Jun 1 11:47:09 2021 +0200

    Remove test_provenance_heuristics_CMDBTS test
    
    since it's redundant with the cmdbts2 test, now generated from a simple
    yaml file rather than depending on the original CMDBTS git repo on
    github.
    
    The CMDBTS dataset (CMDBTS.msgpack) is kept for now since it's still
    used for other tests (e.g. test_provenance_db).

commit 4f7b0eadd10c55318f64688abfe391ead4bcc3af
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon May 31 16:51:01 2021 +0200

    Add a new (git) dataset generation scaffolding for tests
    
    and use it to generate a 'cmdbts2' test case strictly equivalent
    to the CMDBTS repo.
    
    See the swh/provenance/tests/data/README.md file for more details.
    
    Note: this aims at making it easier to write more test cases than when
    depending on the CMDBTS git repo on github. For example, a new test case
    should come soon for situations like 'out-of-order' revisions.

commit fd373add1762c515070d39dc1cc1b58c09d3e8e4
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon May 31 16:45:48 2021 +0200

    Remove test_provenance_heuristics from the ArchiveStorage tests
    
    because it's not that meaningful.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/75/ for more details.

Build is green

Patch application report for D5812 (id=20804)

Could not rebase; Attempt merge onto 08344d3f76...

Updating 08344d3..b3436ed
Fast-forward
 requirements-test.txt                              |   1 +
 swh/provenance/tests/conftest.py                   |  40 +++-
 swh/provenance/tests/data/README.md                | 138 ++++++++++++
 swh/provenance/tests/data/cmdbts2.msgpack          | Bin 0 -> 17734 bytes
 swh/provenance/tests/data/cmdbts2_repo.yaml        |  80 +++++++
 .../tests/data/generate_storage_from_git.py        | 115 ++++++++++
 swh/provenance/tests/data/out-of-order.msgpack     | Bin 0 -> 6653 bytes
 swh/provenance/tests/data/out-of-order_repo.yaml   |  35 +++
 .../tests/data/synthetic_cmdbts2_lower_1.txt       |  91 ++++++++
 .../tests/data/synthetic_cmdbts2_lower_2.txt       |  91 ++++++++
 .../tests/data/synthetic_cmdbts2_upper_1.txt       |  91 ++++++++
 .../tests/data/synthetic_cmdbts2_upper_2.txt       |  91 ++++++++
 swh/provenance/tests/data/synthetic_lower_1.txt    |  91 --------
 swh/provenance/tests/data/synthetic_lower_2.txt    |  91 --------
 .../tests/data/synthetic_out-of-order_lower_1.txt  |  42 ++++
 swh/provenance/tests/data/synthetic_upper_1.txt    |  92 --------
 swh/provenance/tests/data/synthetic_upper_2.txt    |  91 --------
 swh/provenance/tests/test_provenance_db.py         | 132 -----------
 swh/provenance/tests/test_provenance_db_storage.py |   2 +-
 swh/provenance/tests/test_provenance_heuristics.py | 247 +++++++++++++++++++++
 20 files changed, 1051 insertions(+), 510 deletions(-)
 create mode 100644 swh/provenance/tests/data/README.md
 create mode 100644 swh/provenance/tests/data/cmdbts2.msgpack
 create mode 100644 swh/provenance/tests/data/cmdbts2_repo.yaml
 create mode 100644 swh/provenance/tests/data/generate_storage_from_git.py
 create mode 100644 swh/provenance/tests/data/out-of-order.msgpack
 create mode 100644 swh/provenance/tests/data/out-of-order_repo.yaml
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_lower_1.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_lower_2.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_upper_1.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_upper_2.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_lower_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_lower_2.txt
 create mode 100644 swh/provenance/tests/data/synthetic_out-of-order_lower_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_upper_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_upper_2.txt
 create mode 100644 swh/provenance/tests/test_provenance_heuristics.py
Changes applied before test
commit b3436ed828b6849f03bb0a363177f1d3a1643ed1
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 2 12:10:50 2021 +0200

    Add a test_provenance_heuristics_content_find_all() test
    
    test that ProvenanceDB.find_all() behaves as expected for all test
    datasets.

commit 70924164c5b251f6e8b3e23f691bd77d723b843e
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Jun 1 16:34:57 2021 +0200

    Add a simple out-of-order dataset

commit 19ef0ba9f5c86d26b595aaa5dc64390994551b64
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Jun 1 11:47:09 2021 +0200

    Remove test_provenance_heuristics_CMDBTS test
    
    since it's redundant with the cmdbts2 test, now generated from a simple
    yaml file rather than depending on the original CMDBTS git repo on
    github.
    
    The CMDBTS dataset (CMDBTS.msgpack) is kept for now since it's still
    used for other tests (e.g. test_provenance_db).

commit f534e558645ecc9384dfb1781e94266feac683f1
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon May 31 16:51:01 2021 +0200

    Add a new (git) dataset generation scaffolding for tests
    
    and use it to generate a 'cmdbts2' test case strictly equivalent
    to the CMDBTS repo.
    
    See the swh/provenance/tests/data/README.md file for more details.
    
    Note: this aims at making it easier to write more test cases than depending
    on the CMDBTS git repo on github. For example, a new test case should
    come soon for situations like 'out-of-order' revisions.

commit fd373add1762c515070d39dc1cc1b58c09d3e8e4
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon May 31 16:45:48 2021 +0200

    Remove test_provenance_heuristics from the ArchiveStorage tests
    
    because it's not that meaningful.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/79/ for more details.

Build is green

Patch application report for D5812 (id=20808)

Could not rebase; Attempt merge onto 08344d3f76...

Updating 08344d3..3c9f7bd
Fast-forward
 requirements-test.txt                              |   1 +
 swh/provenance/tests/conftest.py                   |  40 +++-
 swh/provenance/tests/data/README.md                | 138 ++++++++++++
 swh/provenance/tests/data/cmdbts2.msgpack          | Bin 0 -> 17734 bytes
 swh/provenance/tests/data/cmdbts2_repo.yaml        |  80 +++++++
 .../tests/data/generate_storage_from_git.py        | 115 ++++++++++
 swh/provenance/tests/data/out-of-order.msgpack     | Bin 0 -> 6653 bytes
 swh/provenance/tests/data/out-of-order_repo.yaml   |  35 +++
 .../tests/data/synthetic_cmdbts2_lower_1.txt       |  91 ++++++++
 .../tests/data/synthetic_cmdbts2_lower_2.txt       |  91 ++++++++
 .../tests/data/synthetic_cmdbts2_upper_1.txt       |  91 ++++++++
 .../tests/data/synthetic_cmdbts2_upper_2.txt       |  91 ++++++++
 swh/provenance/tests/data/synthetic_lower_1.txt    |  91 --------
 swh/provenance/tests/data/synthetic_lower_2.txt    |  91 --------
 .../tests/data/synthetic_out-of-order_lower_1.txt  |  42 ++++
 swh/provenance/tests/data/synthetic_upper_1.txt    |  92 --------
 swh/provenance/tests/data/synthetic_upper_2.txt    |  91 --------
 swh/provenance/tests/test_provenance_db.py         | 132 -----------
 swh/provenance/tests/test_provenance_db_storage.py |   2 +-
 swh/provenance/tests/test_provenance_heuristics.py | 247 +++++++++++++++++++++
 20 files changed, 1051 insertions(+), 510 deletions(-)
 create mode 100644 swh/provenance/tests/data/README.md
 create mode 100644 swh/provenance/tests/data/cmdbts2.msgpack
 create mode 100644 swh/provenance/tests/data/cmdbts2_repo.yaml
 create mode 100644 swh/provenance/tests/data/generate_storage_from_git.py
 create mode 100644 swh/provenance/tests/data/out-of-order.msgpack
 create mode 100644 swh/provenance/tests/data/out-of-order_repo.yaml
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_lower_1.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_lower_2.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_upper_1.txt
 create mode 100644 swh/provenance/tests/data/synthetic_cmdbts2_upper_2.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_lower_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_lower_2.txt
 create mode 100644 swh/provenance/tests/data/synthetic_out-of-order_lower_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_upper_1.txt
 delete mode 100644 swh/provenance/tests/data/synthetic_upper_2.txt
 create mode 100644 swh/provenance/tests/test_provenance_heuristics.py
Changes applied before test
commit 3c9f7bd77a2babc5ca4509878fa7e1f1f9136591
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jun 2 12:10:50 2021 +0200

    Add a test_provenance_heuristics_content_find_all() test
    
    test that ProvenanceDB.find_all() behaves as expected for all test
    datasets.

commit f3cd239bf8c3241b297c6481beca266bfd47eb25
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Jun 1 16:34:57 2021 +0200

    Add a simple out-of-order dataset

commit 5f9e5b53a5b0547cfdbe1676e2b09648f4f359f1
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Jun 1 11:47:09 2021 +0200

    Remove test_provenance_heuristics_CMDBTS test
    
    since it's redundant with the cmdbts2 test, now generated from a simple
    yaml file rather than depending on the original CMDBTS git repo on
    github.
    
    The CMDBTS dataset (CMDBTS.msgpack) is kept for now since it's still
    used for other tests (e.g. test_provenance_db).

commit 9fe096b3905495bff534649f2e5e0ecb8802217d
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon May 31 16:51:01 2021 +0200

    Add a new (git) dataset generation scaffolding for tests
    
    and use it to generate a 'cmdbts2' test case strictly equivalent
    to the CMDBTS repo.
    
    See the swh/provenance/tests/data/README.md file for more details.
    
    Note: this aims at making it easier to write more test cases than depending
    on the CMDBTS git repo on github. For example, a new test case should
    come soon for situations like 'out-of-order' revisions.

commit fd373add1762c515070d39dc1cc1b58c09d3e8e4
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon May 31 16:45:48 2021 +0200

    Remove test_provenance_heuristics from the ArchiveStorage tests
    
    because it's not that meaningful.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/83/ for more details.
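The 'out-of-order' dataset introduced in the commits above targets histories whose revision dates do not follow the parent/child order. As a rough illustration only, assuming 'out-of-order' means a revision dated strictly earlier than one of its parents (our reading; the actual case is defined in out-of-order_repo.yaml, which is not reproduced here), such revisions can be spotted as sketched below. The `Revision` tuple and the `out_of_order_revisions()` helper are hypothetical and are not part of swh.provenance or swh.model.

```python
from datetime import datetime, timezone
from typing import Dict, List, NamedTuple


class Revision(NamedTuple):
    """Hypothetical minimal revision record (not the swh.model type)."""
    id: str
    date: datetime
    parents: List[str]


def out_of_order_revisions(revisions: List[Revision]) -> List[str]:
    """Return ids of revisions dated strictly earlier than one of their parents."""
    by_id: Dict[str, Revision] = {rev.id: rev for rev in revisions}
    result = []
    for rev in revisions:
        # Only compare against parents that are present in the given history.
        parent_dates = [by_id[p].date for p in rev.parents if p in by_id]
        if parent_dates and rev.date < max(parent_dates):
            result.append(rev.id)
    return result


def ts(seconds: int) -> datetime:
    """Build a timezone-aware UTC datetime from a Unix timestamp."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


history = [
    Revision("R00", ts(1_000_000_000), []),
    Revision("R01", ts(1_000_000_100), ["R00"]),
    Revision("R02", ts(1_000_000_050), ["R01"]),  # dated before its parent R01
]
print(out_of_order_revisions(history))  # ['R02']
```

A heuristic that walks revisions in topological order meets R02 after R01 even though R02 carries the older date, which is presumably the kind of situation such a dataset is meant to exercise.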

I believe this can be pushed safely. The issue of invalidating frontiers and batch revision processing should be addressed separately.

swh/provenance/tests/test_provenance_heuristics.py
213–214

Good point: invalidating a frontier should not prevent it from being inserted in the DB. This was originally meant to affect only the current revision being processed, hence there was no need to update the frontier's date (unless the current revision decided to place it in the same node again). However, now that we allow processing a batch of several revisions per commit, this should be carefully revisited.

245–246

Comparing sets is not enough because we lose track of repetitions; the proper comparison would be with multisets, but I'm not sure Python supports them. List length + set equality is a good approximation of multiset comparison, although not exactly the same.
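For reference, the Python standard library does provide multiset semantics through `collections.Counter`. The sketch below contrasts the length-plus-set approximation with an exact multiset comparison; the string elements are hypothetical stand-ins for the rows returned by find_all(), and both helper names are illustrative only.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def same_as_sets_with_length(a: List[Hashable], b: List[Hashable]) -> bool:
    """The approximation discussed above: equal lengths and equal sets."""
    return len(a) == len(b) and set(a) == set(b)


def same_as_multisets(a: Iterable[Hashable], b: Iterable[Hashable]) -> bool:
    """Exact multiset equality: each element must occur the same number of times."""
    return Counter(a) == Counter(b)


expected = ["a", "a", "b"]
actual = ["a", "b", "b"]
print(same_as_sets_with_length(expected, actual))  # True (a false positive)
print(same_as_multisets(expected, actual))  # False
```

`Counter` requires hashable elements, so result rows would need to be tuples (or another hashable type); comparing `sorted()` lists is an alternative when the rows are orderable but not hashable.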

This revision is now accepted and ready to land. Jun 4 2021, 3:27 PM