
Allow to add extra origins and snapshots in generated test storages
Closed, Public

Authored by douardda on Jun 18 2021, 12:26 PM.

Details

Summary

Use an extra (YAML) file to describe the additional visits (and
snapshots) to insert into the storage before dumping it into the msgpack
file.
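This review does not show the actual schema of `repo_with_merges-visits-01.yaml`, so the following is only an illustration. The layout it assumes is hypothetical: a list of entries, each with an `origin` URL, a `date`, and a `branches` mapping from branch name to revision id (`branches` being the section name the author settles on later in this review). The sketch takes the structure `yaml.safe_load` would return for such a file and turns each entry into a visit record carrying its snapshot branches. `ExtraVisit` and `parse_visits` are hypothetical names, not part of swh.provenance or swh.model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class ExtraVisit:
    """Hypothetical in-memory form of one additional visit and its snapshot."""

    origin: str
    date: datetime
    branches: Dict[str, str]  # branch name -> revision id


def parse_visits(data: List[dict]) -> List[ExtraVisit]:
    """Convert entries of the assumed YAML layout into ExtraVisit records."""
    visits = []
    for entry in data:
        date = entry["date"]
        # PyYAML usually resolves unquoted ISO timestamps to datetime objects;
        # also accept strings so JSON-like input works too.
        if isinstance(date, str):
            date = datetime.fromisoformat(date)
        visits.append(
            ExtraVisit(
                origin=entry["origin"],
                date=date,
                branches=dict(entry.get("branches", {})),
            )
        )
    # Return the visits in chronological order.
    return sorted(visits, key=lambda v: v.date)


raw = [
    {"origin": "https://example.org/repo", "date": "2021-06-18T10:00:00+00:00",
     "branches": {"refs/heads/main": "a" * 40}},
    {"origin": "https://example.org/repo", "date": "2021-06-17T10:00:00+00:00",
     "branches": {"refs/heads/main": "b" * 40}},
]
visits = parse_visits(raw)
print([v.branches["refs/heads/main"][0] for v in visits])  # ['b', 'a']
```

Each record could then be turned into the corresponding origin, visit, and snapshot model objects before the storage is dumped; that step depends on swh.model APIs not shown here, so it is left out.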

Depends on D5892.

Diff Detail

Repository
rDPROV Provenance database
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5894 (id=21125)

Could not rebase; Attempt merge onto 8ff1ab5860...

Updating 8ff1ab5..e286a88
Fast-forward
 mypy.ini                                           |   3 +
 requirements-test.txt                              |   1 +
 swh/provenance/tests/conftest.py                   |  38 ++----
 swh/provenance/tests/data/README.md                |  28 ++++
 swh/provenance/tests/data/cmdbts2.msgpack          | Bin 17734 -> 14470 bytes
 swh/provenance/tests/data/generate_repo.py         |  16 ++-
 .../tests/data/generate_storage_from_git.py        | 141 ++++++++++-----------
 swh/provenance/tests/data/out-of-order.msgpack     | Bin 6653 -> 6325 bytes
 .../tests/data/repo_with_merges-visits-01.yaml     |  34 +++++
 swh/provenance/tests/data/repo_with_merges.yaml    |  73 +++++++++++
 10 files changed, 236 insertions(+), 98 deletions(-)
 create mode 100644 swh/provenance/tests/data/repo_with_merges-visits-01.yaml
 create mode 100644 swh/provenance/tests/data/repo_with_merges.yaml
Changes applied before test
commit e286a88fdfbf67f040f7694e4e26d2dc902c3e97
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 20:44:54 2021 +0200

    Allow to add extra origins and snapshots in generated test storages
    
    Use an extra (yaml) file to describe the additional visits (and
    snapshots) to insert in the storage before dumping it in the masgpack
    file.

commit 6734fd36b87233aa7520c90e7b6ad999d75103cf
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 14:52:54 2021 +0200

    Add support for (topological) branches and merges in generate_repo.py
    
    include an example input file (currently not used in actual tests).

commit 7886bf494ab8dd0405ad391a0ca57eed1ee7149c
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 19:53:06 2021 +0200

    Refactor the generate_storage_from_git dataset creation tool
    
    use the newly introduced swh.journal StreamJournalWriter to generate the
    msgpack-file with the content of the storage serialized, instead of
    reimplementing a similar logic by hand.
    
    Doing so, the msgpack file format is a little bit different, so current
    datasets need to be regenerated.
    
    From the 'consumer' side (the fill_storage fixture), simply use
    swh.storage's process_replay_objects (once again, instead of reinventing
    the wheel).
    
    This is needed because we will need to handle much more model object types
    for testing the origin-revision layer.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/157/ for more details.
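The refactoring commit above replaces a hand-written serializer with a stream of typed objects (written by swh.journal's StreamJournalWriter) that the fill_storage fixture replays with swh.storage's process_replay_objects. The sketch below illustrates that general pattern only. It is not the swh.journal or swh.storage API: `write_objects` and `replay_objects` are hypothetical, and JSON lines from the standard library stand in for msgpack so the example needs no third-party package.

```python
import io
import json
from collections import defaultdict
from typing import IO, Callable, Dict, Iterable, List, Tuple


def write_objects(stream: IO[str], objects: Iterable[Tuple[str, dict]]) -> None:
    """Append one (object_type, object) record per line to the stream."""
    for object_type, obj in objects:
        stream.write(json.dumps([object_type, obj]) + "\n")


def replay_objects(
    stream: IO[str], handlers: Dict[str, Callable[[List[dict]], None]]
) -> None:
    """Read records back, group them by type, and pass each batch to its handler."""
    batches: Dict[str, List[dict]] = defaultdict(list)
    for line in stream:
        object_type, obj = json.loads(line)
        batches[object_type].append(obj)
    # Types are processed in first-seen order; a missing handler raises KeyError.
    for object_type, batch in batches.items():
        handlers[object_type](batch)


buf = io.StringIO()
write_objects(buf, [("revision", {"id": "r1"}), ("snapshot", {"id": "s1"}),
                    ("revision", {"id": "r2"})])
buf.seek(0)
received: Dict[str, List[dict]] = {}
replay_objects(buf, {"revision": lambda b: received.setdefault("revision", b),
                     "snapshot": lambda b: received.setdefault("snapshot", b)})
print(received)  # {'revision': [{'id': 'r1'}, {'id': 'r2'}], 'snapshot': [{'id': 's1'}]}
```

Keeping one generic stream plus per-type handlers means new object types (origins, visits, snapshots) only need a new handler rather than a new file format. A real replayer may also have to respect dependencies between object types, which this sketch ignores.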

Use 'branches' instead of 'revisions' as the section name in the YAML file.

This makes more sense given what the code actually does.

Build is green

Patch application report for D5894 (id=21126)

Could not rebase; Attempt merge onto 8ff1ab5860...

Updating 8ff1ab5..3cb8dad
Fast-forward
 mypy.ini                                           |   3 +
 requirements-test.txt                              |   1 +
 swh/provenance/tests/conftest.py                   |  38 ++----
 swh/provenance/tests/data/README.md                |  28 ++++
 swh/provenance/tests/data/cmdbts2.msgpack          | Bin 17734 -> 14470 bytes
 swh/provenance/tests/data/generate_repo.py         |  16 ++-
 .../tests/data/generate_storage_from_git.py        | 145 ++++++++++-----------
 swh/provenance/tests/data/out-of-order.msgpack     | Bin 6653 -> 6325 bytes
 .../tests/data/repo_with_merges-visits-01.yaml     |  34 +++++
 swh/provenance/tests/data/repo_with_merges.yaml    |  73 +++++++++++
 10 files changed, 240 insertions(+), 98 deletions(-)
 create mode 100644 swh/provenance/tests/data/repo_with_merges-visits-01.yaml
 create mode 100644 swh/provenance/tests/data/repo_with_merges.yaml
Changes applied before test
commit 3cb8dadbdc6b3211ca4fed0dedbc3c7ba632958d
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 20:44:54 2021 +0200

    Allow to add extra origins and snapshots in generated test storages
    
    Use an extra (yaml) file to describe the additional visits (and
    snapshots) to insert in the storage before dumping it in the masgpack
    file.

commit 6734fd36b87233aa7520c90e7b6ad999d75103cf
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 14:52:54 2021 +0200

    Add support for (topological) branches and merges in generate_repo.py
    
    include an example input file (currently not used in actual tests).

commit 7886bf494ab8dd0405ad391a0ca57eed1ee7149c
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 19:53:06 2021 +0200

    Refactor the generate_storage_from_git dataset creation tool
    
    use the newly introduced swh.journal StreamJournalWriter to generate the
    msgpack-file with the content of the storage serialized, instead of
    reimplementing a similar logic by hand.
    
    Doing so, the msgpack file format is a little bit different, so current
    datasets need to be regenerated.
    
    From the 'consumer' side (the fill_storage fixture), simply use
    swh.storage's process_replay_objects (once again, instead of reinventing
    the wheel).
    
    This is needed because we will need to handle much more model object types
    for testing the origin-revision layer.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/160/ for more details.

aeviso added a subscriber: aeviso.
aeviso added inline comments.
swh/provenance/tests/data/README.md, line 100

visitsm -> visits

swh/provenance/tests/data/generate_storage_from_git.py, line 93

Not sure if I'm getting this correctly, but I assume all targets for the generated snapshots are revisions, right?

This is enough for now, but we might also want (in the future) to test our ArchiveInterface.snapshot_get_heads method with snapshots having releases as targets as well.

This revision is now accepted and ready to land. (Jun 21 2021, 3:51 PM)
swh/provenance/tests/data/generate_storage_from_git.py, line 93

> Not sure if I'm getting this correctly, but I assume all targets for the generated snapshots are revisions, right?

Yes.

> This is enough for now, but we might also want (in the future) to test our ArchiveInterface.snapshot_get_heads method with snapshots having releases as targets as well.

Indeed, there is room for improvement.
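To make the reviewer's suggestion concrete, here is a sketch of what computing snapshot heads involves once branches may target releases as well as revisions: aliases are followed to a concrete branch, revisions are taken as-is, and releases are mapped to the revision they point at. The actual snapshot_get_heads implementation is not shown in this review. `Branch`, `snapshot_heads`, the string target types, and the `release_targets` lookup table are all hypothetical, and releases pointing at anything other than a revision are simply not handled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Branch:
    target_type: str  # "revision", "release" or "alias" (hypothetical encoding)
    target: str       # object id, or another branch name for aliases


def snapshot_heads(
    branches: Dict[str, Optional[Branch]], release_targets: Dict[str, str]
) -> Set[str]:
    """Return the revision ids reachable from a snapshot's branches."""
    heads: Set[str] = set()
    for name, branch in branches.items():
        seen = {name}
        # Follow aliases (e.g. HEAD -> refs/heads/main) to a concrete branch.
        while branch is not None and branch.target_type == "alias":
            if branch.target in seen:
                branch = None  # alias cycle: give up on this branch
            else:
                seen.add(branch.target)
                branch = branches.get(branch.target)
        if branch is None:
            continue  # dangling branch or unresolvable alias
        if branch.target_type == "revision":
            heads.add(branch.target)
        elif branch.target_type == "release" and branch.target in release_targets:
            # Assumes the release points directly at a revision.
            heads.add(release_targets[branch.target])
    return heads


branches = {
    "HEAD": Branch("alias", "refs/heads/main"),
    "refs/heads/main": Branch("revision", "rev1"),
    "refs/tags/v1.0": Branch("release", "rel1"),
    "refs/heads/gone": None,
}
print(sorted(snapshot_heads(branches, {"rel1": "rev0"})))  # ['rev0', 'rev1']
```

A test dataset with release-targeted branches would exercise the release lookup path, which snapshots whose branches all target revisions never reach.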

Build is green

Patch application report for D5894 (id=21160)

Could not rebase; Attempt merge onto 8ff1ab5860...

Updating 8ff1ab5..0116452
Fast-forward
 mypy.ini                                           |   3 +
 requirements-test.txt                              |   1 +
 swh/provenance/tests/conftest.py                   |  38 ++----
 swh/provenance/tests/data/README.md                |  28 ++++
 swh/provenance/tests/data/cmdbts2.msgpack          | Bin 17734 -> 14470 bytes
 swh/provenance/tests/data/generate_repo.py         |  16 ++-
 .../tests/data/generate_storage_from_git.py        | 145 ++++++++++-----------
 swh/provenance/tests/data/out-of-order.msgpack     | Bin 6653 -> 6325 bytes
 .../tests/data/repo_with_merges-visits-01.yaml     |  34 +++++
 swh/provenance/tests/data/repo_with_merges.yaml    |  73 +++++++++++
 10 files changed, 240 insertions(+), 98 deletions(-)
 create mode 100644 swh/provenance/tests/data/repo_with_merges-visits-01.yaml
 create mode 100644 swh/provenance/tests/data/repo_with_merges.yaml
Changes applied before test
commit 011645221cf69139cd4b26fd36a6a572d9472270
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 20:44:54 2021 +0200

    Allow to add extra origins and snapshots in generated test storages
    
    Use an extra (yaml) file to describe the additional visits (and
    snapshots) to insert in the storage before dumping it in the masgpack
    file.

commit 6734fd36b87233aa7520c90e7b6ad999d75103cf
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 14:52:54 2021 +0200

    Add support for (topological) branches and merges in generate_repo.py
    
    include an example input file (currently not used in actual tests).

commit 7886bf494ab8dd0405ad391a0ca57eed1ee7149c
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 19:53:06 2021 +0200

    Refactor the generate_storage_from_git dataset creation tool
    
    use the newly introduced swh.journal StreamJournalWriter to generate the
    msgpack-file with the content of the storage serialized, instead of
    reimplementing a similar logic by hand.
    
    Doing so, the msgpack file format is a little bit different, so current
    datasets need to be regenerated.
    
    From the 'consumer' side (the fill_storage fixture), simply use
    swh.storage's process_replay_objects (once again, instead of reinventing
    the wheel).
    
    This is needed because we will need to handle much more model object types
    for testing the origin-revision layer.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/173/ for more details.