
Refactor the generate_storage_from_git dataset creation tool
Closed, Public

Authored by douardda on Jun 17 2021, 8:16 PM.

Details

Summary

Use the newly introduced swh.journal StreamJournalWriter to generate the
msgpack file containing the serialized storage content, instead of
reimplementing similar logic by hand.

As a result, the msgpack file format changes slightly, so existing
datasets need to be regenerated.

On the 'consumer' side (the fill_storage fixture), simply use
swh.storage's process_replay_objects (once again, instead of reinventing
the wheel).

This is needed because we will have to handle many more model object types
for testing the origin-revision layer.
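
For readers unfamiliar with the pattern, here is a minimal sketch of the
producer/consumer round trip described above. It deliberately does not use
the swh.journal or swh.storage APIs: JSON lines stand in for msgpack, a plain
dict stands in for the storage, and every function name below is hypothetical.

```python
import io
import json
from collections import defaultdict


def write_stream(objects, stream):
    """Producer side: write one (object_type, object) record per line.

    Stand-in for a stream journal writer; JSON replaces msgpack here.
    """
    for object_type, obj in objects:
        stream.write(json.dumps([object_type, obj]) + "\n")


def read_stream(stream):
    """Read the records back and group them by object type, keeping order."""
    grouped = defaultdict(list)
    for line in stream:
        object_type, obj = json.loads(line)
        grouped[object_type].append(obj)
    return dict(grouped)


def replay(grouped, storage):
    """Consumer side: add each group of objects to the (dict) storage."""
    for object_type, objs in grouped.items():
        storage.setdefault(object_type, []).extend(objs)


def roundtrip(objects):
    """Serialize objects to an in-memory stream, then replay them."""
    buf = io.StringIO()
    write_stream(objects, buf)
    buf.seek(0)
    storage = {}
    replay(read_stream(buf), storage)
    return storage
```

Because the reader only relies on the record's object type, supporting a new
model object type in such a scheme requires no change on the consumer side.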

Diff Detail

Repository
rDPROV Provenance database
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D5891 (id=21101)

Rebasing onto c9d1369ba1...

Current branch diff-target is up to date.
Changes applied before test
commit e1a003a28f2ce1319f65af1bc54dac6003cc4910
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 19:53:06 2021 +0200

    Refactor the generate_storage_from_git dataset creation tool
    
    use the newly introduced swh.journal StreamJournalWriter to generate the
    msgpack-file with the content of the storage serialized, instead of
    reimplementing a similar logic by hand.
    
    Doing so, the msgpack file format is a little bit different, so current
    datasets need to be regenerated.
    
    From the 'consumer' side (the fill_storage fixture), simply use
    swh.storage's process_replay_objects (once again, instead of reinventing
    the wheel).
    
    This is needed because we will need to handle much more model object types
    for testing the origin-revision layer.

Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/155/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/155/console

Harbormaster returned this revision to the author for changes because remote builds failed. Jun 17 2021, 8:16 PM
Harbormaster failed remote builds in B22089: Diff 21101!

Note that this diff actually depends on D5890, because generate_storage_from_git.py makes use of the stream backend for the journal writer. That script is only used to generate datasets, though, so the tests could be green nonetheless if I remove the dependency on the (unpublished) swh.journal 0.8 ...

Build is green

Patch application report for D5891 (id=21101)

Rebasing onto 8ff1ab5860...

First, rewinding head to replay your work on top of it...
Applying: Refactor the generate_storage_from_git dataset creation tool
Changes applied before test
commit 9d01b1a3a24306d28c441f3465782e363bb01a2a
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 17 19:53:06 2021 +0200

    Refactor the generate_storage_from_git dataset creation tool
    
    use the newly introduced swh.journal StreamJournalWriter to generate the
    msgpack-file with the content of the storage serialized, instead of
    reimplementing a similar logic by hand.
    
    Doing so, the msgpack file format is a little bit different, so current
    datasets need to be regenerated.
    
    From the 'consumer' side (the fill_storage fixture), simply use
    swh.storage's process_replay_objects (once again, instead of reinventing
    the wheel).
    
    This is needed because we will need to handle much more model object types
    for testing the origin-revision layer.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/158/ for more details.

This revision is now accepted and ready to land. Jun 21 2021, 3:37 PM
This revision was landed with ongoing or failed builds. Jun 21 2021, 4:48 PM
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D5891 (id=21162)

Rebasing onto 011645221c...

First, rewinding head to replay your work on top of it...
Fast-forwarded diff-target to base-revision-175-D5891.
Changes applied before test

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/175/ for more details.