
Add a 'swh provenance replay' cli command
ClosedPublic

Authored by douardda on Oct 13 2022, 4:35 PM.

Diff Detail

Repository
rDPROV Provenance database
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build was aborted

Patch application report for D8678 (id=31340)

Rebasing onto 0850a3943d...

Current branch diff-target is up to date.
Changes applied before test
commit 022b6f76614e2eb642156e1116adf37d3b07dff8
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Oct 13 16:22:47 2022 +0200

    Add a 'swh provenance replay' cli command

Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/695/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/695/console

Harbormaster returned this revision to the author for changes because remote builds failed. Oct 13 2022, 4:53 PM
Harbormaster failed remote builds in B32288: Diff 31340!

Build is green

Patch application report for D8678 (id=31340)

Rebasing onto 0850a3943d...

Current branch diff-target is up to date.
Changes applied before test
commit 022b6f76614e2eb642156e1116adf37d3b07dff8
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Oct 13 16:22:47 2022 +0200

    Add a 'swh provenance replay' cli command

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/696/ for more details.

vlorentz added a subscriber: vlorentz.

Bikeshedding: it should be called "journal-client" rather than "replay" for consistency with swh-indexer and swh-search. (swh-storage only calls it "replay" because it's used to copy from another instance of the same code so it "replays" the same API calls; but here it may be the first "play")

This revision is now accepted and ready to land. Oct 17 2022, 1:13 PM

Bikeshedding: it should be called "journal-client" rather than "replay" for consistency with swh-indexer and swh-search. (swh-storage only calls it "replay" because it's used to copy from another instance of the same code so it "replays" the same API calls; but here it may be the first "play")

I'm not sure I follow you there; this really is a replayer feature: it aims at replicating a provenance DB via a Kafka journal.
We already have a journal client in provenance, consuming the main archive's revision and origin-visit-status topics. The corresponding CLI commands are 'swh provenance revision from-journal' and 'swh provenance origin from-journal' (i.e. execute the {origin,revision} layer reading from the journal; there are from-csv versions of these commands as well).

To be more precise, this is the CLI for D8668, which is the "dual" of D8657.
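
For readers unfamiliar with the distinction, the "replayer" idea discussed above (rebuilding a copy of a store by re-applying the writes recorded in a journal) can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in: the journal is a plain Python list rather than Kafka topics, and InMemoryProvenance, replay and the message fields are invented for illustration, not taken from swh-provenance.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical message shape: (object_type, payload). Not the swh journal format.
Message = Tuple[str, dict]


class InMemoryProvenance:
    """Toy destination store, keyed by object type and then by object id."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, dict]] = {}

    def add(self, object_type: str, payload: dict) -> None:
        # Upsert by id, so applying the same message twice is harmless.
        self.tables.setdefault(object_type, {})[payload["id"]] = payload


def replay(journal: Iterable[Message], dst: InMemoryProvenance) -> int:
    """Apply every journal message to dst; return the number of messages seen."""
    count = 0
    for object_type, payload in journal:
        dst.add(object_type, payload)
        count += 1
    return count


# A list stands in for the journal; the last entry simulates a redelivery.
journal = [
    ("content", {"id": "c1", "date": "2022-10-13"}),
    ("revision", {"id": "r1", "origin": "o1"}),
    ("content", {"id": "c1", "date": "2022-10-13"}),
]
replica = InMemoryProvenance()
processed = replay(journal, replica)
```

The upsert in add() is a design choice of this sketch: journal consumers are commonly written so that a redelivered message leaves the replica unchanged, which is why the duplicate entry does not create a second row.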