Oct 18 2022
rebase
nope, does not work (as expected, actually)
Oct 17 2022
In D8678#226283, @douardda wrote:
In D8678#226223, @vlorentz wrote:
Bikeshedding: it should be called "journal-client" rather than "replay" for consistency with swh-indexer and swh-search. (swh-storage only calls it "replay" because it's used to copy from another instance of the same code so it "replays" the same API calls; but here it may be the first "play")
I'm not sure I follow you there; this really is a replayer feature: it aims at replicating a provenance DB via a Kafka journal.
We already have a journal client in provenance, consuming the main archive's revision and origin-visit-status topics. The CLI commands are swh provenance revision from-journal and swh provenance origin from-journal (i.e. execute the {origin,revision} layer reading from the journal; there are from-csv versions of these commands as well).
In D8678#226223, @vlorentz wrote:
Bikeshedding: it should be called "journal-client" rather than "replay" for consistency with swh-indexer and swh-search. (swh-storage only calls it "replay" because it's used to copy from another instance of the same code so it "replays" the same API calls; but here it may be the first "play")
Oct 14 2022
Oct 13 2022
Also I don't understand what "this is an import from replication package" means in this context...
not sure I understand what's going on here.
lgtm (but with a couple of comments)
split in 2 (one for each rev)
move the 'not null' removal in the proper revision
Oct 12 2022
In D8658#225748, @vlorentz wrote:
This seems to work:
diff --git a/swh/provenance/storage/replay.py b/swh/provenance/storage/replay.py
index d42d134..63e44cb 100644
--- a/swh/provenance/storage/replay.py
+++ b/swh/provenance/storage/replay.py
@@ -4,7 +4,7 @@
 # See top-level LICENSE file for more information
 
 import logging
-from typing import Any, Callable, Dict, List, Optional
+from typing import Any, Callable, Dict, List, Optional, Tuple
 
 try:
     from systemd.daemon import notify
@@ -29,19 +29,19 @@
 
 
 def cvrt_directory(msg_d):
-    return {msg_d["id"]: DirectoryData(**msg_d["value"])}
+    return (msg_d["id"], DirectoryData(**msg_d["value"]))
 
 
 def cvrt_revision(msg_d):
-    return {msg_d["id"]: RevisionData(**msg_d["value"])}
+    return (msg_d["id"], RevisionData(**msg_d["value"]))
 
 
 def cvrt_default(msg_d):
-    return {msg_d["id"]: msg_d["value"]}
+    return (msg_d["id"], msg_d["value"])
 
 
 def cvrt_relation(msg_d):
-    return {msg_d["id"]: {RelationData(**v) for v in msg_d["value"]}}
+    return (msg_d["id"], {RelationData(**v) for v in msg_d["value"]})
 
 
 OBJECT_CONVERTERS: Dict[str, Callable[[Dict], Dict]] = {
@@ -75,7 +75,9 @@ def report_failure(self, msg: bytes, obj: Dict):
 
 
 def process_replay_objects(
-    all_objects: Dict[str, List[Dict]], *, storage: ProvenanceStorageInterface
+    all_objects: Dict[str, List[Tuple[bytes, Any]]],
+    *,
+    storage: ProvenanceStorageInterface,
 ) -> None:
     for object_type, objects in all_objects.items():
         logger.debug("Inserting %s %s objects", len(objects), object_type)
@@ -89,14 +91,16 @@ def process_replay_objects(
 
 
 def _insert_objects(
-    object_type: str, objects: List[Any], storage: ProvenanceStorageInterface
+    object_type: str,
+    objects: List[Tuple[bytes, Any]],
+    storage: ProvenanceStorageInterface,
 ) -> None:
     """Insert objects of type object_type in the storage."""
     if object_type not in OBJECT_CONVERTERS:
         logger.warning("Received a series of %s, this should not happen", object_type)
         return
 
-    data = dict(next(iter(obj.items())) for obj in objects)
+    data = dict(objects)
     if "_in_" in object_type:
         storage.relation_add(relation=RelationType(object_type), data=data)
     else:
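For readers skimming the diff, here is a minimal standalone sketch of the idea behind it: converters that return (id, value) tuples can be fed straight to dict(), instead of unpacking one-entry dicts. The function names and sample messages below are hypothetical stand-ins and do not come from swh-provenance; only the two dict-building expressions mirror the diff.

```python
from typing import Any, Dict, List, Tuple


def cvrt_as_dict(msg_d: Dict[str, Any]) -> Dict[bytes, Any]:
    # Old shape (hypothetical stand-in): a one-entry dict per message.
    return {msg_d["id"]: msg_d["value"]}


def cvrt_as_tuple(msg_d: Dict[str, Any]) -> Tuple[bytes, Any]:
    # New shape (hypothetical stand-in): a plain (id, value) pair.
    return (msg_d["id"], msg_d["value"])


def merge_dicts(objects: List[Dict[bytes, Any]]) -> Dict[bytes, Any]:
    # Same expression as the removed line: pull the single item out of each dict.
    return dict(next(iter(obj.items())) for obj in objects)


def merge_tuples(objects: List[Tuple[bytes, Any]]) -> Dict[bytes, Any]:
    # Same expression as the added line: dict() accepts an iterable of pairs.
    return dict(objects)


# Made-up sample messages, for illustration only.
messages = [
    {"id": b"\x01", "value": "first"},
    {"id": b"\x02", "value": "second"},
]

old = merge_dicts([cvrt_as_dict(m) for m in messages])
new = merge_tuples([cvrt_as_tuple(m) for m in messages])
```

Both paths build the same mapping; the tuple form just skips the intermediate one-entry dicts and makes the List[Tuple[bytes, Any]] annotation straightforward.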
fix tabs in 40-funcs.sql file