# Problem
These two packages are, for most of their content, intertwined and very tightly coupled. Since swh-journal depends on swh-storage, a fix or modification in swh-storage very commonly requires an update in swh-journal. But swh-storage itself depends on swh-journal (for the majority of its tests), so any modification of the journal may require a fix in storage. This interdependency makes their evolution very difficult to manage, and shows that the boundary between the two projects is not in the right place.
So it would make sense to (re)integrate (at least part of) swh-journal into swh-storage and break this circular dependency.
# Current situation
## swh-journal
Currently, swh-journal is a rather small package (~3000 Python SLOC) and consists of several parts:
**1/ writer**
The **writer** (producer) part lives in swh/journal/writer. This component is used exclusively from the storage, to serialize any modification recorded in the storage as a message in the journal. Two implementations of this writer component are provided: a kafka-based one (the "true" journal, used in production) and an in-memory version, for testing purposes. In fact, the JournalWriter API is very simple and consists of a single method (plus a batch variant of this method):
- `write_addition(object_type, object)` where object is (now) expected to be a model entity.
- `write_additions([...])` for a list of objects.
In both cases, the model object serialization that produces the messages sent to the journal is specific to the kafka backend (the in-memory backend uses the same serialization functions as the kafka one).
So:
- the journal writer part is very basic and does not depend on the storage,
- the object serialization part is specific to the journal backend used (kafka) and does not depend on the storage, but only on the model.
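To illustrate how small this API surface is, here is a minimal sketch of an in-memory writer. It is not the actual swh-journal implementation: the class name `InMemoryJournalWriter`, the `serialize` parameter, the `objects` attribute and the batch method signature are all assumptions made for the example.

```python
from typing import Any, Callable, Iterable, List, Tuple


class InMemoryJournalWriter:
    """Hypothetical in-memory writer, for illustration only.

    It records (object_type, message) pairs in a list instead of
    sending them to kafka.
    """

    def __init__(self, serialize: Callable[[Any], Any] = lambda obj: obj):
        # `serialize` stands in for the model-to-message serialization step
        # (assumption: the real one is shared with the kafka backend).
        self.serialize = serialize
        self.objects: List[Tuple[str, Any]] = []

    def write_addition(self, object_type: str, object_: Any) -> None:
        # Record a single object under its object type.
        self.objects.append((object_type, self.serialize(object_)))

    def write_additions(self, object_type: str, objects: Iterable[Any]) -> None:
        # Batch variant: delegates to write_addition for each object.
        for object_ in objects:
            self.write_addition(object_type, object_)


writer = InMemoryJournalWriter()
writer.write_addition("content", {"sha1": "aa"})
writer.write_additions("revision", [{"id": 1}, {"id": 2}])
```

Nothing in this sketch imports or calls the storage, which is the point made above: the writer only needs the objects it is handed and a way to serialize them.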
**2/ backfiller**
The backfiller is a component very specific to the database-based storage, aimed at filling the journal from scratch from an existing (database) storage.
It does not belong in the swh-journal package and should be moved to swh-storage.
**3/ client**
The journal client part consists of a class that consumes messages from kafka. There is no 'in-memory' implementation available for this component.
There is currently a limited list of accepted object types (matching what the storage can emit), but this constraint should be moved out of this module. The client neither depends on nor needs the storage, nor even the model: handling incoming messages is the responsibility of the JournalClient user, via a callback.
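A storage-agnostic client could look roughly like the sketch below. This is an assumption-laden illustration, not the real JournalClient: the class name `SketchJournalClient`, its `process` method, and the in-memory `source` iterable (standing in for a kafka consumer) are all hypothetical. The list of accepted object types is passed in by the caller rather than hard-coded in the module.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Optional, Set, Tuple

# (object_type, deserialized payload); a stand-in for a kafka message.
Message = Tuple[str, Any]


class SketchJournalClient:
    """Hypothetical client that knows nothing about the storage or the model."""

    def __init__(
        self,
        source: Iterable[Message],
        object_types: Optional[Set[str]] = None,
    ):
        self.source = source
        # The caller decides which object types it accepts; None means all.
        self.object_types = object_types

    def process(self, callback: Callable[[Dict[str, List[Any]]], None]) -> int:
        """Group accepted messages by object type and hand them to `callback`.

        Returns the number of messages passed to the callback.
        """
        batch: Dict[str, List[Any]] = defaultdict(list)
        count = 0
        for object_type, payload in self.source:
            if self.object_types is not None and object_type not in self.object_types:
                continue  # not an object type this user asked for
            batch[object_type].append(payload)
            count += 1
        if batch:
            callback(dict(batch))
        return count


received: List[Dict[str, List[Any]]] = []
client = SketchJournalClient(
    [("content", 1), ("revision", 2), ("content", 3)],
    object_types={"content"},
)
handled = client.process(received.append)
```

With this shape, a replayer living in swh-storage would simply pass a callback that inserts the received objects into a storage, and the client itself would stay free of any storage import.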
**4/ replayer**
This component uses the JournalClient to insert model objects in a storage. In fact, there are two replayers in this module: the graph-replayer, responsible for filling a storage from a kafka journal, and the content-replayer, responsible for filling an objstorage from a kafka journal and a source objstorage.
The graph-replayer obviously depends on the storage module, and should also be moved there. It's a storage-specific piece of code, not a journal-specific one.
The content-replayer should be moved to swh-objstorage for the same reasons.
## swh-storage
The storage depends on the JournalWriter both in the code of the storage itself and in its tests (which use the in-memory journal writer).
So swh-storage depends on the JournalWriter and nothing else.
# Proposal
- move the backfiller to swh-storage
- move the graph-replayer to swh-storage
- move the content-replayer to swh-objstorage
- slightly modify the JournalClient to make it completely storage-agnostic.
Doing so, we should have:
- swh-journal depends on swh-model
- swh-storage depends on swh-model, swh-journal
- swh-objstorage depends on swh-journal
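As a quick sanity check of the resulting dependency graph, the sketch below uses Python's standard `graphlib` module (Python 3.9+) to confirm that the current layout contains a cycle while the proposed one can be ordered. The helper name `install_order` is hypothetical, and the two dictionaries only encode the dependencies stated in this document.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set


def install_order(deps: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return a valid install order for `deps`, or None if it has a cycle.

    `deps` maps each package to the set of packages it depends on.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


# Current situation, as described above: each package depends on the other.
current = {
    "swh-journal": {"swh-storage"},
    "swh-storage": {"swh-journal"},
}

# Proposed situation, taken from the list above.
proposed = {
    "swh-journal": {"swh-model"},
    "swh-storage": {"swh-model", "swh-journal"},
    "swh-objstorage": {"swh-journal"},
}

current_order = install_order(current)    # None: circular dependency
proposed_order = install_order(proposed)  # a list starting with swh-model
```
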
Some attention may be needed to ensure continuity of cli tools.