Page MenuHomeSoftware Heritage

add a content replayer service
ClosedPublic

Authored by douardda on Wed, Jun 5, 12:19 PM.

Details

Summary
  • service that listen to the 'swh.journal.objects.content' topic and replicate object blobs from a source object storage to a destination one.
  • refactor replayer tests; implement the whole kafka mocking logic in the Mocked* classes and delete the test_write_replay_same_order which was not testing anything more than what the orther test do.
  • client: simplify a bit the JournalClient class no need for those poll() and commit() proxy methods.

Depends on D1540

Diff Detail

Repository
rDJNL Journal infrastructure
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

douardda created this revision.Wed, Jun 5, 12:19 PM
vlorentz requested changes to this revision.Wed, Jun 5, 12:27 PM
vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/journal/cli.py
167–179

I think these error messages should tell the exact name of the missing key, so it's easier to look for them in an example config file.

(And we should also have an example config file)

181–185

We should raise an error if there is no prefix or group id.

swh/journal/client.py
81–86 ↗(On Diff #5076)

These methods are mocked in the tests.

swh/journal/tests/test_write_replay.py
32

A writer is not a consumer. You would want to use MockConsumer instead.

44–49

not needed in a writer.

You should put that in a new class, eg. MockConsumer.

70

same

This revision now requires changes to proceed.Wed, Jun 5, 12:27 PM
douardda added inline comments.Wed, Jun 5, 2:38 PM
swh/journal/cli.py
181–185

not for the prefix, since the default value is fine (and is the one that will be used in most of the cases... TBH, I don't really see the reason this is configurable).

group_id is indeed a mandatory config option and should not have a default value...

swh/journal/client.py
81–86 ↗(On Diff #5076)

not any more :-)

swh/journal/tests/test_write_replay.py
32

as said before, this mockup is not a nice mockup of a kafka consumer nor a kafka producer, but some sort of frankenstein of both. It does the job, but is not so pretty.

44–49

this mock is both a writer and a consumer in this test. I mostly just merged several sparse functions in this class. I was not very pleased with the way these tests were written before, so I did this, but the result does not please me too much neither...

Also, your new CLI needs tests :)

swh/journal/cli.py
181–185

if prefix is None and there is no prefix in the config, then prefix will be set to None, not to DEFAULT_PREFIX. And we don't want that.

swh/journal/client.py
81–86 ↗(On Diff #5076)

I forgot to remove that comment after actually reading the diff, sorry

swh/journal/tests/test_write_replay.py
44–49

this mock is both a writer and a consumer in this test

these are two different things, so they should be different classes.

but the result does not please me too much neither...

Did you try using unittest.mock.patch?

douardda added inline comments.Wed, Jun 5, 2:59 PM
swh/journal/cli.py
181–185

indeed we do not want that. I did write the correct version of this stuff at some point but I guess I messed during a rebase or smthg

swh/journal/tests/test_write_replay.py
44–49

please note I "just" refactored existing tests...

douardda updated this revision to Diff 5101.Thu, Jun 6, 10:13 AM

should address vlortenz's comments

douardda updated this revision to Diff 5102.Thu, Jun 6, 10:15 AM

fix commit msg

vlorentz added inline comments.Thu, Jun 6, 10:40 AM
swh/journal/replay.py
47–50

inconsistent with the behavior of _insert_objects (which asserts False).

I have a preference to using the existing behavior (crashing) for both instead of just logging, but that's up to you

vlorentz added inline comments.Thu, Jun 6, 10:46 AM
swh/journal/replay.py
47–50

Thinking about it, it's probably better to log.

Could you raise the log level to warning (so it doesn't go unnoticed), and make _insert_objects do that too?

douardda added inline comments.Thu, Jun 6, 10:49 AM
swh/journal/replay.py
47–50

I think I also prefer this second solution, i will

Also, your new CLI needs tests :)

This one is actually trickier to write...

douardda updated this revision to Diff 5131.Fri, Jun 7, 1:59 PM

replay: log warnings in case of unexpected object types

as suggested by vlorentz.

douardda updated this revision to Diff 5132.Fri, Jun 7, 2:02 PM

add a TODO in test_cli about content-replay

add a TODO in test_cli about content-replay

You're cheating!

douardda updated this revision to Diff 5137.Fri, Jun 7, 2:31 PM
  • Add a content replayer service
  • cli: extract the JournalClient instanciation in a dedicated function
  • replay: make unexpectedly received object_type log a warning
vlorentz accepted this revision.Fri, Jun 7, 3:10 PM
This revision is now accepted and ready to land.Fri, Jun 7, 3:10 PM
douardda updated this revision to Diff 5144.Fri, Jun 7, 4:01 PM

small fix

douardda updated this revision to Diff 5145.Fri, Jun 7, 4:05 PM

fix the fix

This revision was landed with ongoing or failed builds.Fri, Jun 7, 4:07 PM
This revision was automatically updated to reflect the committed changes.