Page MenuHomeSoftware Heritage

Make the type of values of JournalWriter generic, so it works with types not from swh-model.
ClosedPublic

Authored by vlorentz on Sep 29 2020, 3:58 PM.

Diff Detail

Repository
rDJNL Journal infrastructure
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D4082 (id=14401)

Could not rebase; Attempt merge onto 7815cad28a...

Updating 7815cad..aad6a07
Fast-forward
 swh/journal/pytest_plugin.py           |   4 +-
 swh/journal/serializers.py             | 109 +--------------------------------
 swh/journal/tests/journal_data.py      |   4 +-
 swh/journal/tests/test_kafka_writer.py |  17 +++--
 swh/journal/tests/test_serializers.py  |  11 +---
 swh/journal/writer/__init__.py         |  29 +++++++++
 swh/journal/writer/inmemory.py         |  20 +++---
 swh/journal/writer/kafka.py            |  52 ++++++++--------
 8 files changed, 89 insertions(+), 157 deletions(-)
Changes applied before test
commit aad6a0740f57896075137f0d758b500cc5d91f18
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:58:07 2020 +0200

    Make the type of values of JournalWriter generic, so it works with types not from swh-model.
    
    The indexer-storage will use this.

commit 549624eff1ee246f0f8552491f730123520376e8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:06:42 2020 +0200

    Make value_sanitizer an argument of JournalWriter.
    
    Instead of being a method of KafkaJournalWriter.
    
    This will allow using JournalWriters for values that aren't model objects.

commit 1a64c96f97d8b6a56a873a269af88859083359ea
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 14:15:35 2020 +0200

    Remove swh.journal.serializers.object_key, use BaseModel.unique_key instead.

Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/114/
See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/114/console

vlorentz retitled this revision from [WIP] Make the type of values of JournalWriter generic, so it works with types not from swh-model. to Make the type of values of JournalWriter generic, so it works with types not from swh-model..Oct 19 2020, 2:02 PM

Build is green

Patch application report for D4082 (id=15206)

Could not rebase; Attempt merge onto abc42e2bf7...

Updating abc42e2..671c5c5
Fast-forward
 swh/journal/serializers.py             | 31 +--------------------
 swh/journal/tests/journal_data.py      |  4 +--
 swh/journal/tests/test_kafka_writer.py | 17 ++++++++----
 swh/journal/writer/__init__.py         | 29 ++++++++++++++++++++
 swh/journal/writer/inmemory.py         | 20 ++++++++------
 swh/journal/writer/kafka.py            | 49 +++++++++++++++++++---------------
 6 files changed, 83 insertions(+), 67 deletions(-)
Changes applied before test
commit 671c5c5158d8094c2861bfcba5096f7ea9d2ab30
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:58:07 2020 +0200

    Make the type of values of JournalWriter generic, so it works with types not from swh-model.
    
    The indexer-storage will use this.

commit efd94b05bebbfc082227719ac94b2223723f95ed
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:06:42 2020 +0200

    Make value_sanitizer an argument of JournalWriter.
    
    Instead of being a method of KafkaJournalWriter.
    
    This will allow using JournalWriters for values that aren't model objects.

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/120/ for more details.

Wouldn't it make a bit easier to name the generic version of the journal writer something like GenericKafkaJournalWriter and have KafkaJournalWriter = GenericKafkaJournalWriter[BaseModel] ? (for bw compat)

swh/journal/writer/kafka.py
67–68

The docstring should be updated to document the generic aspect of it.

Wouldn't it make a bit easier to name the generic version of the journal writer something like GenericKafkaJournalWriter and have KafkaJournalWriter = GenericKafkaJournalWriter[BaseModel] ? (for bw compat)

Why? This change won't break any code using KafkaJournalWriter

ardumont added a subscriber: ardumont.

Oh i missed that diff...

That's nice ;)

This revision is now accepted and ready to land.Oct 22 2020, 2:29 PM

Can you document the generic type argument like @douardda asked?

Wouldn't it make a bit easier to name the generic version of the journal writer something like GenericKafkaJournalWriter and have KafkaJournalWriter = GenericKafkaJournalWriter[BaseModel] ? (for bw compat)

Why? This change won't break any code using KafkaJournalWriter

Looks like the only user of the KafkaJournalWriter class outside the journal is swh.objstorage.replayer, which would be impacted by this diff (unless I'm mistaken) but this is indeed a minor issue.

Looks like the only user of the KafkaJournalWriter class outside the journal is swh.objstorage.replayer, which would be impacted by this diff (unless I'm mistaken) but this is indeed a minor issue.

There is no impact. This diff only extends how KafkaJournalWriter can be used, by accepting more types. All existing uses remain valid.

swh-objstorage-replayer only uses it in a test to do that:

writer = KafkaJournalWriter(
    brokers=[kafka_server],
    client_id="kafka_writer",
    prefix=kafka_prefix,
    anonymize=False,
)

for content in CONTENTS:
    objstorage1.add(content.data)
    writer.write_addition("content", content)

Oh, I guess you meant replacing it to use writer = KafkaJournalWriter[BaseModel](, right?

That's optional, to give an extra type hint to mypy. Without [BaseModel], it just means mypy has to guess it. (It's like using List instead of List[foo])

update docstring of KafkaJournalWriter.

Build is green

Patch application report for D4082 (id=15547)

Could not rebase; Attempt merge onto e998407120...

Auto-merging swh/journal/tests/journal_data.py
Merge made by the 'recursive' strategy.
 swh/journal/serializers.py             | 31 +---------------
 swh/journal/tests/journal_data.py      |  4 +--
 swh/journal/tests/test_kafka_writer.py | 17 ++++++---
 swh/journal/writer/__init__.py         | 29 +++++++++++++++
 swh/journal/writer/inmemory.py         | 20 ++++++-----
 swh/journal/writer/kafka.py            | 65 ++++++++++++++++++++--------------
 6 files changed, 94 insertions(+), 72 deletions(-)
Changes applied before test
commit 380e19c21d574c5d2d95fe159c2e15769b688b6f
Merge: e998407 f3435ab
Author: Jenkins user <jenkins@localhost>
Date:   Mon Nov 2 08:29:33 2020 +0000

    Merge branch 'diff-target' into HEAD

commit f3435ab3be9f64b80848503ae4342e84b94d5a6f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:58:07 2020 +0200

    Make the type of values of JournalWriter generic, so it works with types not from swh-model.
    
    The indexer-storage will use this.

commit efd94b05bebbfc082227719ac94b2223723f95ed
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:06:42 2020 +0200

    Make value_sanitizer an argument of JournalWriter.
    
    Instead of being a method of KafkaJournalWriter.
    
    This will allow using JournalWriters for values that aren't model objects.

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/124/ for more details.

Build is green

Patch application report for D4082 (id=15550)

Could not rebase; Attempt merge onto e998407120...

Updating e998407..1a6c825
Fast-forward
 swh/journal/serializers.py             | 31 +---------------
 swh/journal/tests/journal_data.py      |  4 +--
 swh/journal/tests/test_kafka_writer.py | 17 ++++++---
 swh/journal/writer/__init__.py         | 29 +++++++++++++++
 swh/journal/writer/inmemory.py         | 20 ++++++-----
 swh/journal/writer/kafka.py            | 65 ++++++++++++++++++++--------------
 6 files changed, 94 insertions(+), 72 deletions(-)
Changes applied before test
commit 1a6c825884f383967811c613fc514507050da342
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:58:07 2020 +0200

    Make the type of values of JournalWriter generic, so it works with types not from swh-model.
    
    The indexer-storage will use this.

commit 3ae9a5bab45970217d4cd93785c47ffcd936ed57
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:06:42 2020 +0200

    Make value_sanitizer an argument of JournalWriter.
    
    Instead of being a method of KafkaJournalWriter.
    
    This will allow using JournalWriters for values that aren't model objects.

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/126/ for more details.