Page MenuHomeSoftware Heritage

Make value_sanitizer an argument of JournalWriter.
ClosedPublic

Authored by vlorentz on Sep 29 2020, 3:07 PM.

Details

Summary

Instead of being a method of KafkaJournalWriter.

This will allow using JournalWriters for values that aren't model objects.

Depends on D4079

Event Timeline

vlorentz updated this revision to Diff 14399.

fix commit message

Build has FAILED

Patch application report for D4080 (id=14398)

Could not rebase; Attempt merge onto 7815cad28a...

Updating 7815cad..dd04112
Fast-forward
 swh/journal/pytest_plugin.py           |  4 +-
 swh/journal/serializers.py             | 80 +---------------------------------
 swh/journal/tests/test_kafka_writer.py |  9 +++-
 swh/journal/tests/test_serializers.py  | 11 +----
 swh/journal/writer/__init__.py         | 12 +++++
 swh/journal/writer/inmemory.py         |  7 ++-
 swh/journal/writer/kafka.py            | 20 +++------
 7 files changed, 38 insertions(+), 105 deletions(-)
Changes applied before test
commit dd04112fe6c6149d1e3e0a4ff1764c0ae7c3902f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:06:42 2020 +0200

    Make object_sanitizer an argument of JournalWriter.
    
    Instead of being a method of KafkaJournalWriter.
    
    This will allow using JournalWriters for values that aren't model objects.

commit 1a64c96f97d8b6a56a873a269af88859083359ea
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 14:15:35 2020 +0200

    Remove swh.journal.serializers.object_key, use BaseModel.unique_key instead.

Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/112/
See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/112/console

vlorentz retitled this revision from [WIP] Make object_sanitizer an argument of JournalWriter. to [WIP] Make value_sanitizer an argument of JournalWriter..Sep 29 2020, 3:08 PM

Build has FAILED

Patch application report for D4080 (id=14399)

Could not rebase; Attempt merge onto 7815cad28a...

Updating 7815cad..549624e
Fast-forward
 swh/journal/pytest_plugin.py           |  4 +-
 swh/journal/serializers.py             | 80 +---------------------------------
 swh/journal/tests/test_kafka_writer.py |  9 +++-
 swh/journal/tests/test_serializers.py  | 11 +----
 swh/journal/writer/__init__.py         | 12 +++++
 swh/journal/writer/inmemory.py         |  7 ++-
 swh/journal/writer/kafka.py            | 20 +++------
 7 files changed, 38 insertions(+), 105 deletions(-)
Changes applied before test
commit 549624eff1ee246f0f8552491f730123520376e8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:06:42 2020 +0200

    Make value_sanitizer an argument of JournalWriter.
    
    Instead of being a method of KafkaJournalWriter.
    
    This will allow using JournalWriters for values that aren't model objects.

commit 1a64c96f97d8b6a56a873a269af88859083359ea
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 14:15:35 2020 +0200

    Remove swh.journal.serializers.object_key, use BaseModel.unique_key instead.

Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/113/
See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/113/console

Build is green

Patch application report for D4080 (id=14927)

Could not rebase; Attempt merge onto d3c09acb40...

Merge made by the 'recursive' strategy.
 requirements-swh.txt                   |  2 +-
 swh/journal/pytest_plugin.py           |  4 +-
 swh/journal/serializers.py             | 80 +---------------------------------
 swh/journal/tests/test_kafka_writer.py |  9 +++-
 swh/journal/tests/test_serializers.py  | 11 +----
 swh/journal/writer/__init__.py         | 12 +++++
 swh/journal/writer/inmemory.py         |  7 ++-
 swh/journal/writer/kafka.py            | 20 +++------
 8 files changed, 39 insertions(+), 106 deletions(-)
Changes applied before test
commit cfbd0d8c7aab38a29bc5d74c315da451b56291b2
Merge: d3c09ac 0807bda
Author: Jenkins user <jenkins@localhost>
Date:   Mon Oct 12 10:21:04 2020 +0000

    Merge branch 'diff-target' into HEAD

commit 0807bda42b4adc4c4f9ba33a417da034bb6142e2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:06:42 2020 +0200

    Make value_sanitizer an argument of JournalWriter.
    
    Instead of being a method of KafkaJournalWriter.
    
    This will allow using JournalWriters for values that aren't model objects.

commit 1f6621ea669f088a1ed0bb5400158038b86d50c5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 14:15:35 2020 +0200

    Remove swh.journal.serializers.object_key, use BaseModel.unique_key instead.

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/116/ for more details.

vlorentz retitled this revision from [WIP] Make value_sanitizer an argument of JournalWriter. to Make value_sanitizer an argument of JournalWriter..Oct 19 2020, 2:02 PM

Build is green

Patch application report for D4080 (id=15205)

Rebasing onto abc42e2bf7...

Current branch diff-target is up to date.
Changes applied before test
commit efd94b05bebbfc082227719ac94b2223723f95ed
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:06:42 2020 +0200

    Make value_sanitizer an argument of JournalWriter.
    
    Instead of being a method of KafkaJournalWriter.
    
    This will allow using JournalWriters for values that aren't model objects.

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/119/ for more details.

I'm not a fan of the sanitizer name (I know I probably introduced it :P). Something along the lines of object_filter may be more appropriate?

If you want to make this completely generic, the calls to to_dict should probably not be inlined to the _write_addition function.

This also feels like, if this becomes configurable, the default value should probably be an identity function rather than a function specific to swh.model.model objects. Of course the default should change once swh.storage explicitly calls the journal_writer initialization pointing to the new function.

In D4080#107053, @olasd wrote:

I'm not a fan of the sanitizer name (I know I probably introduced it :P). Something along the lines of object_filter may be more appropriate?

object_filter sounds like it would remove some objects (like Python's filter() function)

If you want to make this completely generic, the calls to to_dict should probably not be inlined to the _write_addition function.

See D4082, which introduces a ValueProtocol that includes to_dict and others.

This also feels like, if this becomes configurable, the default value should probably be an identity function rather than a function specific to swh.model.model objects.
Of course the default should change once swh.storage explicitly calls the journal_writer initialization pointing to the new function.

So after this diff is landed and released, yes.

This revision is now accepted and ready to land.Oct 26 2020, 11:06 AM

Build is green

Patch application report for D4080 (id=15549)

Rebasing onto e998407120...

Current branch diff-target is up to date.
Changes applied before test
commit 3ae9a5bab45970217d4cd93785c47ffcd936ed57
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:06:42 2020 +0200

    Make value_sanitizer an argument of JournalWriter.
    
    Instead of being a method of KafkaJournalWriter.
    
    This will allow using JournalWriters for values that aren't model objects.

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/125/ for more details.