Page MenuHomeSoftware Heritage

Make value_sanitizer an argument of JournalWriter.
ClosedPublic

Authored by vlorentz on Sep 29 2020, 3:07 PM.

Details

Summary

Instead of being a method of KafkaJournalWriter.

This will allow using JournalWriters for values that aren't model objects.

Depends on D4079

Diff Detail

Repository
rDJNL Journal infrastructure
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vlorentz updated this revision to Diff 14399.

fix commit message

Build has FAILED

Patch application report for D4080 (id=14398)

Could not rebase; Attempt merge onto 7815cad28a...

Updating 7815cad..dd04112
Fast-forward
 swh/journal/pytest_plugin.py           |  4 +-
 swh/journal/serializers.py             | 80 +---------------------------------
 swh/journal/tests/test_kafka_writer.py |  9 +++-
 swh/journal/tests/test_serializers.py  | 11 +----
 swh/journal/writer/__init__.py         | 12 +++++
 swh/journal/writer/inmemory.py         |  7 ++-
 swh/journal/writer/kafka.py            | 20 +++------
 7 files changed, 38 insertions(+), 105 deletions(-)
Changes applied before test
commit dd04112fe6c6149d1e3e0a4ff1764c0ae7c3902f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:06:42 2020 +0200

    Make object_sanitizer an argument of JournalWriter.
    
    Instead of being a method of KafkaJournalWriter.
    
    This will allow using JournalWriters for values that aren't model objects.

commit 1a64c96f97d8b6a56a873a269af88859083359ea
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 14:15:35 2020 +0200

    Remove swh.journal.serializers.object_key, use BaseModel.unique_key instead.

Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/112/
See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/112/console

vlorentz retitled this revision from [WIP] Make object_sanitizer an argument of JournalWriter. to [WIP] Make value_sanitizer an argument of JournalWriter..Sep 29 2020, 3:08 PM

Build has FAILED

Patch application report for D4080 (id=14399)

Could not rebase; Attempt merge onto 7815cad28a...

Updating 7815cad..549624e
Fast-forward
 swh/journal/pytest_plugin.py           |  4 +-
 swh/journal/serializers.py             | 80 +---------------------------------
 swh/journal/tests/test_kafka_writer.py |  9 +++-
 swh/journal/tests/test_serializers.py  | 11 +----
 swh/journal/writer/__init__.py         | 12 +++++
 swh/journal/writer/inmemory.py         |  7 ++-
 swh/journal/writer/kafka.py            | 20 +++------
 7 files changed, 38 insertions(+), 105 deletions(-)
Changes applied before test
commit 549624eff1ee246f0f8552491f730123520376e8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:06:42 2020 +0200

    Make value_sanitizer an argument of JournalWriter.
    
    Instead of being a method of KafkaJournalWriter.
    
    This will allow using JournalWriters for values that aren't model objects.

commit 1a64c96f97d8b6a56a873a269af88859083359ea
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 14:15:35 2020 +0200

    Remove swh.journal.serializers.object_key, use BaseModel.unique_key instead.

Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/113/
See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/113/console

Build is green

Patch application report for D4080 (id=14927)

Could not rebase; Attempt merge onto d3c09acb40...

Merge made by the 'recursive' strategy.
 requirements-swh.txt                   |  2 +-
 swh/journal/pytest_plugin.py           |  4 +-
 swh/journal/serializers.py             | 80 +---------------------------------
 swh/journal/tests/test_kafka_writer.py |  9 +++-
 swh/journal/tests/test_serializers.py  | 11 +----
 swh/journal/writer/__init__.py         | 12 +++++
 swh/journal/writer/inmemory.py         |  7 ++-
 swh/journal/writer/kafka.py            | 20 +++------
 8 files changed, 39 insertions(+), 106 deletions(-)
Changes applied before test
commit cfbd0d8c7aab38a29bc5d74c315da451b56291b2
Merge: d3c09ac 0807bda
Author: Jenkins user <jenkins@localhost>
Date:   Mon Oct 12 10:21:04 2020 +0000

    Merge branch 'diff-target' into HEAD

commit 0807bda42b4adc4c4f9ba33a417da034bb6142e2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:06:42 2020 +0200

    Make value_sanitizer an argument of JournalWriter.
    
    Instead of being a method of KafkaJournalWriter.
    
    This will allow using JournalWriters for values that aren't model objects.

commit 1f6621ea669f088a1ed0bb5400158038b86d50c5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 14:15:35 2020 +0200

    Remove swh.journal.serializers.object_key, use BaseModel.unique_key instead.

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/116/ for more details.

vlorentz retitled this revision from [WIP] Make value_sanitizer an argument of JournalWriter. to Make value_sanitizer an argument of JournalWriter..Oct 19 2020, 2:02 PM

Build is green

Patch application report for D4080 (id=15205)

Rebasing onto abc42e2bf7...

Current branch diff-target is up to date.
Changes applied before test
commit efd94b05bebbfc082227719ac94b2223723f95ed
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:06:42 2020 +0200

    Make value_sanitizer an argument of JournalWriter.
    
    Instead of being a method of KafkaJournalWriter.
    
    This will allow using JournalWriters for values that aren't model objects.

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/119/ for more details.

I'm not a fan of the sanitizer name (I know I probably introduced it :P). Something along the lines of object_filter may be more appropriate?

If you want to make this completely generic, the calls to to_dict should probably not be inlined to the _write_addition function.

This also feels like, if this becomes configurable, the default value should probably be an identity function rather than a function specific to swh.model.model objects. Of course the default should change once swh.storage explicitly calls the journal_writer initialization pointing to the new function.

In D4080#107053, @olasd wrote:

I'm not a fan of the sanitizer name (I know I probably introduced it :P). Something along the lines of object_filter may be more appropriate?

object_filter sounds like it would remove some objects (like Python's filter() function)

If you want to make this completely generic, the calls to to_dict should probably not be inlined to the _write_addition function.

See D4082, which introduces a ValueProtocol that includes to_dict and others.

This also feels like, if this becomes configurable, the default value should probably be an identity function rather than a function specific to swh.model.model objects.
Of course the default should change once swh.storage explicitly calls the journal_writer initialization pointing to the new function.

So after this diff is landed and released, yes.

This revision is now accepted and ready to land.Oct 26 2020, 11:06 AM

Build is green

Patch application report for D4080 (id=15549)

Rebasing onto e998407120...

Current branch diff-target is up to date.
Changes applied before test
commit 3ae9a5bab45970217d4cd93785c47ffcd936ed57
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 29 15:06:42 2020 +0200

    Make value_sanitizer an argument of JournalWriter.
    
    Instead of being a method of KafkaJournalWriter.
    
    This will allow using JournalWriters for values that aren't model objects.

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/125/ for more details.