
Pass the object_type to JournalClient.value_serializer()
ClosedPublic

Authored by douardda on Oct 27 2021, 4:19 PM.

Details

Summary

Also make the deserializer function (value_deserializer) an optional constructor argument.
If it is not given, fall back to kafka_to_value.

This is needed so that the JournalClient can use a special
value_deserializer implementation that needs the object_type,
for example to have the value_deserializer directly instantiate
BaseModel objects.
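
As a rough illustration of the idea, here is a minimal sketch of a deserializer that receives the object type and uses it to build a model object directly. It does not use the real swh.journal code: SketchClient, Origin, MODEL_CLASSES and model_deserializer are hypothetical names, the (object_type, raw bytes) call signature is an assumption, and the kafka_to_value stand-in decodes JSON purely to keep the example self-contained.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Optional


def kafka_to_value(raw: bytes) -> Any:
    """Stand-in for the default deserializer (JSON here, for illustration only)."""
    return json.loads(raw)


@dataclass(frozen=True)
class Origin:
    """Hypothetical model class standing in for a BaseModel subclass."""

    url: str


# Hypothetical mapping from journal object types to model classes.
MODEL_CLASSES = {"origin": Origin}


def model_deserializer(object_type: str, raw: bytes) -> Any:
    """Use the object type to instantiate the matching model class directly."""
    return MODEL_CLASSES[object_type](**kafka_to_value(raw))


class SketchClient:
    """Toy stand-in for JournalClient; only the deserializer plumbing is shown."""

    def __init__(
        self, value_deserializer: Optional[Callable[[str, bytes], Any]] = None
    ):
        # When no deserializer is given, fall back to kafka_to_value and
        # simply ignore the object type.
        self.value_deserializer = value_deserializer or (
            lambda _object_type, raw: kafka_to_value(raw)
        )

    def deserialize(self, object_type: str, raw: bytes) -> Any:
        # The object type is always passed along with the raw message value.
        return self.value_deserializer(object_type, raw)
```

With this sketch, SketchClient() yields plain dicts, while SketchClient(model_deserializer) yields Origin instances for messages of type "origin".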

This will be used by an upcoming refactoring of the storage replayer
that will make sure any BaseModel object coming from the journal is valid,
and log the invalid Kafka objects when they are not.
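
The validate-and-log behaviour described above could look roughly like the sketch below. It is not the replayer's actual code: Origin, MODEL_CLASSES, validating_deserializer and handle_messages are hypothetical names, and JSON stands in for the real wire format. The updated commit message further down this page states that a None return value is ignored rather than passed to worker_fn, which is what the loop here imitates.

```python
import json
import logging
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

logger = logging.getLogger(__name__)


class Origin:
    """Hypothetical model class that validates itself on construction."""

    def __init__(self, url: str):
        if not isinstance(url, str) or not url:
            raise ValueError("origin url must be a non-empty string")
        self.url = url


# Hypothetical mapping from journal object types to model classes.
MODEL_CLASSES = {"origin": Origin}


def validating_deserializer(object_type: str, raw: bytes) -> Optional[Any]:
    """Return a model object, or None (after logging) if the message is invalid."""
    try:
        return MODEL_CLASSES[object_type](**json.loads(raw))
    except (KeyError, TypeError, ValueError) as exc:
        # json.JSONDecodeError is a ValueError, so undecodable payloads land here too.
        logger.warning("Invalid %s object in journal: %r (%s)", object_type, raw, exc)
        return None


def handle_messages(
    messages: Iterable[Tuple[str, bytes]],
    deserializer: Callable[[str, bytes], Optional[Any]],
    worker_fn: Callable[[Dict[str, List[Any]]], None],
) -> None:
    """Group deserialized objects by type, skipping any that came back as None."""
    objects: Dict[str, List[Any]] = {}
    for object_type, raw in messages:
        value = deserializer(object_type, raw)
        if value is not None:
            objects.setdefault(object_type, []).append(value)
    worker_fn(objects)
```

Returning None instead of raising keeps one bad message from aborting the whole batch, at the cost of relying on the log to notice invalid objects.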

Related to T3693.

Depends on D6564.

Event Timeline

Build is green

Patch application report for D6565 (id=23858)

Could not rebase; Attempt merge onto 95d945e46e...

Updating 95d945e..de483d7
Fast-forward
 docs/journal-clients.rst | 10 +++++++---
 swh/journal/client.py    | 24 +++++++++++++++++-------
 2 files changed, 24 insertions(+), 10 deletions(-)
Changes applied before test
commit de483d7c907e8871ed45859cf09f7ff845d9f247
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Oct 27 14:14:23 2021 +0200

    Pass the object_type to JournalClient.value_serializer()
    
    and make this function an (optional) constructor argument.
    If not given, stick to `kafka_to_value`.
    
    This is needed in order to make it possible for the JournalClient to use
    a special value_deserializer implementation that needs the object_type,
    for example to make the value_deserializer directly instanciate
    BaseModel object.
    
    This will be used by an upcoming refactoring of the storage replayer
    that will make sure any BaseModel object coming from the journal is valid,
    and log invalid kafka objects in case it's not.
    
    Related to T3693.

commit 88054da89653d0e28e6600842eb9287db25b5244
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Oct 27 16:14:17 2021 +0200

    Do call consumer.commit() even if not objects have been received

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/174/ for more details.

This revision is now accepted and ready to land. Oct 27 2021, 5:20 PM
vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/journal/client.py
76–79

You should document the arguments of the function, since its signature is not the same as that of kafka_to_value.

This revision now requires changes to proceed. Oct 28 2021, 11:38 AM
swh/journal/client.py
76–79

Ah yes, I had this in mind then forgot. thx

Document the value_deserializer a bit more and add a test for it

Build has FAILED

Patch application report for D6565 (id=23888)

Could not rebase; Attempt merge onto 95d945e46e...

Updating 95d945e..a0c9174
Fast-forward
 docs/journal-clients.rst         | 10 +++++---
 swh/journal/client.py            | 30 ++++++++++++++++++------
 swh/journal/tests/test_client.py | 49 +++++++++++++++++++++++++++++++++++++---
 3 files changed, 76 insertions(+), 13 deletions(-)
Changes applied before test
commit a0c9174d81e3aa3b5c2c1e7446e50446c673e527
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Oct 27 14:14:23 2021 +0200

    Pass the object_type to JournalClient.value_serializer()
    
    and make this function an (optional) constructor argument.
    If not given, stick to `kafka_to_value`.
    
    If the returned value is None, it is ignored (not passed to the
    `worker_fn` function).
    
    This is needed in order to make it possible for the JournalClient to use
    a special value_deserializer implementation that needs the object_type,
    for example to make the value_deserializer directly instanciate
    BaseModel object.
    
    This will be used by an upcoming refactoring of the storage replayer
    that will make sure any BaseModel object coming from the journal is valid,
    and log invalid kafka objects in case it's not.
    
    Related to T3693.

commit 88054da89653d0e28e6600842eb9287db25b5244
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Oct 27 16:14:17 2021 +0200

    Do call consumer.commit() even if not objects have been received

Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/175/
See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/175/console

Build is green

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/176/ for more details.

Build is green

Patch application report for D6565 (id=23899)

Could not rebase; Attempt merge onto 95d945e46e...

Updating 95d945e..e267ee1
Fast-forward
 docs/journal-clients.rst         | 10 +++++---
 swh/journal/client.py            | 30 ++++++++++++++++++------
 swh/journal/tests/test_client.py | 49 +++++++++++++++++++++++++++++++++++++---
 3 files changed, 76 insertions(+), 13 deletions(-)
Changes applied before test
commit e267ee178ccc51a180d75eba2e319d23d1a9d6bb
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Oct 27 14:14:23 2021 +0200

    Pass the object_type to JournalClient.value_serializer()
    
    and make this function an (optional) constructor argument.
    If not given, stick to `kafka_to_value`.
    
    If the returned value is None, it is ignored (not passed to the
    `worker_fn` function).
    
    This is needed in order to make it possible for the JournalClient to use
    a special value_deserializer implementation that needs the object_type,
    for example to make the value_deserializer directly instanciate
    BaseModel object.
    
    This will be used by an upcoming refactoring of the storage replayer
    that will make sure any BaseModel object coming from the journal is valid,
    and log invalid kafka objects in case it's not.
    
    Related to T3693.

commit 88054da89653d0e28e6600842eb9287db25b5244
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Oct 27 16:14:17 2021 +0200

    Do call consumer.commit() even if not objects have been received

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/177/ for more details.

Build is green

Patch application report for D6565 (id=23900)

Could not rebase; Attempt merge onto 95d945e46e...

Updating 95d945e..f92d4ac
Fast-forward
 docs/journal-clients.rst         | 10 +++++---
 swh/journal/client.py            | 31 +++++++++++++++++++------
 swh/journal/tests/test_client.py | 49 +++++++++++++++++++++++++++++++++++++---
 3 files changed, 77 insertions(+), 13 deletions(-)
Changes applied before test
commit f92d4acf30e64bd96e4fd536605ddb943bdbd369
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Oct 27 14:14:23 2021 +0200

    Pass the object_type to JournalClient.value_serializer()
    
    and make this function an (optional) constructor argument.
    If not given, stick to `kafka_to_value`.
    
    If the returned value is None, it is ignored (not passed to the
    `worker_fn` function).
    
    This is needed in order to make it possible for the JournalClient to use
    a special value_deserializer implementation that needs the object_type,
    for example to make the value_deserializer directly instanciate
    BaseModel object.
    
    This will be used by an upcoming refactoring of the storage replayer
    that will make sure any BaseModel object coming from the journal is valid,
    and log invalid kafka objects in case it's not.
    
    Related to T3693.

commit 88054da89653d0e28e6600842eb9287db25b5244
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Oct 27 16:14:17 2021 +0200

    Do call consumer.commit() even if not objects have been received

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/178/ for more details.

This revision is now accepted and ready to land. Oct 28 2021, 2:12 PM