Use msgpack extension type to encode datetime objects in the journal
Closed, Public

Authored by douardda on Dec 3 2020, 1:13 PM.

Details

Summary

Datetime objects are now encoded using msgpack extension types
(instead of the former b"swhtype" custom types).

Decoding of the latter is still supported, however, for backward
compatibility.

Also make the journal's serialization code independent from
swh.core.api.serializer.

This latter change aims at making the journal's msgpack serialization
specified (and predictable). The code from swh.core is dedicated to RPC
and thus supports many more "custom types" than the journal needs. By
not using the serialization code from the swh.core package, we make sure
we do not inadvertently encode unspecified objects in the journal.

Related to T2834
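
The backward-compatibility path described in the summary could look roughly like the sketch below. It is only an illustration: the layout of the legacy dict (a b"swhtype" tag plus a b"d" key holding an ISO 8601 string) and the function name decode_legacy_types are assumptions, not taken from the diff.

```python
import datetime


def decode_legacy_types(obj):
    """Turn a legacy tagged dict back into a datetime; pass anything else through.

    Assumed legacy layout (hypothetical):
        {b"swhtype": "datetime", b"d": "<ISO 8601 string>"}
    """
    if isinstance(obj, dict) and obj.get(b"swhtype") == "datetime":
        return datetime.datetime.fromisoformat(obj[b"d"])
    return obj


legacy = {b"swhtype": "datetime", b"d": "2020-12-03T13:05:59+01:00"}
restored = decode_legacy_types(legacy)
```

A helper of this shape could be passed as the object_hook argument of msgpack.unpackb, so that messages written with the old encoding remain readable next to the new ones.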

Event Timeline

Build is green

Patch application report for D4655 (id=16509)

Rebasing onto 12b31a2621...

Current branch diff-target is up to date.
Changes applied before test
commit c5122fa000cc5cf164495bb427818548438f7815
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Dec 3 13:05:59 2020 +0100

    Use msgpack extension type to encode datetime objects in the journal
    
    Datetime objects are now encoded using msgpack extension types
    (instead of the former b"swhtype" custom types).
    
    Decoding of the latter is still supported, however, for backward
    compatibility.
    
    Also make the journal's serialization code independent from
    swh.core.api.serializer.
    
    This latter change aims at making the journal's msgpack serialization
    specified (and predictable). The code from swh.core is dedicated to RPC
    and thus supports many more "custom types" than the journal needs. By
    not using the serialization code from the swh.core package, we make sure
    we do not inadvertently encode unspecified objects in the journal.
    
    Related to T2834

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/128/ for more details.

Can we use the built-in msgpack timestamp extension type (code = -1) for timestamps instead? See https://msgpack-python.readthedocs.io/en/latest/api.html#msgpack.Timestamp

I think we'll need a conversion function on output (kafka_to_value), but that would be more universal than our own ISO 8601-based thing.
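
For reference, the payload of the msgpack timestamp extension (type -1) can be reproduced with the standard library alone. The sketch below follows the three layouts of the msgpack specification (32, 64 and 96 bits); the function names are made up for illustration and this is not the journal's code.

```python
import datetime
import struct

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def timestamp_payload(dt):
    """Encode a tz-aware datetime as the payload of msgpack ext type -1."""
    if dt.tzinfo is None:
        raise ValueError("naive datetimes have no defined instant")
    delta = dt - EPOCH
    seconds = delta.days * 86400 + delta.seconds
    nanoseconds = delta.microseconds * 1000
    if seconds >> 34 == 0:
        # Seconds fit in 34 unsigned bits: pack nanoseconds above them.
        data64 = (nanoseconds << 34) | seconds
        if (data64 & 0xFFFFFFFF00000000) == 0:
            return struct.pack(">I", data64)  # timestamp 32
        return struct.pack(">Q", data64)  # timestamp 64
    # timestamp 96: 32-bit nanoseconds, then signed 64-bit seconds.
    return struct.pack(">Iq", nanoseconds, seconds)


def payload_to_datetime(data):
    """Decode a timestamp payload into a tz-aware UTC datetime."""
    if len(data) == 4:
        seconds, nanoseconds = struct.unpack(">I", data)[0], 0
    elif len(data) == 8:
        data64 = struct.unpack(">Q", data)[0]
        seconds, nanoseconds = data64 & 0x3FFFFFFFF, data64 >> 34
    elif len(data) == 12:
        nanoseconds, seconds = struct.unpack(">Iq", data)
    else:
        raise ValueError("unexpected timestamp payload length")
    # Python datetimes only keep microseconds, so sub-microsecond digits drop.
    return EPOCH + datetime.timedelta(
        seconds=seconds, microseconds=nanoseconds // 1000
    )
```

The payload only carries an instant (seconds and nanoseconds since the epoch), so the original UTC offset cannot be recovered on decoding; any conversion on output has to settle on a timezone, UTC in this sketch.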

Use the (poorly documented or undocumented) timestamp/datetime feature flags of msgpack to handle datetime objects
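
The flags in question are presumably the datetime option of the packer and the timestamp option of the unpacker in msgpack-python 1.0 and later; that reading is an assumption, since the diff itself is not shown here. A minimal round trip with those options, requiring the third-party msgpack package:

```python
import datetime

import msgpack

# A tz-aware datetime at UTC+01:00 (the offset is arbitrary, for illustration).
tz = datetime.timezone(datetime.timedelta(hours=1))
original = datetime.datetime(2020, 12, 3, 13, 5, 59, tzinfo=tz)

# datetime=True packs tz-aware datetimes as the Timestamp extension (code -1).
payload = msgpack.packb({"date": original}, datetime=True)

# timestamp=3 makes the unpacker return datetime.datetime objects in UTC.
decoded = msgpack.unpackb(payload, timestamp=3)["date"]

same_instant = decoded == original  # compares instants, not offsets
offset_after = decoded.utcoffset()  # the original +01:00 offset is gone
```

The decoded value denotes the same instant as the original but carries a UTC tzinfo, so anything relying on the original offset has to store it separately.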

Build is green

Patch application report for D4655 (id=16536)

Rebasing onto 12b31a2621...

Current branch diff-target is up to date.
Changes applied before test
commit d3a99afa5271d6c106f32999cfb93ebffd78249e
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Dec 3 13:05:59 2020 +0100

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/129/ for more details.

Simplify the backward-compatibility code a bit (no need for "genericity" there)

Build is green

Patch application report for D4655 (id=16538)

Rebasing onto 12b31a2621...

Current branch diff-target is up to date.
Changes applied before test
commit 51b000bb7fe62820ab8f7c8349a005f05c376bb7
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Dec 3 13:05:59 2020 +0100

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/130/ for more details.

Improve (?) the commit message

Build is green

Patch application report for D4655 (id=16540)

Rebasing onto 12b31a2621...

Current branch diff-target is up to date.
Changes applied before test
commit 593bd088b0d08ac5b1c1ce339a3a151fcec01cb1
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Dec 3 13:05:59 2020 +0100

    Use msgpack Timestamp extension type to encode datetime objects in the journal
    
    Datetime objects are now encoded using msgpack extension types
    (instead of the former b"swhtype" custom types).
    
    Note that this implies that the timezone of the encoded datetime is "lost in
    translation": the decoded value is a (tz-aware) UTC datetime object.
    
    Decoding of the former custom types is still supported, however, for
    backward compatibility.
    
    Also make the journal's serialization code independent from
    swh.core.api.serializer.
    
    This latter change aims at making the journal's msgpack serialization
    specified (and predictable). The code from swh.core is dedicated to RPC
    and thus supports many more "custom types" than the journal needs. By
    not using the serialization code from the swh.core package, we make sure
    we do not inadvertently encode unspecified objects in the journal.
    
    Related to T2834

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/131/ for more details.

Add a few tests for msgpack codecs (dates and long integers)
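
Long integers need attention because msgpack's native integer formats only cover the range -2^63 to 2^64 - 1. The sketch below shows one possible way to round-trip larger values as a sign flag plus big-endian bytes, which could serve as the payload of a custom extension type. All names are hypothetical and this is not necessarily the scheme used by the journal.

```python
def fits_native_msgpack_int(value):
    """True if value fits msgpack's native int formats (int64 and uint64)."""
    return -(2**63) <= value <= 2**64 - 1


def encode_long_int(value):
    """Return (is_negative, magnitude as big-endian bytes)."""
    magnitude = abs(value)
    length = max(1, (magnitude.bit_length() + 7) // 8)
    return value < 0, magnitude.to_bytes(length, "big")


def decode_long_int(is_negative, data):
    """Inverse of encode_long_int."""
    magnitude = int.from_bytes(data, "big")
    return -magnitude if is_negative else magnitude
```

A test for such a codec would typically check values just outside the native range on both sides, for example 2**64 and -(2**63) - 1, in addition to ordinary integers.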

Build is green

Patch application report for D4655 (id=16542)

Rebasing onto 12b31a2621...

Current branch diff-target is up to date.
Changes applied before test
commit 8dd97125df2db0eacf73e8b0407837d169bc6b85
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Dec 3 13:05:59 2020 +0100

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/132/ for more details.

This revision is now accepted and ready to land. Dec 4 2020, 2:43 PM