
Add support for large negative integers in msgpack encoding
ClosedPublic

Authored by olasd on Nov 23 2020, 10:46 PM.

Details

Summary

This is to support some of the metadata imported from npm package.jsons in
revisions.
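
For context: msgpack's native integer types only cover the range [-2^63, 2^64 - 1], so anything outside that range needs a custom encoding; this is presumably what the npm metadata runs into. A minimal illustration with msgpack-python:

```python
import msgpack

msgpack.packb(2**64 - 1)     # largest native msgpack integer (uint64): packs fine
msgpack.packb(-(2**63))      # smallest native msgpack integer (int64): packs fine
msgpack.packb(-(2**63) - 1)  # out of range: raises OverflowError
```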

Test Plan

tox test added

Diff Detail

Repository
rDCORE Foundations and core functionalities
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D4568 (id=16224)

Rebasing onto e0e55b71cc...

Current branch diff-target is up to date.
Changes applied before test
commit 38728c614169faa3bbf19045abe74cee0369d853
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Nov 23 22:43:15 2020 +0100

    Add support for large negative integers in msgpack encoding
    
    This is to support some of the metadata imported from npm package.jsons in
    revisions.

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/166/ for more details.

ardumont added a subscriber: ardumont.

lgtm


I've got questions... about those functions in general.

Any particular reason for having those encoders/decoders hard-coded in the main encoding/decoding functions?
(I know you are only extending the current behavior here to solve the problem at hand, but I mean long term.)

Or can we envision refactoring those into the realm of the extra_encoders/decoders at some point?
Then encode_types becomes a simple for loop calling the encoders.
(Ideally, something equivalent could happen on the decoding side.)
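
A hypothetical sketch of the refactor being suggested, assuming a registry shaped like type -> (tag, encoder); the registry layout and the tagged-dict wrapper are illustrative guesses, not the actual swh.core structures:

```python
from typing import Any, Callable, Dict, Tuple, Type

# Illustrative registry: maps a Python type to a tag name and an encoder
# returning something natively serializable.
EXTRA_ENCODERS: Dict[Type, Tuple[str, Callable[[Any], Any]]] = {
    set: ("set", list),
    complex: ("complex", lambda c: [c.real, c.imag]),
}

def encode_types(obj: Any) -> Any:
    # The "simple for loop calling the encoders" from the comment above.
    for type_, (tag, encoder) in EXTRA_ENCODERS.items():
        if isinstance(obj, type_):
            return {"swhtype": tag, "d": encoder(obj)}
    return obj
```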

This revision is now accepted and ready to land. Nov 24 2020, 9:49 AM
douardda added a subscriber: douardda.

Bit sad indeed, but LGTM.

Why don't you pass `signed=True` to `int.to_bytes`/`int.from_bytes`, and add "one eighth of a byte" to the length (i.e. replace `if rem: length += 1` with `length += 1`) when encoding?

In D4568, @douardda wrote:

Why don't you pass `signed=True` to `int.to_bytes`/`int.from_bytes`, and add "one eighth of a byte" to the length (i.e. replace `if rem: length += 1` with `length += 1`) when encoding?

Because that's not backwards-compatible with existing serialized data.
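
To make the incompatibility concrete, here is a sketch assuming the length is derived from `bit_length()` as the quoted `if rem: length += 1` snippet suggests: with signed=True, any non-negative value whose top byte is full would need an extra sign byte, so its serialized bytes would change.

```python
n = 255
length = (n.bit_length() + 7) // 8          # 1 byte in the current unsigned scheme

n.to_bytes(length, "big")                   # b'\xff' -- the existing serialized form
# signed=True needs room for the sign bit: to_bytes(length, ..., signed=True)
# raises OverflowError here, hence the unconditional length += 1:
n.to_bytes(length + 1, "big", signed=True)  # b'\x00\xff' -- a different byte string
```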

Move comment back to the right place

Build is green

Patch application report for D4568 (id=16237)

Rebasing onto e0e55b71cc...

Current branch diff-target is up to date.
Changes applied before test
commit f269e2d3d0f4853a8a94f8eaca7d3373b1ebf092
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Nov 23 22:43:15 2020 +0100

    Add support for large negative integers in msgpack encoding
    
    This is to support some of the metadata imported from npm package.jsons in
    revisions.

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/167/ for more details.

In D4568, @ardumont wrote:

I've got questions... about those functions in general.

Any particular reason for having those encoders/decoders hard-coded in the main encoding/decoding functions?
(I know you are only extending the current behavior here to solve the problem at hand, but I mean long term.)

Or can we envision refactoring those into the realm of the extra_encoders/decoders at some point?
Then encode_types becomes a simple for loop calling the encoders.
(Ideally, something equivalent could happen on the decoding side.)

Yeah, here's my reasoning so far:

extra_encoders/decoders are (originally) for stuff that can't be represented in either JSON or msgpack, and they generate compatible representations in both serializations.

The extra bits for msgpack, as well as the bytes serialization for JSON, are for base types that are supported by one format but not the other; the idea was to have a baseline compact representation of these types, while other types would be handled through the more verbose, but pluggable, mechanism.

The fact that the bytes serialization was put in the main ENCODERS/DECODERS is, I believe, a mistake: it'll never be used for msgpack (it should probably just be enabled in the JSON encoder/decoder stack).

Honestly, I'm not sure this reasoning holds much water, but that's where we are currently.

Of course, most of this could also be alleviated by making the revision metadata field bytes instead of arbitrary JSON, or by finally dropping it! But we're not there yet :-)
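
For illustration, here is the contrast being described, with both wire formats invented for the sketch (the ExtType code 43 and the tagged-dict keys are placeholders, not the real swh.core formats):

```python
import msgpack

n = -(2**70)  # too big for msgpack's native int64

# "Baseline compact" route: a dedicated msgpack extension type carrying
# the two's-complement bytes of the integer.
length = (n.bit_length() + 7) // 8 + 1  # +1 byte so the sign bit fits
compact = msgpack.packb(msgpack.ExtType(43, n.to_bytes(length, "big", signed=True)))

# "Verbose but pluggable" route: a self-describing tagged dict.
verbose = msgpack.packb({"swhtype": "int", "d": str(n)})

assert len(compact) < len(verbose)  # the extension form is noticeably smaller
```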

In D4568#114334, @olasd wrote:

Why don't you pass `signed=True` to `int.to_bytes`/`int.from_bytes`, and add "one eighth of a byte" to the length (i.e. replace `if rem: length += 1` with `length += 1`) when encoding?

Because that's not backwards-compatible with existing serialized data.

Ah yes, I forgot about that.