Page MenuHomeSoftware Heritage

serializers: Deserialize as lists instead of tuples
AbandonedPublic

Authored by vlorentz on Dec 17 2020, 1:03 PM.

Details

Summary
  1. for consistency with the serializers in swh.core
  2. it's not clear to me that we ever needed that (it was added in rDJNL64cd01fc7c049123e16341fa93239013f152bacc with no rational)
  3. it messes with swh-search reading from the journal and expecting JSON-like data. (T2876#54729)
Test Plan

Diff Detail

Event Timeline

Build has FAILED

Patch application report for D4758 (id=16847)

Rebasing onto d8cd3f9bbe...

Current branch diff-target is up to date.
Changes applied before test
commit 34bcfa4329983bb99e40ceb4b8b41a147c47baf0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Dec 17 13:03:18 2020 +0100

    serializers: Deserialize as lists instead of tuples
    
    1. for consistency with the serializers in swh.core
    2. it's not clear to me that we ever needed that (it was added in
       64cd01fc7c049123e16341fa93239013f152bacc with no rational)
    3. it messes with swh-search reading from the journal and expecting
       JSON-like data.

Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/134/
See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/134/console

vlorentz edited the summary of this revision. (Show Details)

Build has FAILED

Patch application report for D4758 (id=16848)

Rebasing onto d8cd3f9bbe...

Current branch diff-target is up to date.
Changes applied before test
commit 79c32d7a890894c31eef186adeffc35ebffbbc71
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Dec 17 13:03:18 2020 +0100

    serializers: Deserialize as lists instead of tuples
    
    1. for consistency with the serializers in swh.core
    2. it's not clear to me that we ever needed that (it was added in
       64cd01fc7c049123e16341fa93239013f152bacc with no rational)
    3. it messes with swh-search reading from the journal and expecting
       JSON-like data.

Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/135/
See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/135/console

Build has FAILED

Patch application report for D4758 (id=16849)

Rebasing onto d8cd3f9bbe...

Current branch diff-target is up to date.
Changes applied before test
commit c4bf313c83e9a832cde2dc11f3fce6e351eed43b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Dec 17 13:03:18 2020 +0100

    serializers: Deserialize as lists instead of tuples
    
    1. for consistency with the serializers in swh.core
    2. it's not clear to me that we ever needed that (it was added in
       64cd01fc7c049123e16341fa93239013f152bacc with no rational)
    3. it messes with swh-search reading from the journal and expecting
       JSON-like data.

Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/136/
See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/136/console

ardumont edited the summary of this revision. (Show Details)

it's not clear to me that we ever needed that (it was added ...

Well, I only recall it was around the time we started using more tuple everywhere
because it's immutable and list are not.
Aside that, i don't remember...

So, to summarize here, the rationale here try to keep serliazed input and serialized output consistent.

Well, i think we should be consistent all the time then.

Tuple stays tuple, List stays list... to keep some sanity...

Although, now i'm wondering if there was not a joke about msgpack not dealing
correctly with list at some point...

@douardda Do you recall the rationale of D3207 (for the ensure_tuples function)?
(which is becoming ensure_lists in here)

rDJNL64cd01fc7c049123e16341fa93239013f152bacc

Well, I only recall it was around the time we started using more tuple everywhere
because it's immutable and list are not.
Aside that, i don't remember...

But dicts are mutable. We moved to model objects since then, and .from_dict will do the conversion to tuples when needed

rDJNL64cd01fc7c049123e16341fa93239013f152bacc

What do you mean? If that's the answer to my question to david, that does not help.
You said yourself in the description it was missing the rationale.

I'm talking about the comment inlined below (from that diff ^ description).

It also needs the kafka_to_value serializer function to ensure tuples are
used instead of lists.

The previous diff D3207 description does not explain why it needs to be tuples instead of lists.
I'd like to know the explicit reason now [1]

All the more reasons that now you seem to want to keep list both for input as list and tuple...
That change back and forth in behavior does not bode well...

But dicts are mutable. We moved to model objects since then, and .from_dict will do the conversion to tuples when needed

Sorry, I don't follow...

[1] I forgot to ask the rationale at the time, time to fix it.

rDJNL64cd01fc7c049123e16341fa93239013f152bacc

What do you mean? If that's the answer to my question to david, that does not help.
You said yourself in the description it was missing the rationale.

I'm talking about the comment inlined below (from that diff ^ description).

Sorry, I didn't mean to paste that

But dicts are mutable. We moved to model objects since then, and .from_dict will do the conversion to tuples when needed

Sorry, I don't follow...

You say we use tuples instead of lists because they are immutable. But kafka_to_value returns dicts, so they are mutable values anyway.

Sorry, I didn't mean to paste that

ok

You say we use tuples instead of lists because they are immutable. But kafka_to_value
returns dicts, so they are mutable anyway.

oh yeah, sure.

Still, that would not be the first time that we are inconsistent "temporarily" [1] and
then never get a chance to improve it later on.

[1] I actually don't recall if we have any plan to actually migrate the journal to use
model objects (and stop using dicts as well)...

But anyway, back on track, i'm a bit at a loss right now on this.
I'll ping @douardda ;)

[1] I actually don't recall if we have any plan to actually migrate the journal to use
model objects (and stop using dicts as well)...

We can't migrate the indexed metadata to use model objects.

It looks to me that this really about

  1. it messes with swh-search reading from the journal and expecting JSON-like data. (T2876#54729)

in which case I'm not sure why you do not adapt swh-search accordingly. Why not put and use that ensure_list function in the swh-search?

in which case I'm not sure why you do not adapt swh-search accordingly. Why not put and use that ensure_list function in the swh-search?

There are two options:

  1. if the journal keeps lists as-is, then:
    1. this couple of tests needs ensure_list
    2. swh-model converts to tuples
  2. if the journal converts to tuples with ensure_tuples, then:
    1. swh-search needs ensure_list
    2. swh-model still copies tuples (and needs the code to convert from lists anyway)

So option 2 needs more code in total, runs more code in production (convertion back and forth).

I don't see any reason to prefer option 2, besides running a little less code in tests.

Also, option 1 improves consistency with the other serializers, as we agreed on IRC

This revision was not accepted when it landed; it landed in state Needs Review.Dec 18 2020, 2:18 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.

oops, didn't mean to push this

Sorry @douardda, but I'm going to assume you are fine with this diff, because I'd really like this to work before Vincent goes on vacation

Sorry @douardda, but I'm going to assume you are fine with this diff, because I'd really like this to work before Vincent goes on vacation

I was not, but, well, life goes on...