Page MenuHomeSoftware Heritage

converters: Work on model objects instead of dicts on the "not-db" side.
ClosedPublic

Authored by vlorentz on Aug 12 2020, 10:56 AM.

Details

Summary

This is a change internal to the pg storage, that will be needed to make
revision_get, revision_log, and release_get return model objects.

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vlorentz created this revision.Aug 12 2020, 10:56 AM

Build has FAILED

Patch application report for D3763 (id=13243)

Rebasing onto 6675286c4f...

Current branch diff-target is up to date.
Changes applied before test
commit f073aaf2d31bb12c06e9ebceaf6adc509dbac9ab
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 10:55:06 2020 +0200

    converters: Work on model objects instead of dicts on the "not-db" side.
    
    This is a change internal to the pg storage, that will be needed to make
    revision_get, revision_log, and release_get return model objects.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/734/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/734/console

ardumont added inline comments.
swh/storage/converters.py
35

Make that a Person object.

41

Make that a TimestampWithTimezone...
(and rework the function below to only manipulate our model objects, it's more consistent).

What do you think?

57

Drop the indirection now or make it consistent so that it returns a Person object?

I don't understand. _to_db() functions should return dicts, not model objects, it's the whole point.

And DEFAULT_AUTHOR and DEFAULT_DATE are only used as return values of these functions.

ardumont added inline comments.Aug 12 2020, 12:31 PM
swh/storage/converters.py
176

here you are manipulating both dict and model objects...
That now becomes inconsistent and annoying... (well for me ;)

I don't understand. _to_db() functions should return dicts, not model objects, it's the whole point.

It's not the whole point.

All this stuff was when we had only dicts to manipulate because dicts, at that time, represented our data model...

Now that we are using model objects for this representation (dict no more), my take is to manipulate those...
Our model objects should be used instead.

Following this reasoning, in the long run, i foresee the the converters module(s) should subside...
And the backend should simply know how to manipulate model objets to serialize into the backend... (and read from it).

And DEFAULT_AUTHOR and DEFAULT_DATE are only used as return values of these functions.

yeah, i know, see me previous point...


I'm merely suggesting to go that way as i found it makes more sense.
But that could be done at another time if at all ;)

Cheers,

vlorentz added a comment.EditedAug 12 2020, 3:13 PM

It still doesn't make sense. What you're saying is that db.py should take model objects instead of dictionaries as arguments.

But if we do that, we still need to convert between model objects and postgresql-compatible types somewhere; and you're just moving converters calls from storage.py to db.py, but converters.py still needs to convert between model objects and database types. And database types do not fit in model objects.

ardumont accepted this revision.Aug 13 2020, 12:07 AM

It still doesn't make sense. What you're saying is that db.py should take model objects instead of dictionaries as arguments.

isn't it kinda what's done already with cassandra/cql?

You just added types very/extremely similar to the swh.model but slightly differentl (for some reason which escape me right now) within cql.
Granted for 'read' as far as i remember. nethertheless, it could be consistently done both ways.

...
but meh, like i said "if at all".
...

This revision is now accepted and ready to land.Aug 13 2020, 12:07 AM

It still doesn't make sense. What you're saying is that db.py should take model objects instead of dictionaries as arguments.

isn't it kinda what's done already with cassandra/cql?

You just added types very/extremely similar to the swh.model but slightly differentl (for some reason which escape me right now) within cql.

Yes, that could be done; but not with the model objects themselves. For Cassandra the differences are:

  1. no enum
  2. serialized JSON
  3. directory entries / snapshot branches / revision parents in the main objects (because separate tables)

for postgres, point 1 would apply and possibly point 3. Additionally for postgres:

  1. no TimestampWithTimezone object, we'll have to flatten them
  2. persons will have to be joined too
vlorentz updated this revision to Diff 13383.Aug 17 2020, 9:53 AM

rebase + fix tests

Build has FAILED

Patch application report for D3763 (id=13383)

Could not rebase; Attempt merge onto 3a713dd74a...

Updating 3a713dd7..1b033e4b
Fast-forward
 swh/storage/converters.py            | 201 +++++++++++++++++------------------
 swh/storage/storage.py               |  18 ++--
 swh/storage/tests/test_converters.py | 112 ++++++++++---------
 3 files changed, 169 insertions(+), 162 deletions(-)
Changes applied before test
commit 1b033e4b118df8ae5d92592474ad0e626daab6f1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 10:55:06 2020 +0200

    converters: Work on model objects instead of dicts on the "not-db" side.
    
    This is a change internal to the pg storage, that will be needed to make
    revision_get, revision_log, and release_get return model objects.

commit 89656c9b1e6885ee158956d4138e98eb7c52e896
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 17 09:45:09 2020 +0200

    test_converters: Fix test data to match actual values.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/815/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/815/console

vlorentz updated this revision to Diff 13384.Aug 17 2020, 10:25 AM

convert back in backfill.py

Build is green

Patch application report for D3763 (id=13384)

Rebasing onto 3a713dd74a...

Current branch diff-target is up to date.
Changes applied before test
commit 038a219f84d6b8a4f02b48f9ad3c5d823d097790
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 10:55:06 2020 +0200

    converters: Work on model objects instead of dicts on the "not-db" side.
    
    This is a change internal to the pg storage, that will be needed to make
    revision_get, revision_log, and release_get return model objects.

commit 89656c9b1e6885ee158956d4138e98eb7c52e896
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 17 09:45:09 2020 +0200

    test_converters: Fix test data to match actual values.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/816/ for more details.