Page MenuHomeSoftware Heritage

backfiller: remove convertion of model objects back to dicts.
ClosedPublic

Authored by vlorentz on Aug 17 2020, 10:46 AM.

Details

Summary

As a temporary workaround to remain compatible with existing backfiller converters,
I made them convert back to dict before they are converted again to model objects.

This commit removes this workaround by making converters return model objects.

Depends on D3763.

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D3800 (id=13385)

Could not rebase; Attempt merge onto 3a713dd74a...

Updating 3a713dd7..bd925475
Fast-forward
 swh/storage/backfill.py              |  70 ++++++++----
 swh/storage/converters.py            | 201 +++++++++++++++++------------------
 swh/storage/storage.py               |  18 ++--
 swh/storage/tests/test_converters.py | 112 ++++++++++---------
 4 files changed, 217 insertions(+), 184 deletions(-)
Changes applied before test
commit bd925475d95c9af3c138efef9914806e5d373ac6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 17 10:45:59 2020 +0200

    backfiller: remove convertion of model objects back to dicts.
    
    As a temporary workaround to remain compatible with existing backfiller converters,
    I made them convert back to dict before they are converted again to model objects.
    
    This commit removes this workaround by making converters return model objects.

commit 038a219f84d6b8a4f02b48f9ad3c5d823d097790
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 10:55:06 2020 +0200

    converters: Work on model objects instead of dicts on the "not-db" side.
    
    This is a change internal to the pg storage, that will be needed to make
    revision_get, revision_log, and release_get return model objects.

commit 89656c9b1e6885ee158956d4138e98eb7c52e896
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Aug 17 09:45:09 2020 +0200

    test_converters: Fix test data to match actual values.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/817/ for more details.

douardda added inline comments.
swh/storage/backfill.py
444

now we build model objects here, it "looks" a bit strange to juggle with both CONVERTERS and objects_converter_fn.

Not sure how to improve this, but it does not "feel right"...

LGTM, but see my comment for a possible improvement.

This revision is now accepted and ready to land.Aug 18 2020, 10:51 AM

Yeah but I didn't find a very satisfying solution either. Well, I guess I could do something like this:

CONVERTERS: Dict[str, Callable[[BaseDb, Dict[str, Any]], BaseModel]] = {
    **{
        type_: lambda db, obj: converter(obj)
        for (type_, converter) in object_converter_fn.items()
    },
    "directory": directory_converter,                          
    "raw_extrinsic_metadata": raw_extrinsic_metadata_converter,
    "revision": revision_converter,                            
    "release": release_converter,                              
    "snapshot": snapshot_converter,                            
}

But it's not much of an improvement

Yeah but I didn't find a very satisfying solution either. Well, I guess I could do something like this:

CONVERTERS: Dict[str, Callable[[BaseDb, Dict[str, Any]], BaseModel]] = {
    **{
        type_: lambda db, obj: converter(obj)
        for (type_, converter) in object_converter_fn.items()
    },
    "directory": directory_converter,                          
    "raw_extrinsic_metadata": raw_extrinsic_metadata_converter,
    "revision": revision_converter,                            
    "release": release_converter,                              
    "snapshot": snapshot_converter,                            
}

But it's not much of an improvement

yeah this "extra db argument" makes it uneasy

IMO we should get rid of most of the code in backfiller.py and add paginated endpoints to backends instead. But that's another discussion