Page MenuHomeSoftware Heritage

Simplify and document actual information returned to the loader
ClosedPublic

Authored by ardumont on Sep 30 2020, 2:32 PM.

Details

Summary

This is a breaking change which needs adaptation in the deposit loader (D4105).

This moved the provider, tool and raw_metadata keys to the top-level.

(origin in the end is still used by the loader when flushing the metadata in the revision...)

The main reason is to cleanup. Took the opportunity since I need a
deployment for D4100 anyways.

Related to T2649

Test Plan

tox

Diff Detail

Repository
rDDEP Push deposit
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 15730
Build 24214: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 24213: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D4101 (id=14452)

Could not rebase; Attempt merge onto 4d72d1be52...

Updating 4d72d1be..137fe219
Fast-forward
 swh/deposit/api/private/__init__.py                |  27 +-
 swh/deposit/api/private/deposit_check.py           |  11 +-
 swh/deposit/api/private/deposit_read.py            |  54 +--
 .../api/test_deposit_private_read_metadata.py      | 393 ++++++---------------
 4 files changed, 164 insertions(+), 321 deletions(-)
Changes applied before test
commit 137fe21904eaae84447979b35ab9b8a97f5b4829
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 30 14:27:24 2020 +0200

    Simplify and document actual information returned to the loader
    
    This is a breaking change which needs adaptation to the loader.
    
    This:
    - drops the no longer used origin keys
    - moved the provider, tool and raw_metadata keys to the top-level
    
    The main reason is to cleanup.
    
    Related to T2649

commit bc012e45ed8b5aa35afd7250c1d3c84204a20033
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 30 14:06:21 2020 +0200

    Transit raw metadata to the loader to unify with metadata update scenario
    
    The new update scenario now stores new metadata update to the metadata storage.
    The loader does not, it currently stores the transformed xml (into json dict).
    The loader passes by this deposit_read call to actually retrieve the data.
    
    So prior to adapting the loader, this needs to happen.
    
    This also:
    - adds type to impacted methods along the way.
    - simplifies a bit the current deposit_read tests
    
    Related to T2649

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/166/ for more details.

anlambert added inline comments.
swh/deposit/api/private/deposit_read.py
146

(as) should be removed I think

swh/deposit/tests/api/test_deposit_private_read_metadata.py
112–119

Is this indentation change intended ?

swh/deposit/api/private/deposit_read.py
146

right ;)

swh/deposit/tests/api/test_deposit_private_read_metadata.py
112–119

ah no...

it makes it do it as many times as there are atom keys, thx for noticing!

Fix indentation (and add back the origin)

Build is green

Patch application report for D4101 (id=14454)

Could not rebase; Attempt merge onto 4d72d1be52...

Updating 4d72d1be..f222108f
Fast-forward
 swh/deposit/api/private/__init__.py                |  27 +-
 swh/deposit/api/private/deposit_check.py           |  11 +-
 swh/deposit/api/private/deposit_read.py            |  57 ++--
 .../api/test_deposit_private_read_metadata.py      | 380 +++++++--------------
 4 files changed, 171 insertions(+), 304 deletions(-)
Changes applied before test
commit f222108f2c3ca51e4b5fd0ce42e2fadba5a399e5
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 30 14:27:24 2020 +0200

    Simplify and document actual information returned to the loader
    
    This is a breaking change which needs adaptation to the loader.
    
    This:
    - drops the no longer used origin keys
    - moved the provider, tool and raw_metadata keys to the top-level
    
    The main reason is to cleanup.
    
    Related to T2649

commit 05df4905ced239a205b5e7c8867354309330b31f
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 30 14:06:21 2020 +0200

    Transit raw metadata to the loader to unify with metadata update scenario
    
    The new update scenario now stores new metadata update to the metadata storage.
    The loader does not, it currently stores the transformed xml (into json dict).
    The loader passes by this deposit_read call to actually retrieve the data.
    
    So prior to adapting the loader, this needs to happen.
    
    This also:
    - adds type to impacted methods along the way.
    - simplifies a bit the current deposit_read tests
    
    Related to T2649

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/168/ for more details.

This revision is now accepted and ready to land.Sep 30 2020, 3:22 PM
vlorentz added a subscriber: vlorentz.

No. We're not adding yet another revision format now.

This revision now requires changes to proceed.Oct 1 2020, 10:51 AM

No. We're not adding yet another revision format now.

?

That's not another revision format.

The loader reads this and writes it in the revision, doesn't it?

The loader reads this and writes it in the revision, doesn't it?

Yes, it reads it but not writes it directly as is.

As D4105 adaptation shows, their tests are still green and iirc, the metadata format is tested there.

ok, I missed that D4105 reformats the metadata.

But this diff isn't simplifying anything, it just unnests dictionaries in what the API returns, and *adds* complexity to the loader that must undo this change.

ok, I missed that D4105 reformats the metadata.

But this diff isn't simplifying anything, it just unnests dictionaries in what the API returns,

kinda granted.
But can you tell me what the "origin_metadata" key we came up stood for initially?
I don't understand it so i prefer it goes away and since i'm reworking it to introduce the type change from json to raw... (plus now we'll have both...)
in my mind, might as well clarify intent here.

and *adds* complexity to the loader that must undo this change.

Don't forget i asked about removing altogether that part from the revision though...
Anyway, that "complexity" is temporary.
At some point, we'll finally be able to stop writing to the revision those metadata.

This revision is now accepted and ready to land.Oct 1 2020, 11:34 AM

Rebase and adapt according to D4100

Build is green

Patch application report for D4101 (id=14475)

Could not rebase; Attempt merge onto 4d72d1be52...

Updating 4d72d1be..e15f23dc
Fast-forward
 swh/deposit/api/private/__init__.py                |  27 +-
 swh/deposit/api/private/deposit_check.py           |  11 +-
 swh/deposit/api/private/deposit_read.py            |  64 ++--
 .../api/test_deposit_private_read_metadata.py      | 391 +++++++--------------
 4 files changed, 184 insertions(+), 309 deletions(-)
Changes applied before test
commit e15f23dc0a5e09507806459f6c2457fccbf33805
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 30 14:27:24 2020 +0200

    Simplify and document actual information returned to the loader
    
    This is a breaking change which needs adaptation to the loader.
    
    This:
    - drops the no longer used origin keys
    - moved the provider, tool and raw_metadata keys to the top-level
    
    The main reason is to cleanup.
    
    Related to T2649

commit 2ecb65d4f534b2ee74d3bb55caa15ee111f8487b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 30 14:06:21 2020 +0200

    Transit raw metadata to the loader to unify with metadata update scenario
    
    The new update scenario now stores new metadata update to the metadata storage.
    The loader does not, it currently stores the transformed xml (into json dict).
    The loader passes by this deposit_read call to actually retrieve the data.
    
    So prior to adapting the loader, the information returned by deposit_read need
    to provide the raw metadata as well.
    
    This also:
    - adds type to impacted methods along the way.
    - simplifies a bit the current deposit_read tests
    
    Related to T2649

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/170/ for more details.