
deposit: Transit raw metadata to the loader to unify with metadata update scenario
Closed · Public

Authored by ardumont on Sep 30 2020, 2:06 PM.

Details

Summary

The new update scenario now stores metadata updates in the metadata storage.
The loader does not: it currently stores the XML transformed into a JSON dict.
The loader goes through this deposit_read call to retrieve the data.

So, prior to adapting the loader, the information returned by deposit_read needs
to include the raw metadata as well.

This also:

  • adds type annotations to the impacted methods along the way.
  • simplifies the current deposit_read tests a bit.

In D4101, I refactor this deposit_read endpoint and add
some docs about it.

In D4105, I adapt the deposit loader to this new format (so it sends
that raw information to the metadata storage instead of the current transformed
metadata).
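
As a rough illustration of the loader-side change described above, here is a minimal sketch, not the actual loader code. It assumes a flat, hypothetical response dict using the key names settled on later in this revision (metadata_raw, metadata_dict), and it assumes metadata_raw is a list of raw XML strings. The function name metadata_to_store is made up for the example.

```python
from typing import Any, Dict, List


def metadata_to_store(response: Dict[str, Any]) -> List[bytes]:
    """Return the raw metadata documents of a deposit_read-like response.

    Hypothetical helper: the real response layout may differ. The point is
    that the loader forwards the raw documents, not the transformed dict.
    """
    # Assumption: "metadata_raw" holds a list of raw XML strings.
    raw_documents = response.get("metadata_raw", [])
    # Encode each document so it can be stored as an opaque blob.
    return [raw.encode("utf-8") for raw in raw_documents]


response = {
    "metadata_raw": ["<entry><title>demo</title></entry>"],
    "metadata_dict": {"title": "demo"},
}
blobs = metadata_to_store(response)
```

The transformed dict is ignored here on purpose: only the raw documents are forwarded.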

Related to T2649

Test Plan

tox

Diff Detail

Repository
rDDEP Push deposit
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 15727
Build 24209: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 24208: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D4100 (id=14449)

Rebasing onto 4d72d1be52...

Current branch diff-target is up to date.
Changes applied before test
commit d7e4ddc8e66744675ec0aaebc6614b5cc1b49e22
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 30 14:06:21 2020 +0200

    deposit_read: Transit raw metadata to the loader
    
    Related to T2649

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/164/ for more details.

ardumont edited the summary of this revision.

Rework the commit message to make the reasons explicit.

ardumont retitled this revision from "deposit_read: Transit raw metadata to the loader" to "deposit: Transit raw metadata to the loader to unify with metadata update scenario". (Sep 30 2020, 2:31 PM)
ardumont edited the summary of this revision.

Build is green

Patch application report for D4100 (id=14451)

Rebasing onto 4d72d1be52...

Current branch diff-target is up to date.
Changes applied before test
commit bc012e45ed8b5aa35afd7250c1d3c84204a20033
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 30 14:06:21 2020 +0200

    Transit raw metadata to the loader to unify with metadata update scenario
    
    The new update scenario now stores new metadata update to the metadata storage.
    The loader does not, it currently stores the transformed xml (into json dict).
    The loader passes by this deposit_read call to actually retrieve the data.
    
    So prior to adapting the loader, this needs to happen.
    
    This also:
    - adds type to impacted methods along the way.
    - simplifies a bit the current deposit_read tests
    
    Related to T2649

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/165/ for more details.

anlambert added a subscriber: anlambert.

Looks good to me.

swh/deposit/api/private/__init__.py
44

s/1/one/

This revision is now accepted and ready to land. (Sep 30 2020, 2:55 PM)

Build is green

Patch application report for D4100 (id=14453)

Rebasing onto 4d72d1be52...

Current branch diff-target is up to date.
Changes applied before test
commit 05df4905ced239a205b5e7c8867354309330b31f
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 30 14:06:21 2020 +0200

    Transit raw metadata to the loader to unify with metadata update scenario
    
    The new update scenario now stores new metadata update to the metadata storage.
    The loader does not, it currently stores the transformed xml (into json dict).
    The loader passes by this deposit_read call to actually retrieve the data.
    
    So prior to adapting the loader, this needs to happen.
    
    This also:
    - adds type to impacted methods along the way.
    - simplifies a bit the current deposit_read tests
    
    Related to T2649

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/167/ for more details.

I need a clearer distinction between when data and metadata are used.
data and atom are not compatible things: atom is always metadata.
data should always refer to the software source code content.

swh/deposit/api/private/deposit_read.py
135

Change data to metadata entry here.

data is the software content.

147

what's the data?

swh/deposit/api/private/deposit_read.py
147

It's the dict response that this API returns (transformed into JSON at some point), so it's the response body.

ardumont edited the summary of this revision.
  • Rename the raw_metadata key to metadata_raw
  • Add back the metadata (dict) under the key metadata_dict
  • Add a docstring to the deposit_read method (clarifying terms as asked by morane ;)
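
To make the two keys concrete, here is a minimal sketch of how a metadata entry carrying both forms could be assembled. It is not the code in this diff. The helper names (xml_to_dict, build_metadata_entry), the flat XML-to-dict conversion, and the choice of a list of strings for metadata_raw are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List


def xml_to_dict(raw: str) -> Dict[str, str]:
    """Flatten the direct children of an XML document into a tag -> text dict.

    Simplified stand-in for the real XML-to-dict transformation.
    """
    root = ET.fromstring(raw)
    return {child.tag: (child.text or "") for child in root}


def build_metadata_entry(raw_documents: List[str]) -> Dict[str, Any]:
    """Build an entry exposing both the raw and the transformed metadata."""
    merged: Dict[str, str] = {}
    for raw in raw_documents:
        # Later documents override earlier ones on key collisions.
        merged.update(xml_to_dict(raw))
    return {
        "metadata_raw": list(raw_documents),  # untouched documents, as received
        "metadata_dict": merged,              # transformed (dict) form
    }


raw = ["<entry><title>demo</title><author>jane</author></entry>"]
entry = build_metadata_entry(raw)
```

Keeping both keys side by side lets existing consumers keep reading metadata_dict while the loader moves to metadata_raw.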

Build is green

Patch application report for D4100 (id=14471)

Rebasing onto 4d72d1be52...

Current branch diff-target is up to date.
Changes applied before test
commit 2ecb65d4f534b2ee74d3bb55caa15ee111f8487b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 30 14:06:21 2020 +0200

    Transit raw metadata to the loader to unify with metadata update scenario
    
    The new update scenario now stores new metadata update to the metadata storage.
    The loader does not, it currently stores the transformed xml (into json dict).
    The loader passes by this deposit_read call to actually retrieve the data.
    
    So prior to adapting the loader, the information returned by deposit_read need
    to provide the raw metadata as well.
    
    This also:
    - adds type to impacted methods along the way.
    - simplifies a bit the current deposit_read tests
    
    Related to T2649

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/169/ for more details.