deposit.loader: Fix revision metadata redundancy in deposit metadata
Closed, Public

Authored by ardumont on Apr 22 2020, 5:24 PM.

Details

Summary

As the title says, this removes the redundancy from the revision's metadata
fields.

Related to T2374

Test Plan

tox

Event Timeline

ardumont created this revision. Apr 22 2020, 5:24 PM

Build is green

Patch application report for D3045 (id=10827)

Rebasing onto 042adcb6e2...

Current branch diff-target is up to date.
Changes applied before test
commit b766c456c196746f3bb2c822723604dcd4e6317d
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Wed Apr 22 17:16:03 2020 +0200

    deposit.loader: Fix revision metadata redundancy in deposit metadata
    
    Related to T2374

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/26/ for more details.

Can you show me the result of a local end-to-end test with a real XML, so we can see the result in the revision metadata and in the origin_metadata?

> Can you show me the result of an end to end test locally with a real XML to
> see the result in the revision metadata and in the origin_metadata ?

Those are already end-to-end tests.

I will update one to display the following metadata field of the revision:

{'extrinsic': {'provider': 'https://deposit.softwareheritage.org/1/private/777/meta/',
               'raw': {'origin': {'type': 'deposit',
                                  'url': 'https://hal-test.archives-ouvertes.fr/some-external-id'},
                       'origin_metadata': {'metadata': {'@xmlns': ['http://www.w3.org/2005/Atom'],
                                                        'author': ['some '
                                                                   'awesome '
                                                                   'author',
                                                                   'another '
                                                                   'one',
                                                                   'no one'],
                                                        'codemeta:dateCreated': '2017-10-07T15:17:08Z',
                                                        'codemeta:datePublished': '2017-10-08T15:00:00Z',
                                                        'external_identifier': 'some-external-id',
                                                        'url': 'https://hal-test.archives-ouvertes.fr/some-external-id'},
                                           'provider': {'metadata': None,
                                                        'provider_name': 'hal',
                                                        'provider_type': 'deposit_client',
                                                        'provider_url': 'https://hal-test.archives-ouvertes.fr/'},
                                           'tool': {'configuration': {'sword_version': '2'},
                                                    'name': 'swh-deposit',
                                                    'version': '0.0.1'}}},
               'when': '2020-04-23T07:54:46.222850+00:00'},
 'original_artifact': [{'checksums': {'sha1': 'f8c63d7c890a7453498e6cf9fef215d85ec6801d',
                                      'sha256': '474bf646aeeff6d945eb752b1a9f8a40f3d81a88909ee7bd2d08cc822aa361e6'},
                        'filename': 'archive.zip',
                        'length': 956830}]}

That's the expected result as per our conclusion on the task.
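As a rough illustration of the layout shown above, here is a minimal sketch of a shape check on the revision metadata dict. The helper `check_revision_metadata` and the `EXPECTED_*` key sets are hypothetical names, not part of swh-deposit or the loader; the expected keys are read off the dump above and are an assumption about what the agreed layout requires.

```python
from typing import Any, Dict, List, Set

# Hypothetical key sets, read off the metadata dump in this thread.
# They are an assumption, not a schema published by swh.
EXPECTED_TOP_LEVEL = {"extrinsic", "original_artifact"}
EXPECTED_EXTRINSIC = {"provider", "raw", "when"}
EXPECTED_RAW = {"origin", "origin_metadata"}
EXPECTED_ARTIFACT = {"checksums", "filename", "length"}


def check_revision_metadata(metadata: Dict[str, Any]) -> List[str]:
    """Return a list of layout problems; an empty list means the layout matches."""
    problems: List[str] = []

    def compare(label: str, actual: Any, expected: Set[str]) -> None:
        # Non-dict values are treated as having no keys at all.
        keys = set(actual) if isinstance(actual, dict) else set()
        if keys != expected:
            problems.append(
                f"{label}: expected keys {sorted(expected)}, got {sorted(keys)}"
            )

    compare("metadata", metadata, EXPECTED_TOP_LEVEL)

    extrinsic = metadata.get("extrinsic")
    if not isinstance(extrinsic, dict):
        extrinsic = {}
    compare("extrinsic", extrinsic, EXPECTED_EXTRINSIC)
    compare("extrinsic.raw", extrinsic.get("raw"), EXPECTED_RAW)

    for index, artifact in enumerate(metadata.get("original_artifact") or []):
        compare(f"original_artifact[{index}]", artifact, EXPECTED_ARTIFACT)

    return problems


# Sample trimmed from the dump above (origin_metadata content left out).
sample = {
    "extrinsic": {
        "provider": "https://deposit.softwareheritage.org/1/private/777/meta/",
        "raw": {
            "origin": {
                "type": "deposit",
                "url": "https://hal-test.archives-ouvertes.fr/some-external-id",
            },
            "origin_metadata": {},
        },
        "when": "2020-04-23T07:54:46.222850+00:00",
    },
    "original_artifact": [
        {
            "checksums": {"sha1": "f8c63d7c890a7453498e6cf9fef215d85ec6801d"},
            "filename": "archive.zip",
            "length": 956830,
        }
    ],
}
```

Calling `check_revision_metadata(sample)` returns an empty list for the layout above. A dict that also carries, say, a top-level `origin_metadata` key would be reported as a problem, which is one way a test could catch duplicated fields creeping back in.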

> Can you show me the result of an end to end test locally with a real XML to see the result in the revision metadata and in the origin_metadata ?

> Those are end to end test already.

That holds for the loader specifically, but it is enough to ensure the impacted
revision metadata field is in the right state. The loader tests use a real swh
pg backend, so when reading data from swh,

Deposit-wise, its tests are enough to ensure it behaves accordingly as
well.

I don't want to spend more time on this. You might want to head over to the
icinga-plugin repository [1] to add some more checks. Those are end-to-end from
deposit to loader, but I don't see them checking the result of the deposit
loading; they just ensure the deposit states are consistent.

[1] https://forge.softwareheritage.org/source/swh-icinga-plugins/browse/master/swh/icinga_plugins/tests/test_deposit.py

ardumont updated this revision to Diff 10833. Apr 23 2020, 11:01 AM

Add more checks on metadata and make those fields apparent.
Ultimately a good idea, thanks @moranegg ;)

Build is green

Patch application report for D3045 (id=10833)

Rebasing onto 042adcb6e2...

Current branch diff-target is up to date.
Changes applied before test
commit 96f3e296e70ce0ea2f42d0efb0f793106c813fe1
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Wed Apr 22 17:16:03 2020 +0200

    deposit.loader: Fix revision metadata redundancy in deposit metadata
    
    Related to T2374

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/28/ for more details.

> see the result in the revision metadata and in the origin_metadata ?

The last test now demonstrates those.

  • the revision metadata field as decided in the related task
  • and the origin_metadata stored in swh for the loaded deposit.

vlorentz accepted this revision. Apr 23 2020, 11:09 AM

This revision is now accepted and ready to land. Apr 23 2020, 11:09 AM