deposit.loader: Fix revision metadata redundancy in deposit metadata
Closed, Public

Authored by ardumont on Apr 22 2020, 5:24 PM.

Details

Summary

As the title says, this removes the redundancy from the revision's metadata
fields.

Related to T2374

Test Plan

tox

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision. Apr 22 2020, 5:24 PM

Build is green

Patch application report for D3045 (id=10827)

Rebasing onto 042adcb6e2...

Current branch diff-target is up to date.
Changes applied before test
commit b766c456c196746f3bb2c822723604dcd4e6317d
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Wed Apr 22 17:16:03 2020 +0200

    deposit.loader: Fix revision metadata redundancy in deposit metadata
    
    Related to T2374

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/26/ for more details.

Can you show me the result of a local end-to-end test with a real XML, so we can see the result in the revision metadata and in the origin_metadata?

> Can you show me the result of an end to end test locally with a real XML to
> see the result in the revision metadata and in the origin_metadata ?

Those are already end-to-end tests.

I will update one of them to display the following metadata field of the revision:

{'extrinsic': {'provider': 'https://deposit.softwareheritage.org/1/private/777/meta/',
               'raw': {'origin': {'type': 'deposit',
                                  'url': 'https://hal-test.archives-ouvertes.fr/some-external-id'},
                       'origin_metadata': {'metadata': {'@xmlns': ['http://www.w3.org/2005/Atom'],
                                                        'author': ['some '
                                                                   'awesome '
                                                                   'author',
                                                                   'another '
                                                                   'one',
                                                                   'no one'],
                                                        'codemeta:dateCreated': '2017-10-07T15:17:08Z',
                                                        'codemeta:datePublished': '2017-10-08T15:00:00Z',
                                                        'external_identifier': 'some-external-id',
                                                        'url': 'https://hal-test.archives-ouvertes.fr/some-external-id'},
                                           'provider': {'metadata': None,
                                                        'provider_name': 'hal',
                                                        'provider_type': 'deposit_client',
                                                        'provider_url': 'https://hal-test.archives-ouvertes.fr/'},
                                           'tool': {'configuration': {'sword_version': '2'},
                                                    'name': 'swh-deposit',
                                                    'version': '0.0.1'}}},
               'when': '2020-04-23T07:54:46.222850+00:00'},
 'original_artifact': [{'checksums': {'sha1': 'f8c63d7c890a7453498e6cf9fef215d85ec6801d',
                                      'sha256': '474bf646aeeff6d945eb752b1a9f8a40f3d81a88909ee7bd2d08cc822aa361e6'},
                        'filename': 'archive.zip',
                        'length': 956830}]}

That's the expected result as per our conclusion on the task.
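
As an illustration only, the shape shown above could be checked along these lines. This is a minimal sketch, not the test added in this diff: `count_key`, `check_revision_metadata` and `SAMPLE` are hypothetical names, and `SAMPLE` is a trimmed copy of the dict printed above.

```python
from typing import Any

# Trimmed copy of the revision metadata displayed above (illustrative only).
SAMPLE = {
    "extrinsic": {
        "provider": "https://deposit.softwareheritage.org/1/private/777/meta/",
        "raw": {
            "origin": {
                "type": "deposit",
                "url": "https://hal-test.archives-ouvertes.fr/some-external-id",
            },
            "origin_metadata": {
                "metadata": {"external_identifier": "some-external-id"},
                "provider": {"metadata": None, "provider_name": "hal"},
                "tool": {"name": "swh-deposit", "version": "0.0.1"},
            },
        },
        "when": "2020-04-23T07:54:46.222850+00:00",
    },
    "original_artifact": [
        {
            "checksums": {"sha1": "f8c63d7c890a7453498e6cf9fef215d85ec6801d"},
            "filename": "archive.zip",
            "length": 956830,
        }
    ],
}


def count_key(obj: Any, key: str) -> int:
    """Count how many times `key` is used as a dict key anywhere in `obj`."""
    if isinstance(obj, dict):
        return sum((k == key) + count_key(v, key) for k, v in obj.items())
    if isinstance(obj, list):
        return sum(count_key(item, key) for item in obj)
    return 0


def check_revision_metadata(metadata: dict) -> None:
    """Assert the metadata has the shape shown above (hypothetical helper)."""
    assert set(metadata) == {"extrinsic", "original_artifact"}
    extrinsic = metadata["extrinsic"]
    assert set(extrinsic) == {"provider", "raw", "when"}
    assert set(extrinsic["raw"]) == {"origin", "origin_metadata"}
    # The deposit's origin_metadata must be stored exactly once.
    assert count_key(metadata, "origin_metadata") == 1
    for artifact in metadata["original_artifact"]:
        assert set(artifact) == {"checksums", "filename", "length"}


check_revision_metadata(SAMPLE)
print("revision metadata has the expected shape")
```

The key-counting check is deliberately generic: it does not assume where a duplicate would appear, only that `origin_metadata` should occur once in the whole structure.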

> Can you show me the result of an end to end test locally with a real XML to see the result in the revision metadata and in the origin_metadata ?

> Those are end to end test already.

For the loader specifically. But that's enough to ensure the impacted revision
metadata field is in the right state. The loader tests use a real swh
pg backend, so when reading data from swh,

Deposit-wise, its tests are enough to ensure it behaves accordingly as
well.

I don't want to spend more time on this. You might want to head over to the
swh-icinga-plugins repository [1] to add some more checks. Those are end-to-end, from
deposit to loader. But I don't see them checking the result of the deposit
loading; they just ensure the deposit states are consistent.

[1] https://forge.softwareheritage.org/source/swh-icinga-plugins/browse/master/swh/icinga_plugins/tests/test_deposit.py

ardumont updated this revision to Diff 10833. Apr 23 2020, 11:01 AM

Add more checks on metadata and make those fields apparent.
Ultimately a good idea, thanks @moranegg ;)

Build is green

Patch application report for D3045 (id=10833)

Rebasing onto 042adcb6e2...

Current branch diff-target is up to date.
Changes applied before test
commit 96f3e296e70ce0ea2f42d0efb0f793106c813fe1
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Wed Apr 22 17:16:03 2020 +0200

    deposit.loader: Fix revision metadata redundancy in deposit metadata
    
    Related to T2374

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/28/ for more details.

> see the result in the revision metadata and in the origin_metadata ?

The last test now demonstrates those.

  • the revision metadata field as decided in the related task
  • and the origin_metadata stored in swh for the loaded deposit.

vlorentz accepted this revision. Apr 23 2020, 11:09 AM
This revision is now accepted and ready to land. Apr 23 2020, 11:09 AM