Page MenuHomeSoftware Heritage

swh.deposit.parsers: Ensure SWHIDs with context are strictly valid
ClosedPublic

Authored by ardumont on Nov 17 2020, 11:55 AM.

Details

Summary

If not, those errors are caught early enough so the deposit users are notified
and can fix those wrong inputs.

This further checks the SWHIDs with qualifiers entries. Typically, ensuring the
"anchor" and "visit" SWHIDs if provided are valid according to the documentation
[1]

  • "visit" is a snapshot core swhid (if any)
  • "anchor" is a core swhid of leaf node (either directory, a revision, a release or a snapshot)

I did not push this in swh.model.identifiers.parse_swhid but it should ideally go there. As
people already mentioned it was slow already, i thought of keeping it here first.

[1] https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#qualifiers

Related to T2537

Test Plan

tox

Diff Detail

Repository
rDDEP Push deposit
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D4490 (id=15928)

Could not rebase; Attempt merge onto a36ea4db88...

Updating a36ea4db..f1dfe96d
Fast-forward
 swh/deposit/api/common.py                          | 169 ++++++++++++-
 swh/deposit/api/deposit_update.py                  |  95 +-------
 swh/deposit/config.py                              |   5 +
 swh/deposit/parsers.py                             |  36 ++-
 swh/deposit/tests/api/test_deposit_metadata.py     | 268 +++++++++++++++++++++
 swh/deposit/tests/api/test_parsers.py              |  40 +--
 swh/deposit/tests/conftest.py                      |   6 +-
 .../tests/data/atom/entry-data-with-origin.xml     |  13 +
 .../tests/data/atom/entry-data-with-swhid.xml      |  13 +
 9 files changed, 532 insertions(+), 113 deletions(-)
 create mode 100644 swh/deposit/tests/api/test_deposit_metadata.py
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-origin.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-swhid.xml
Changes applied before test
commit f1dfe96d5fce0368fecdd397e06adc2f2c1d88ed
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 17 11:49:55 2020 +0100

    swh.deposit.parsers: Ensure SWHIDs with context are valid
    
    If not, those errors are caught early enough so the deposit users are notified
    and can fix those.
    
    This further checks the SWHIDs with qualifiers entries. Typically, ensuring the
    "anchor" and "visit" SWHIDs if provided are valid according to the documentation
    [1]
    
    - "visit" is a snapshot core swhid (if any)
    - "anchor" is a core swhid of leaf node (either directory, a revision, a release or a snapshot)
    
    [1] https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#qualifiers
    
    Related to T2537

commit 32104ea235d5c736d7f26712bdf48429a45c0706
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 17 09:55:10 2020 +0100

    metadata-only: Finalize deposit with the swhid (if any) and dates
    
    Related to T2537

commit c2795d60037e981aee2868851755c019474a992d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Nov 13 19:15:58 2020 +0100

    metadata-only: Add tests on deposit with swhid and origin reference
    
    Related to T2537

commit 477a0e0edc1ff01a6bdbd72ea27955a6096ff16a
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Nov 13 18:27:23 2020 +0100

    metadata-only: Reuse existing metadata deposit code
    
    Related to T2537

commit 77fca0e1d8b452d22e26771cf243e7fa99bab8f2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Nov 13 14:32:05 2020 +0100

    metadata-only: Start deposit
    
    It does nothing right now but add the detection of the metadata-only deposit.
    And fails in case of invalid input.
    
    Related to T2537

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/335/ for more details.

ardumont added a project: SWORD deposit.
ardumont retitled this revision from swh.deposit.parsers: Ensure SWHIDs with context are valid to swh.deposit.parsers: Ensure SWHIDs with context are strictly valid.
ardumont edited the summary of this revision. (Show Details)

IMO the SWHID class should store the parsed core SWHIDs for its qualifiers, so they don't need to be parsed twice (+ error handling)

IMO the SWHID class should store the parsed core SWHIDs for its qualifiers, so they don't need to be parsed twice (+ error handling)

shouldn't this be a dedicated task?

This revision is now accepted and ready to land.Nov 17 2020, 12:47 PM

IMO the SWHID class should store the parsed core SWHIDs for its qualifiers, so they don't need to be parsed twice (+ error handling)

shouldn't this be a dedicated task?

true

As a result, I opened T2788. I tried to synthetize the different suggestions
from here and the other deposit diff. Feel free to improve the description ;)

Cheers,

Build is green

Patch application report for D4490 (id=15929)

Rebasing onto a36ea4db88...

Current branch diff-target is up to date.
Changes applied before test
commit c1a45162d3313e2cbb19ddc31977deac029d3411
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 17 11:49:55 2020 +0100

    swh.deposit.parsers: Ensure SWHIDs with context are valid
    
    If not, those errors are caught early enough so the deposit users are notified
    and can fix those.
    
    This further checks the SWHIDs with qualifiers entries. Typically, ensuring the
    "anchor" and "visit" SWHIDs if provided are valid according to the documentation
    [1]
    
    - "visit" is a snapshot core swhid (if any)
    - "anchor" is a core swhid of leaf node (either directory, a revision, a release or a snapshot)
    
    [1] https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#qualifiers
    
    Related to T2537

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/336/ for more details.