metadata-only: Restrict swhid use of qualifier anchor and visit on snapshot
Closed, Public

Authored by ardumont on Tue, Nov 17, 3:30 PM.

Details

Summary

I don't know what a SWHID whose anchor and visit qualifiers both target a
snapshot would mean, but technically it's a valid one (as far as I understand
the grammar [1]).

However, we currently have a technical restriction when writing extrinsic
metadata: only one snapshot reference is possible there.

Let's discuss what to do then ;)

In this diff, I propose to log and return a bad request in that case.

That's the gist of the discussion: is that reasonable? If not, what should we do instead?

[1] https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#syntax

Related to T2537 D4475
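
As a minimal sketch of the restriction proposed above (not the code from this diff): the helper names are hypothetical, and the parsing is plain string splitting on the documented `;key=value` qualifier syntax rather than the swh.model parser. The error text is only illustrative.

```python
def parse_qualifiers(swhid):
    """Split a SWHID string into its core identifier and a qualifier dict.

    Hypothetical helper: relies only on the ";key=value" qualifier syntax.
    """
    core, *raw_qualifiers = swhid.split(";")
    qualifiers = {}
    for raw in raw_qualifiers:
        key, _, value = raw.partition("=")
        qualifiers[key] = value
    return core, qualifiers


def check_snapshot_qualifiers(swhid):
    """Return an error message when 'anchor' targets a snapshot and 'visit'
    is also given (two snapshot references), otherwise None."""
    _, qualifiers = parse_qualifiers(swhid)
    anchor = qualifiers.get("anchor", "")
    if anchor.startswith("swh:1:snp:") and "visit" in qualifiers:
        return "'anchor=swh:1:snp:' is not supported when 'visit' is also provided."
    return None
```

A caller would turn a non-None result into the proposed 400 bad request; an anchor on a revision combined with a visit still passes, since only one snapshot is then referenced.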

Test Plan

tox

Diff Detail

Repository
rDDEP Push deposit
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision. Tue, Nov 17, 3:30 PM

Build has FAILED

Patch application report for D4493 (id=15938)

Could not rebase; Attempt merge onto c1a45162d3...

Updating c1a45162..e943e216
Fast-forward
 swh/deposit/api/common.py                          | 145 +++++++++++-
 swh/deposit/api/deposit_update.py                  |  95 +-------
 swh/deposit/config.py                              |   5 +
 swh/deposit/parsers.py                             |  12 +
 swh/deposit/tests/api/test_deposit_metadata.py     | 256 +++++++++++++++++++++
 swh/deposit/tests/api/test_parsers.py              |  19 +-
 swh/deposit/tests/conftest.py                      |   6 +-
 .../tests/data/atom/entry-data-with-origin.xml     |  13 ++
 .../tests/data/atom/entry-data-with-swhid.xml      |  13 ++
 swh/deposit/tests/test_utils.py                    |  61 ++++-
 swh/deposit/utils.py                               |  40 +++-
 11 files changed, 555 insertions(+), 110 deletions(-)
 create mode 100644 swh/deposit/tests/api/test_deposit_metadata.py
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-origin.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-swhid.xml
Changes applied before test
commit e943e2163ee7315a48fe2b251f5b125a6a5dabda
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 17 15:28:27 2020 +0100

    metadata-only: Restrict swhid use of qualifier anchor and visit on snapshot
    
    Related to T2537

commit 85dc1f7f3d910ffff66ef00dd875794997c4b7b3
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Nov 13 18:27:23 2020 +0100

    Adapt existing POST to a collection to allow metadata-only deposit
    
    The following endpoint is adapted to allow it:
    
    POST /1/<collection>
    HEADER: Content-type: application/atom+xml;type=entry
    And the xml provided by the client contains either:
    
    <swh:deposit>
      <swh:reference>
        <swh:object swhid="{swhid}" />
      </swh:reference>
    </swh:deposit>
    or:
    
    <swh:deposit>
      <swh:reference>
        <swh:origin url="{url}" />
      </swh:reference>
    </swh:deposit>
    
    Any invalid swhid raises a 400 bad request. If everything passes, a 201
    response with usual deposit receipt is received by the deposit user. The
    deposit row is updated with status "done", complete_date and reception_date to
    the same and current date, the swhid columns are updated as well if any.
    
    Note: This technically is reusing the existing code from the metadata update
    scenario. The current metadata update still happens the same way as before.
    
    Related to T2537

commit ce7ab89f267d84dabe032d2dcbe26f21b7e165f0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Nov 13 14:32:05 2020 +0100

    metadata-only: Start deposit
    
    It does nothing right now but add the detection of the metadata-only deposit.
    And fails in case of invalid input.
    
    Related to T2537

Link to build: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/340/
See console output for more information: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/340/console
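
The two request bodies described in the "Adapt existing POST" commit message above can be told apart with a few lines of parsing. This is a sketch under assumptions, not the parser from swh/deposit/parsers.py: `extract_reference` and `ENTRY_TEMPLATE` are hypothetical names, and the namespace URIs are assumed, since the commit message only shows the `swh:` prefix.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the commit message only shows the "swh:" prefix.
NAMESPACES = {
    "atom": "http://www.w3.org/2005/Atom",
    "swh": "https://www.softwareheritage.org/schema/2018/deposit",
}

# Hypothetical Atom entry wrapper around the <swh:deposit> fragment.
ENTRY_TEMPLATE = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">
  <swh:deposit>
    <swh:reference>
      {reference}
    </swh:reference>
  </swh:deposit>
</entry>"""


def extract_reference(xml_text):
    """Return ("swhid", value), ("origin", url), or None if no reference."""
    root = ET.fromstring(xml_text)
    reference = root.find("swh:deposit/swh:reference", NAMESPACES)
    if reference is None:
        return None
    obj = reference.find("swh:object", NAMESPACES)
    if obj is not None and obj.get("swhid"):
        return ("swhid", obj.get("swhid"))
    origin = reference.find("swh:origin", NAMESPACES)
    if origin is not None and origin.get("url"):
        return ("origin", origin.get("url"))
    return None
```

For example, formatting `ENTRY_TEMPLATE` with `<swh:origin url="https://example.org/repo" />` and passing the result to `extract_reference` yields the origin URL, while an entry without a `<swh:deposit>` element yields `None`.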

ardumont edited the summary of this revision. Tue, Nov 17, 3:33 PM
ardumont updated this revision to Diff 15939. Tue, Nov 17, 3:35 PM
ardumont edited the summary of this revision.

Fix test

Build is green

Patch application report for D4493 (id=15939)

Could not rebase; Attempt merge onto c1a45162d3...

Updating c1a45162..be2056b9
Fast-forward
 swh/deposit/api/common.py                          | 145 +++++++++++-
 swh/deposit/api/deposit_update.py                  |  95 +-------
 swh/deposit/config.py                              |   5 +
 swh/deposit/parsers.py                             |  22 ++
 swh/deposit/tests/api/test_deposit_metadata.py     | 256 +++++++++++++++++++++
 swh/deposit/tests/api/test_parsers.py              |  19 +-
 swh/deposit/tests/conftest.py                      |   6 +-
 .../tests/data/atom/entry-data-with-origin.xml     |  13 ++
 .../tests/data/atom/entry-data-with-swhid.xml      |  13 ++
 swh/deposit/tests/test_utils.py                    |  61 ++++-
 swh/deposit/utils.py                               |  40 +++-
 11 files changed, 565 insertions(+), 110 deletions(-)
 create mode 100644 swh/deposit/tests/api/test_deposit_metadata.py
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-origin.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-swhid.xml
Changes applied before test
commit be2056b943947cdec2395bbb80663a975a4e0793
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 17 15:28:27 2020 +0100

    Refuse SWHID use with qualifier anchor and visit targeting both a snapshot
    
    Related to T2537

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/341/ for more details.

ardumont edited the summary of this revision. Tue, Nov 17, 5:00 PM
ardumont edited the summary of this revision. Tue, Nov 17, 5:15 PM
vlorentz requested changes to this revision. Tue, Nov 17, 5:37 PM
vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/deposit/parsers.py
204–207

'anchor=swh:1:snp:' is not supported when 'visit' is also provided.

This revision now requires changes to proceed. Tue, Nov 17, 5:37 PM
ardumont marked an inline comment as done. Tue, Nov 17, 5:55 PM
ardumont added inline comments.
swh/deposit/parsers.py
204–207

Thanks, that's clearer.

ardumont updated this revision to Diff 15945. Tue, Nov 17, 6:12 PM
ardumont marked an inline comment as done.

Adapt according to review

vlorentz accepted this revision. Tue, Nov 17, 6:14 PM
This revision is now accepted and ready to land. Tue, Nov 17, 6:14 PM

Build is green

Patch application report for D4493 (id=15945)

Could not rebase; Attempt merge onto c1a45162d3...

Updating c1a45162..071650b7
Fast-forward
 swh/deposit/api/common.py                          | 145 +++++++++++-
 swh/deposit/api/deposit_update.py                  |  95 +-------
 swh/deposit/config.py                              |   5 +
 swh/deposit/parsers.py                             |  20 ++
 swh/deposit/tests/api/test_deposit_metadata.py     | 256 +++++++++++++++++++++
 swh/deposit/tests/api/test_parsers.py              |  21 +-
 swh/deposit/tests/conftest.py                      |   6 +-
 .../tests/data/atom/entry-data-with-origin.xml     |  13 ++
 .../tests/data/atom/entry-data-with-swhid.xml      |  13 ++
 swh/deposit/tests/test_utils.py                    |  61 ++++-
 swh/deposit/utils.py                               |  40 +++-
 11 files changed, 564 insertions(+), 111 deletions(-)
 create mode 100644 swh/deposit/tests/api/test_deposit_metadata.py
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-origin.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-swhid.xml
Changes applied before test
commit 071650b783f764d40fed12d97633a3e6e816c7ce
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 17 15:28:27 2020 +0100

    Refuse SWHID use with qualifier anchor and visit targeting both a snapshot
    
    Related to T2537

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/345/ for more details.

Build is green

Patch application report for D4493 (id=15948)

Could not rebase; Attempt merge onto c1a45162d3...

Updating c1a45162..546f96f4
Fast-forward
 swh/deposit/api/common.py                          | 147 +++++++++++-
 swh/deposit/api/deposit_update.py                  |  92 +-------
 swh/deposit/config.py                              |   5 +
 swh/deposit/parsers.py                             |  20 ++
 swh/deposit/tests/api/test_deposit_metadata.py     | 256 +++++++++++++++++++++
 swh/deposit/tests/api/test_parsers.py              |  21 +-
 swh/deposit/tests/conftest.py                      |   6 +-
 .../tests/data/atom/entry-data-with-origin.xml     |  13 ++
 .../tests/data/atom/entry-data-with-swhid.xml      |  13 ++
 swh/deposit/tests/test_utils.py                    |  61 ++++-
 swh/deposit/utils.py                               |  40 +++-
 11 files changed, 570 insertions(+), 104 deletions(-)
 create mode 100644 swh/deposit/tests/api/test_deposit_metadata.py
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-origin.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-swhid.xml
Changes applied before test
commit 546f96f4f08200ffa48a85839b7065e2ef1f4a0c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 17 15:28:27 2020 +0100

    Refuse SWHID use with qualifier anchor and visit targeting both a snapshot
    
    Related to T2537

commit 406790effb7080befe2eef6b0c7734a751e16870
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Nov 13 14:32:05 2020 +0100

    Adapt existing POST to a collection to allow metadata-only deposit
    
    The following endpoint is adapted to allow it:
    
    POST /1/<collection>
    HEADER: Content-type: application/atom+xml;type=entry
    And the xml provided by the client contains either:
    
    <swh:deposit>
      <swh:reference>
        <swh:object swhid="{swhid}" />
      </swh:reference>
    </swh:deposit>
    or:
    
    <swh:deposit>
      <swh:reference>
        <swh:origin url="{url}" />
      </swh:reference>
    </swh:deposit>
    
    Any invalid swhid raises a 400 bad request. If everything passes, a 201
    response with usual deposit receipt is received by the deposit user. The
    deposit row is updated with status "done", complete_date and reception_date to
    the same and current date, the swhid columns are updated as well if any.
    
    Note: Technically, this is reusing the existing code from the metadata update
    scenario. The current metadata update still happens the same way as before.
    
    Related to T2537

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/347/ for more details.
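
From the client side, the endpoint described in the commit message above (POST /1/<collection> with the Atom entry content type) could be exercised roughly as follows. This is a sketch with assumptions: `build_metadata_only_request` is a hypothetical helper, the base URL and collection name are placeholders, and authentication is left out entirely. The request is only built here, not sent.

```python
import urllib.request


def build_metadata_only_request(base_url, collection, entry_xml):
    """Build (without sending) a POST request for a metadata-only deposit.

    base_url and collection are placeholders chosen by the caller;
    authentication headers are deliberately omitted from this sketch.
    """
    url = f"{base_url.rstrip('/')}/1/{collection}"
    return urllib.request.Request(
        url,
        data=entry_xml.encode("utf-8"),
        headers={"Content-type": "application/atom+xml;type=entry"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; per the commit message, a valid reference should then produce a 201 with the usual deposit receipt, and an invalid SWHID a 400.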

Build is green

Patch application report for D4493 (id=15954)

Could not rebase; Attempt merge onto c1a45162d3...

Updating c1a45162..5d1932ce
Fast-forward
 swh/deposit/api/common.py                          | 142 ++++++++++-
 swh/deposit/api/deposit_update.py                  |  83 +-----
 swh/deposit/config.py                              |   5 +
 swh/deposit/errors.py                              |  14 ++
 swh/deposit/parsers.py                             |  20 ++
 swh/deposit/tests/api/test_deposit_metadata.py     | 277 +++++++++++++++++++++
 swh/deposit/tests/api/test_parsers.py              |  21 +-
 swh/deposit/tests/conftest.py                      |   6 +-
 .../tests/data/atom/entry-data-with-origin.xml     |  13 +
 ...-with-swhid-fail-metadata-functional-checks.xml |  12 +
 .../tests/data/atom/entry-data-with-swhid.xml      |  13 +
 swh/deposit/tests/test_utils.py                    |  61 ++++-
 swh/deposit/utils.py                               |  40 ++-
 13 files changed, 609 insertions(+), 98 deletions(-)
 create mode 100644 swh/deposit/tests/api/test_deposit_metadata.py
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-origin.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-swhid-fail-metadata-functional-checks.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-swhid.xml
Changes applied before test
commit 5d1932ce291cdecf78f08642ce6783845c3f616d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 17 15:28:27 2020 +0100

    Refuse SWHID use with qualifier anchor and visit targeting both a snapshot
    
    Related to T2537

commit b86840cd1e2278ebfbcfdc6d57f3384f28fea96c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Nov 13 14:32:05 2020 +0100

    Adapt existing POST to a collection to allow metadata-only deposit
    
    The following endpoint is adapted to allow it:
    
    POST /1/<collection>
    HEADER: Content-type: application/atom+xml;type=entry
    And the xml provided by the client contains either:
    
    <swh:deposit>
      <swh:reference>
        <swh:object swhid="{swhid}" />
      </swh:reference>
    </swh:deposit>
    or:
    
    <swh:deposit>
      <swh:reference>
        <swh:origin url="{url}" />
      </swh:reference>
    </swh:deposit>
    
    Any invalid swhid raises a 400 bad request. If everything passes, a 201
    response with usual deposit receipt is received by the deposit user. The
    deposit row is updated with status "done", complete_date and reception_date to
    the same and current date, the swhid columns are updated as well if any.
    
    Note: Technically, this is reusing the existing code from the metadata update
    scenario. The current metadata update still happens the same way as before.
    
    Related to T2537

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/353/ for more details.
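
The 400/201 split stated in the commit message above can be illustrated with a syntactic check on the core identifier. Only the two status codes come from the commit message; `status_for_swhid` is a hypothetical name, the regular expression covers just the core `swh:1:<type>:<40 hex digits>` form, and qualifiers are ignored, so this is far less thorough than real validation.

```python
import re

# Core SWHID form: scheme, version 1, object type, 40 lowercase hex digits.
SWHID_CORE = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}$")


def status_for_swhid(swhid):
    """Return 201 when the core identifier is well-formed, else 400.

    Qualifiers (everything after the first ';') are not validated here.
    """
    core = swhid.split(";", 1)[0]
    return 201 if SWHID_CORE.match(core) else 400
```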