Page MenuHomeSoftware Heritage

Add tag <swh:add_to_origin>, to replace the Slug header for parent relationships.
ClosedPublic

Authored by vlorentz on Dec 2 2020, 12:35 PM.

Details

Summary

Computing origin urls based on provider url + slug is brittle.

Additionally, the Slug is not designed to define relationships between deposits.

Diff Detail

Event Timeline

Build is green

Patch application report for D4645 (id=16481)

Could not rebase; Attempt merge onto b7eac04c53...

Updating b7eac04c..717ee249
Fast-forward
 docs/metadata.rst                                  |  30 +++-
 docs/specs/metadata_example.xml                    |   3 +
 docs/specs/spec-loading.rst                        |   3 +-
 docs/specs/swh.xsd                                 |  13 ++
 docs/user-manual.rst                               |   8 +-
 swh/deposit/api/collection.py                      |   1 +
 swh/deposit/api/common.py                          |  80 +++++++--
 swh/deposit/tests/api/test_collection.py           | 131 ++++++++++++--
 swh/deposit/tests/api/test_collection_post_atom.py | 190 +++++++++++++++++----
 .../tests/api/test_collection_post_metadata.py     |   2 +-
 .../api/test_deposit_private_read_metadata.py      |   5 +-
 swh/deposit/tests/api/test_deposit_update.py       |  98 ++++++++++-
 swh/deposit/tests/conftest.py                      |   5 +-
 swh/deposit/tests/data/atom/codemeta-sample.xml    |   6 +
 .../tests/data/atom/entry-data-no-origin-url.xml   |  26 +++
 .../data/atom/entry-data-with-add-to-origin.xml    |  13 ++
 ...in.xml => entry-data-with-origin-reference.xml} |   0
 swh/deposit/tests/data/atom/entry-data0.xml        |   9 +-
 swh/deposit/tests/data/atom/entry-data2.xml        |   9 +-
 .../tests/data/atom/entry-only-create-origin.xml   |  10 ++
 swh/deposit/tests/data/atom/error-with-decimal.xml |  10 +-
 ...-with-external-identifier-and-create-origin.xml |  14 ++
 .../data/atom/error-with-external-identifier.xml   |   3 +-
 .../error-with-reference-and-create-origin.xml     |  16 ++
 swh/deposit/tests/data/atom/metadata.xml           |  10 +-
 25 files changed, 625 insertions(+), 70 deletions(-)
 create mode 100644 swh/deposit/tests/data/atom/entry-data-no-origin-url.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-add-to-origin.xml
 rename swh/deposit/tests/data/atom/{entry-data-with-origin.xml => entry-data-with-origin-reference.xml} (100%)
 create mode 100644 swh/deposit/tests/data/atom/entry-only-create-origin.xml
 create mode 100644 swh/deposit/tests/data/atom/error-with-external-identifier-and-create-origin.xml
 create mode 100644 swh/deposit/tests/data/atom/error-with-reference-and-create-origin.xml
Changes applied before test
commit 717ee249859749d3c1be0f03978db719d0c38472
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 26 13:10:15 2020 +0100

    Add tag <swh:add_to_origin>, to replace the Slug header for parent relationships.
    
    Computing origin urls based on provider url + slug is brittle.
    
    Additionally, the Slug is not designed to define relationships between deposits.

commit b75ebd1a74ae6b07bfa299c9aab74ef5d48258df
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:14:26 2020 +0100

    Check origin urls start with the user's provider_url
    
    To prevent any user from creating any origin url.

commit 441cfe10a4b636ab3ed0311ceeb8e780a90a7e4f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:06:33 2020 +0100

    Add tag <swh:create_origin>, to replace the Slug header.
    
    Computing origin urls based on provider url + slug is brittle.
    
    Additionally, it meant basing parenting relationship on reusing
    values of the slug across deposit, which is clearly not in the
    spirit of the SWORD specification.

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/468/ for more details.

Couple of remarks, questions inline

swh/deposit/api/common.py
756

missing test scenario

791

same, missing scnenario

swh/deposit/tests/data/atom/entry-data-with-add-to-origin.xml
10

I'm a bit surprised here, do we rename from swh:reference to swh:add_to_origin?

that file with swh:reference is used for the metadata only deposit...

swh/deposit/tests/data/atom/entry-data-with-add-to-origin.xml
10

I blame Phabricator. I copied the file, not move it. See https://forge.softwareheritage.org/D4645#116053

ardumont requested changes to this revision.Dec 2 2020, 7:14 PM

Missing test cases but otherwise, lgtm.

Request changes for the sake of those missing scenarios.

swh/deposit/api/collection.py
113–114

So as far as i could tell, for now, we keep the retro-compatibilty, right?

swh/deposit/tests/data/atom/entry-data-with-add-to-origin.xml
10

wow, yeah, that's confusing.

Thanks for lifting the doubt, i did not think of checking the diff stats.

(And also, I asked because i thought i missed something earlier...)

ok then.

This revision now requires changes to proceed.Dec 2 2020, 7:14 PM
swh/deposit/api/collection.py
113–114

yes

swh/deposit/tests/data/atom/entry-data-with-add-to-origin.xml
10

it confused me as well, as I forgot about that change.

Build is green

Patch application report for D4645 (id=16591)

Could not rebase; Attempt merge onto b7eac04c53...

Updating b7eac04c..b66cfff9
Fast-forward
 docs/metadata.rst                                  |  30 ++-
 docs/specs/metadata_example.xml                    |   3 +
 docs/specs/spec-loading.rst                        |   3 +-
 docs/specs/swh.xsd                                 |  13 ++
 docs/user-manual.rst                               |   8 +-
 swh/deposit/api/collection.py                      |   1 +
 swh/deposit/api/common.py                          |  80 ++++++-
 swh/deposit/tests/api/test_collection.py           | 131 +++++++++++-
 swh/deposit/tests/api/test_collection_post_atom.py | 234 ++++++++++++++++++---
 .../tests/api/test_collection_post_metadata.py     |   2 +-
 .../api/test_deposit_private_read_metadata.py      |   5 +-
 swh/deposit/tests/api/test_deposit_update.py       |  98 ++++++++-
 swh/deposit/tests/conftest.py                      |   5 +-
 swh/deposit/tests/data/atom/codemeta-sample.xml    |   6 +
 .../tests/data/atom/entry-data-no-origin-url.xml   |  26 +++
 .../data/atom/entry-data-with-add-to-origin.xml    |  13 ++
 ...ata-with-both-add-to-origin-and-external-id.xml |  14 ++
 ...a-with-both-create-origin-and-add-to-origin.xml |  16 ++
 ...in.xml => entry-data-with-origin-reference.xml} |   0
 swh/deposit/tests/data/atom/entry-data0.xml        |   9 +-
 swh/deposit/tests/data/atom/entry-data2.xml        |   9 +-
 .../tests/data/atom/entry-only-create-origin.xml   |  10 +
 swh/deposit/tests/data/atom/error-with-decimal.xml |  10 +-
 ...-with-external-identifier-and-create-origin.xml |  14 ++
 .../data/atom/error-with-external-identifier.xml   |   3 +-
 .../error-with-reference-and-create-origin.xml     |  16 ++
 swh/deposit/tests/data/atom/metadata.xml           |  10 +-
 27 files changed, 699 insertions(+), 70 deletions(-)
 create mode 100644 swh/deposit/tests/data/atom/entry-data-no-origin-url.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-add-to-origin.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-both-add-to-origin-and-external-id.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-both-create-origin-and-add-to-origin.xml
 rename swh/deposit/tests/data/atom/{entry-data-with-origin.xml => entry-data-with-origin-reference.xml} (100%)
 create mode 100644 swh/deposit/tests/data/atom/entry-only-create-origin.xml
 create mode 100644 swh/deposit/tests/data/atom/error-with-external-identifier-and-create-origin.xml
 create mode 100644 swh/deposit/tests/data/atom/error-with-reference-and-create-origin.xml
Changes applied before test
commit b66cfff9e8d59156155140b8a61bc72558a9558b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 26 13:10:15 2020 +0100

    Add tag <swh:add_to_origin>, to replace the Slug header for parent relationships.
    
    Computing origin urls based on provider url + slug is brittle.
    
    Additionally, the Slug is not designed to define relationships between deposits.

commit b75ebd1a74ae6b07bfa299c9aab74ef5d48258df
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:14:26 2020 +0100

    Check origin urls start with the user's provider_url
    
    To prevent any user from creating any origin url.

commit 441cfe10a4b636ab3ed0311ceeb8e780a90a7e4f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:06:33 2020 +0100

    Add tag <swh:create_origin>, to replace the Slug header.
    
    Computing origin urls based on provider url + slug is brittle.
    
    Additionally, it meant basing parenting relationship on reusing
    values of the slug across deposit, which is clearly not in the
    spirit of the SWORD specification.

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/471/ for more details.

This revision is now accepted and ready to land.Dec 7 2020, 6:01 PM