Page MenuHomeSoftware Heritage

Add tag <swh:create_origin>, to replace the Slug header.
ClosedPublic

Authored by vlorentz on Nov 26 2020, 4:20 PM.

Details

Summary

Computing origin urls based on provider url + slug is brittle.

Additionally, it meant basing parenting relationship on reusing
values of the slug across deposit, which is clearly not in the
spirit of the SWORD specification.

Diff Detail

Unit TestsFailed

TimeTest
237 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.deposit.tests.api.test_collection_post_atom::test_post_deposit_atom_with_create_origin_and_reference
authenticated_client = <rest_framework.test.APIClient object at 0x7f4fdd3ead68> deposit_collection = <DepositCollection: {'id': 20, 'name': 'test'}> atom_dataset = {'codemeta-sample': '<?xml version="1.0"?>\n <entry xmlns="http://www.w3.org/2005/Atom"\n xmlns:d...ntry>\n', 'entry-data-empty-body': '<?xml version="1.0"?>\n<entry xmlns="http://www.w3.org/2005/Atom"></entry>\n', ...}
238 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.deposit.tests.api.test_deposit_update::test_put_atom_with_create_origin_and_reference
authenticated_client = <rest_framework.test.APIClient object at 0x7f4fdce623c8> deposit_collection = <DepositCollection: {'id': 119, 'name': 'test'}> atom_dataset = {'codemeta-sample': '<?xml version="1.0"?>\n <entry xmlns="http://www.w3.org/2005/Atom"\n xmlns:d...ntry>\n', 'entry-data-empty-body': '<?xml version="1.0"?>\n<entry xmlns="http://www.w3.org/2005/Atom"></entry>\n', ...}
82 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.deposit.tests.api.test_checks::test_api_checks_check_metadata_ko[metadata_ko0-expected_summary0]
86 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.deposit.tests.api.test_checks::test_api_checks_check_metadata_ko[metadata_ko1-expected_summary1]
96 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.deposit.tests.api.test_checks::test_api_checks_check_metadata_ok[metadata_ok0]
View Full Test Results (2 Failed · 217 Passed)

Event Timeline

Build has FAILED

Patch application report for D4611 (id=16354)

Could not rebase; Attempt merge onto ad6041d312...

Updating ad6041d3..fb419cde
Fast-forward
 docs/endpoints/collection.rst                      |  2 -
 docs/metadata.rst                                  | 30 ++++++-
 docs/specs/metadata_example.xml                    |  3 +
 docs/specs/spec-loading.rst                        |  6 +-
 docs/specs/swh.xsd                                 | 41 +++++++---
 docs/user-manual.rst                               |  8 +-
 swh/deposit/api/collection.py                      | 12 +--
 swh/deposit/api/common.py                          | 79 +++++++++++--------
 swh/deposit/api/state.py                           |  3 +-
 swh/deposit/cli/client.py                          | 16 ++--
 swh/deposit/client.py                              | 19 +++--
 swh/deposit/errors.py                              | 15 ----
 .../0021_deposit_origin_url_20201124_1438.py       | 28 +++++++
 swh/deposit/models.py                              |  9 +--
 .../templates/deposit/{status.xml => state.xml}    |  1 +
 swh/deposit/tests/api/test_collection.py           | 35 +++++---
 swh/deposit/tests/api/test_collection_post_atom.py | 92 +++++++++++++++-------
 .../tests/api/test_collection_post_binary.py       | 18 ++++-
 .../tests/api/test_collection_post_multipart.py    | 18 ++++-
 .../api/test_deposit_private_read_metadata.py      | 10 +--
 ...est_deposit_status.py => test_deposit_state.py} |  2 +
 swh/deposit/tests/api/test_deposit_update.py       | 16 +++-
 swh/deposit/tests/cli/test_client.py               |  8 +-
 swh/deposit/tests/conftest.py                      |  5 +-
 swh/deposit/tests/data/atom/codemeta-sample.xml    |  6 ++
 .../tests/data/atom/entry-data-no-origin-url.xml   | 26 ++++++
 swh/deposit/tests/data/atom/entry-data0.xml        |  9 ++-
 swh/deposit/tests/data/atom/entry-data2.xml        |  9 ++-
 .../tests/data/atom/entry-only-create-origin.xml   | 10 +++
 swh/deposit/tests/data/atom/error-with-decimal.xml | 10 ++-
 swh/deposit/tests/data/atom/metadata.xml           | 10 ++-
 swh/deposit/urls.py                                |  4 +-
 32 files changed, 393 insertions(+), 167 deletions(-)
 create mode 100644 swh/deposit/migrations/0021_deposit_origin_url_20201124_1438.py
 rename swh/deposit/templates/deposit/{status.xml => state.xml} (92%)
 rename swh/deposit/tests/api/{test_deposit_status.py => test_deposit_state.py} (97%)
 create mode 100644 swh/deposit/tests/data/atom/entry-data-no-origin-url.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-only-create-origin.xml
Changes applied before test
commit fb419cde5f9029e916815d37f568956571363918
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:06:33 2020 +0100

    Add tag <swh:create_origin>, to replace the Slug header.
    
    Computing origin urls based on provider url + slug is brittle.
    
    Additionally, it meant basing parenting relationship on reusing
    values of the slug across deposit, which is clearly not in the
    spirit of the SWORD specification.

commit 4da9d03bb2f2a00fb25c1e12ce1e51c9c917e7ec
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:20:06 2020 +0100

    rename status -> state for consistency with SWORD and the view name.

commit 2f5aae7e9c68937e2d4baa3080e916da2b76dde5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:18:26 2020 +0100

    Return the origin url on 'GET State-IRI'.

commit cb09d873dbe8a8037f34b3322bf83209d85ca706
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:32:12 2020 +0100

    Fix mypy error.
    
    The 'type: ignore' comment makes errors on my machine,
    but removing it errors on Jenkins.
    So I'm fixing it by removing the name clash.

commit f863ea93a22b3132749261d479f9d2a8117d52bd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 26 14:45:48 2020 +0100

    Don't generate slugs on the client side
    
    They are not mandatory anymore.

commit fa3a3591e6932e16c13890f45b1bd26296954e74
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 17:49:13 2020 +0100

    Make the Slug header optional.
    
    To follow the SWORD specification.

commit 45b596f198df963edb2d6f5353763704b16fd05d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 15:50:47 2020 +0100

    Defer origin_url computation until the deposit completes.
    
    In a future commit, the origin_url computation will only be a fallback
    in case it's not provided in any of the Atom entries; so we have
    to compute this fallback after all Atom entries are given, ie. when
    the deposit completes.

commit 1a2da5e1805f7d65cdd6c22b6180ad6fb00d623a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 24 15:52:36 2020 +0100

    add column 'origin_url' on Deposit.
    
    For now, it's only computed based on the external_id, but clients
    will provide it in future versions.

commit 2e4a524a2480dbd3baa6f59390528fd694ea7390
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 24 12:20:19 2020 +0100

    swh.xsd: Use the https://www.softwareheritage.org/schema/2018/deposit namespace

commit 88b102aec44185b97b06135d862821bb4c138acc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 24 12:11:31 2020 +0100

    swh.xsd: Improve readability
    
    This also moves the inner element definitions of <swh:reference> to
    root elements, so they can be reused by new definitions in the future.

Link to build: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/449/
See console output for more information: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/449/console

Build is green

Patch application report for D4611 (id=16383)

Could not rebase; Attempt merge onto ad6041d312...

Updating ad6041d3..60f66cdb
Fast-forward
 docs/endpoints/collection.rst                      |  2 -
 docs/metadata.rst                                  | 30 ++++++-
 docs/specs/metadata_example.xml                    |  3 +
 docs/specs/spec-loading.rst                        |  6 +-
 docs/specs/swh.xsd                                 | 41 +++++++---
 docs/user-manual.rst                               |  8 +-
 swh/deposit/api/collection.py                      | 12 +--
 swh/deposit/api/common.py                          | 79 +++++++++++--------
 swh/deposit/api/state.py                           |  3 +-
 swh/deposit/cli/client.py                          | 16 ++--
 swh/deposit/client.py                              | 19 +++--
 swh/deposit/errors.py                              | 15 ----
 .../0021_deposit_origin_url_20201124_1438.py       | 28 +++++++
 swh/deposit/models.py                              |  9 +--
 .../templates/deposit/{status.xml => state.xml}    |  1 +
 swh/deposit/tests/api/test_collection.py           | 35 +++++---
 swh/deposit/tests/api/test_collection_post_atom.py | 92 +++++++++++++++-------
 .../tests/api/test_collection_post_binary.py       | 18 ++++-
 .../tests/api/test_collection_post_multipart.py    | 18 ++++-
 .../api/test_deposit_private_read_metadata.py      | 10 +--
 ...est_deposit_status.py => test_deposit_state.py} |  2 +
 swh/deposit/tests/api/test_deposit_update.py       | 16 +++-
 swh/deposit/tests/cli/test_client.py               |  8 +-
 swh/deposit/tests/conftest.py                      |  5 +-
 swh/deposit/tests/data/atom/codemeta-sample.xml    |  6 ++
 .../tests/data/atom/entry-data-no-origin-url.xml   | 26 ++++++
 swh/deposit/tests/data/atom/entry-data0.xml        |  9 ++-
 swh/deposit/tests/data/atom/entry-data2.xml        |  9 ++-
 .../tests/data/atom/entry-only-create-origin.xml   | 10 +++
 swh/deposit/tests/data/atom/error-with-decimal.xml | 10 ++-
 swh/deposit/tests/data/atom/metadata.xml           | 10 ++-
 tox.ini                                            |  2 +-
 32 files changed, 392 insertions(+), 166 deletions(-)
 create mode 100644 swh/deposit/migrations/0021_deposit_origin_url_20201124_1438.py
 rename swh/deposit/templates/deposit/{status.xml => state.xml} (92%)
 rename swh/deposit/tests/api/{test_deposit_status.py => test_deposit_state.py} (97%)
 create mode 100644 swh/deposit/tests/data/atom/entry-data-no-origin-url.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-only-create-origin.xml
Changes applied before test
commit 60f66cdb9e9f93d3040e794edefdf5d35fc3cf59
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:06:33 2020 +0100

    Add tag <swh:create_origin>, to replace the Slug header.
    
    Computing origin urls based on provider url + slug is brittle.
    
    Additionally, it meant basing parenting relationship on reusing
    values of the slug across deposit, which is clearly not in the
    spirit of the SWORD specification.

commit b7eac04c53d6722826c338c8ed711a8a35d5b497
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:20:06 2020 +0100

    rename status -> state for consistency with SWORD and the view name.

commit 4359a90e3a688aeb11a719e5e4c96cd2aa1ed32e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:18:26 2020 +0100

    Return the origin url on 'GET State-IRI'.

commit f02b7f71fba1aeb6c3fdc1c7eb4e9118e13de2a0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 27 11:11:12 2020 +0100

    fix mypy error in swh/deposit/urls.py
    
    If djangorestframework-stubs does not contain the stub for
    `format_suffix_patterns`, then mypy does not see any issue on passing it
    a `List[object]` as argument, so the `# type: ignore` is useless,
    so mypy complains about it.

commit 5989af38c9e6b3d5e32e20bd125aa149d6668e8a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 26 14:45:48 2020 +0100

    Don't generate slugs on the client side
    
    They are not mandatory anymore.

commit fc5a5ee0b1c67c90c3eaf5a924a6c46eb6fb0f7f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 17:49:13 2020 +0100

    Make the Slug header optional.
    
    To follow the SWORD specification.

commit e7316d6d8a527eb4a96eb2e9d46daa0045c6b851
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 15:50:47 2020 +0100

    Defer origin_url computation until the deposit completes.
    
    In a future commit, the origin_url computation will only be a fallback
    in case it's not provided in any of the Atom entries; so we have
    to compute this fallback after all Atom entries are given, ie. when
    the deposit completes.

commit a21961284e3b88a3c21fc0b67cae793351331e7f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 24 15:52:36 2020 +0100

    add column 'origin_url' on Deposit.
    
    For now, it's only computed based on the external_id, but clients
    will provide it in future versions.

commit 2e4a524a2480dbd3baa6f59390528fd694ea7390
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 24 12:20:19 2020 +0100

    swh.xsd: Use the https://www.softwareheritage.org/schema/2018/deposit namespace

commit 88b102aec44185b97b06135d862821bb4c138acc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 24 12:11:31 2020 +0100

    swh.xsd: Improve readability
    
    This also moves the inner element definitions of <swh:reference> to
    root elements, so they can be reused by new definitions in the future.

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/460/ for more details.

Missing a couple of test cases inline.

Otherwise, lgtm.


I'm starting to wonder:

  • when shall we stop the refactoring?
  • when do do we deploy?
  • what does that entail for deposit clients?
swh/deposit/api/common.py
745

stops using

749

missing test case.

777

missing test case.

swh/deposit/tests/api/test_collection_post_atom.py
99

test name is wrong from the start, we pass a slug header here.
Can you please change it?

128

nor_slug_ is clearer

for the sake of missing test cases.

This revision now requires changes to proceed.Nov 27 2020, 1:36 PM
  • when shall we stop the refactoring?

wdym?

  • when do do we deploy?

Whenever you want, that's not a breaking change

  • what does that entail for deposit clients?

When I'm done with my protocol changes (which are all additions, so no breakage), we will tell them to start using them, wait for them to do it, then remove the deprecated bits.

swh/deposit/api/common.py
745

Either stopped using or stop using is fine imo. It just depends where on the timeline you are (I'm writing this for people reading it in the future :p)

swh/deposit/api/common.py
745

heh, ok then.

wdym?

i mean just that. When shall we stop?

The more the moving parts the more intertwined the issues become to disentangle (if any that is) when deploying...

Feel free to deploy at any point, all of this is incremental and shouldn't break anything.

As for the moving parts, we'll have to ask the clients when we can remove the deprecated behaviors

vlorentz marked 6 inline comments as done.

add test + rename

Build has FAILED

Patch application report for D4611 (id=16404)

Rebasing onto b7eac04c53...

Current branch diff-target is up to date.
Changes applied before test
commit 6fa45ca5fde899caf836f8db7ed99a4624c01971
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:06:33 2020 +0100

    Add tag <swh:create_origin>, to replace the Slug header.
    
    Computing origin urls based on provider url + slug is brittle.
    
    Additionally, it meant basing parenting relationship on reusing
    values of the slug across deposit, which is clearly not in the
    spirit of the SWORD specification.

Link to build: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/463/
See console output for more information: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/463/console

Build is green

Patch application report for D4611 (id=16409)

Rebasing onto b7eac04c53...

Current branch diff-target is up to date.
Changes applied before test
commit 7bcfe1da9774ca986b0ca1cd0fd2c01444d48400
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:06:33 2020 +0100

    Add tag <swh:create_origin>, to replace the Slug header.
    
    Computing origin urls based on provider url + slug is brittle.
    
    Additionally, it meant basing parenting relationship on reusing
    values of the slug across deposit, which is clearly not in the
    spirit of the SWORD specification.

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/464/ for more details.

swh/deposit/tests/api/test_collection_post_atom.py
99

ping ^

vlorentz added inline comments.
swh/deposit/tests/api/test_collection_post_atom.py
99

oops, I applied the change to the wrong commit

Build is green

Patch application report for D4611 (id=16476)

Rebasing onto b7eac04c53...

Current branch diff-target is up to date.
Changes applied before test
commit 441cfe10a4b636ab3ed0311ceeb8e780a90a7e4f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:06:33 2020 +0100

    Add tag <swh:create_origin>, to replace the Slug header.
    
    Computing origin urls based on provider url + slug is brittle.
    
    Additionally, it meant basing parenting relationship on reusing
    values of the slug across deposit, which is clearly not in the
    spirit of the SWORD specification.

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/466/ for more details.

This revision is now accepted and ready to land.Dec 2 2020, 4:24 PM
This revision was automatically updated to reflect the committed changes.
vlorentz marked an inline comment as done.