Page MenuHomeSoftware Heritage

Check origin urls start with the user's provider_url
ClosedPublic

Authored by vlorentz on Nov 26 2020, 4:20 PM.

Details

Summary

To prevent any user from creating any origin url.

Diff Detail

Event Timeline

Build has FAILED

Patch application report for D4612 (id=16355)

Could not rebase; Attempt merge onto ad6041d312...

Updating ad6041d3..fc6f76da
Fast-forward
 docs/endpoints/collection.rst                      |   2 -
 docs/metadata.rst                                  |  30 +++++-
 docs/specs/metadata_example.xml                    |   3 +
 docs/specs/spec-loading.rst                        |   6 +-
 docs/specs/swh.xsd                                 |  41 +++++---
 docs/user-manual.rst                               |   8 +-
 swh/deposit/api/collection.py                      |  12 +--
 swh/deposit/api/common.py                          |  87 ++++++++++------
 swh/deposit/api/state.py                           |   3 +-
 swh/deposit/cli/client.py                          |  16 ++-
 swh/deposit/client.py                              |  19 ++--
 swh/deposit/errors.py                              |  15 ---
 .../0021_deposit_origin_url_20201124_1438.py       |  28 +++++
 swh/deposit/models.py                              |   9 +-
 .../templates/deposit/{status.xml => state.xml}    |   1 +
 swh/deposit/tests/api/test_collection.py           |  35 +++++--
 swh/deposit/tests/api/test_collection_post_atom.py | 116 +++++++++++++++------
 .../tests/api/test_collection_post_binary.py       |  18 +++-
 .../tests/api/test_collection_post_multipart.py    |  18 +++-
 .../api/test_deposit_private_read_metadata.py      |  10 +-
 ...est_deposit_status.py => test_deposit_state.py} |   2 +
 swh/deposit/tests/api/test_deposit_update.py       |  16 ++-
 swh/deposit/tests/cli/test_client.py               |   8 +-
 swh/deposit/tests/conftest.py                      |   5 +-
 swh/deposit/tests/data/atom/codemeta-sample.xml    |   6 ++
 .../tests/data/atom/entry-data-no-origin-url.xml   |  26 +++++
 swh/deposit/tests/data/atom/entry-data0.xml        |   9 +-
 swh/deposit/tests/data/atom/entry-data2.xml        |   9 +-
 .../tests/data/atom/entry-only-create-origin.xml   |  10 ++
 swh/deposit/tests/data/atom/error-with-decimal.xml |  10 +-
 swh/deposit/tests/data/atom/metadata.xml           |  10 +-
 swh/deposit/urls.py                                |   4 +-
 32 files changed, 424 insertions(+), 168 deletions(-)
 create mode 100644 swh/deposit/migrations/0021_deposit_origin_url_20201124_1438.py
 rename swh/deposit/templates/deposit/{status.xml => state.xml} (92%)
 rename swh/deposit/tests/api/{test_deposit_status.py => test_deposit_state.py} (97%)
 create mode 100644 swh/deposit/tests/data/atom/entry-data-no-origin-url.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-only-create-origin.xml
Changes applied before test
commit fc6f76daf5adf0ab809fb8cccc7983704bb08a8d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:14:26 2020 +0100

    Check origin urls start with the user's provider_url
    
    To prevent any user from creating any origin url.

commit fb419cde5f9029e916815d37f568956571363918
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:06:33 2020 +0100

    Add tag <swh:create_origin>, to replace the Slug header.
    
    Computing origin urls based on provider url + slug is brittle.
    
    Additionally, it meant basing parenting relationship on reusing
    values of the slug across deposit, which is clearly not in the
    spirit of the SWORD specification.

commit 4da9d03bb2f2a00fb25c1e12ce1e51c9c917e7ec
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:20:06 2020 +0100

    rename status -> state for consistency with SWORD and the view name.

commit 2f5aae7e9c68937e2d4baa3080e916da2b76dde5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:18:26 2020 +0100

    Return the origin url on 'GET State-IRI'.

commit cb09d873dbe8a8037f34b3322bf83209d85ca706
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:32:12 2020 +0100

    Fix mypy error.
    
    The 'type: ignore' comment makes errors on my machine,
    but removing it errors on Jenkins.
    So I'm fixing it by removing the name clash.

commit f863ea93a22b3132749261d479f9d2a8117d52bd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 26 14:45:48 2020 +0100

    Don't generate slugs on the client side
    
    They are not mandatory anymore.

commit fa3a3591e6932e16c13890f45b1bd26296954e74
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 17:49:13 2020 +0100

    Make the Slug header optional.
    
    To follow the SWORD specification.

commit 45b596f198df963edb2d6f5353763704b16fd05d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 15:50:47 2020 +0100

    Defer origin_url computation until the deposit completes.
    
    In a future commit, the origin_url computation will only be a fallback
    in case it's not provided in any of the Atom entries; so we have
    to compute this fallback after all Atom entries are given, ie. when
    the deposit completes.

commit 1a2da5e1805f7d65cdd6c22b6180ad6fb00d623a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 24 15:52:36 2020 +0100

    add column 'origin_url' on Deposit.
    
    For now, it's only computed based on the external_id, but clients
    will provide it in future versions.

commit 2e4a524a2480dbd3baa6f59390528fd694ea7390
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 24 12:20:19 2020 +0100

    swh.xsd: Use the https://www.softwareheritage.org/schema/2018/deposit namespace

commit 88b102aec44185b97b06135d862821bb4c138acc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 24 12:11:31 2020 +0100

    swh.xsd: Improve readability
    
    This also moves the inner element definitions of <swh:reference> to
    root elements, so they can be reused by new definitions in the future.

Link to build: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/450/
See console output for more information: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/450/console

Build is green

Patch application report for D4612 (id=16384)

Could not rebase; Attempt merge onto ad6041d312...

Updating ad6041d3..613ccf4c
Fast-forward
 docs/endpoints/collection.rst                      |   2 -
 docs/metadata.rst                                  |  30 +++++-
 docs/specs/metadata_example.xml                    |   3 +
 docs/specs/spec-loading.rst                        |   6 +-
 docs/specs/swh.xsd                                 |  41 +++++---
 docs/user-manual.rst                               |   8 +-
 swh/deposit/api/collection.py                      |  12 +--
 swh/deposit/api/common.py                          |  87 ++++++++++------
 swh/deposit/api/state.py                           |   3 +-
 swh/deposit/cli/client.py                          |  16 ++-
 swh/deposit/client.py                              |  19 ++--
 swh/deposit/errors.py                              |  15 ---
 .../0021_deposit_origin_url_20201124_1438.py       |  28 +++++
 swh/deposit/models.py                              |   9 +-
 .../templates/deposit/{status.xml => state.xml}    |   1 +
 swh/deposit/tests/api/test_collection.py           |  35 +++++--
 swh/deposit/tests/api/test_collection_post_atom.py | 116 +++++++++++++++------
 .../tests/api/test_collection_post_binary.py       |  18 +++-
 .../tests/api/test_collection_post_multipart.py    |  18 +++-
 .../api/test_deposit_private_read_metadata.py      |  10 +-
 ...est_deposit_status.py => test_deposit_state.py} |   2 +
 swh/deposit/tests/api/test_deposit_update.py       |  16 ++-
 swh/deposit/tests/cli/test_client.py               |   8 +-
 swh/deposit/tests/conftest.py                      |   5 +-
 swh/deposit/tests/data/atom/codemeta-sample.xml    |   6 ++
 .../tests/data/atom/entry-data-no-origin-url.xml   |  26 +++++
 swh/deposit/tests/data/atom/entry-data0.xml        |   9 +-
 swh/deposit/tests/data/atom/entry-data2.xml        |   9 +-
 .../tests/data/atom/entry-only-create-origin.xml   |  10 ++
 swh/deposit/tests/data/atom/error-with-decimal.xml |  10 +-
 swh/deposit/tests/data/atom/metadata.xml           |  10 +-
 tox.ini                                            |   2 +-
 32 files changed, 423 insertions(+), 167 deletions(-)
 create mode 100644 swh/deposit/migrations/0021_deposit_origin_url_20201124_1438.py
 rename swh/deposit/templates/deposit/{status.xml => state.xml} (92%)
 rename swh/deposit/tests/api/{test_deposit_status.py => test_deposit_state.py} (97%)
 create mode 100644 swh/deposit/tests/data/atom/entry-data-no-origin-url.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-only-create-origin.xml
Changes applied before test
commit 613ccf4c3eea782d49be0d760984922eec8e0a1b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:14:26 2020 +0100

    Check origin urls start with the user's provider_url
    
    To prevent any user from creating any origin url.

commit 60f66cdb9e9f93d3040e794edefdf5d35fc3cf59
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:06:33 2020 +0100

    Add tag <swh:create_origin>, to replace the Slug header.
    
    Computing origin urls based on provider url + slug is brittle.
    
    Additionally, it meant basing parenting relationship on reusing
    values of the slug across deposit, which is clearly not in the
    spirit of the SWORD specification.

commit b7eac04c53d6722826c338c8ed711a8a35d5b497
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:20:06 2020 +0100

    rename status -> state for consistency with SWORD and the view name.

commit 4359a90e3a688aeb11a719e5e4c96cd2aa1ed32e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:18:26 2020 +0100

    Return the origin url on 'GET State-IRI'.

commit f02b7f71fba1aeb6c3fdc1c7eb4e9118e13de2a0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Nov 27 11:11:12 2020 +0100

    fix mypy error in swh/deposit/urls.py
    
    If djangorestframework-stubs does not contain the stub for
    `format_suffix_patterns`, then mypy does not see any issue on passing it
    a `List[object]` as argument, so the `# type: ignore` is useless,
    so mypy complains about it.

commit 5989af38c9e6b3d5e32e20bd125aa149d6668e8a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 26 14:45:48 2020 +0100

    Don't generate slugs on the client side
    
    They are not mandatory anymore.

commit fc5a5ee0b1c67c90c3eaf5a924a6c46eb6fb0f7f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 17:49:13 2020 +0100

    Make the Slug header optional.
    
    To follow the SWORD specification.

commit e7316d6d8a527eb4a96eb2e9d46daa0045c6b851
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 15:50:47 2020 +0100

    Defer origin_url computation until the deposit completes.
    
    In a future commit, the origin_url computation will only be a fallback
    in case it's not provided in any of the Atom entries; so we have
    to compute this fallback after all Atom entries are given, ie. when
    the deposit completes.

commit a21961284e3b88a3c21fc0b67cae793351331e7f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 24 15:52:36 2020 +0100

    add column 'origin_url' on Deposit.
    
    For now, it's only computed based on the external_id, but clients
    will provide it in future versions.

commit 2e4a524a2480dbd3baa6f59390528fd694ea7390
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 24 12:20:19 2020 +0100

    swh.xsd: Use the https://www.softwareheritage.org/schema/2018/deposit namespace

commit 88b102aec44185b97b06135d862821bb4c138acc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 24 12:11:31 2020 +0100

    swh.xsd: Improve readability
    
    This also moves the inner element definitions of <swh:reference> to
    root elements, so they can be reused by new definitions in the future.

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/461/ for more details.

ardumont added a subscriber: ardumont.

lgtm

a small remark on the complete log message inline though.

swh/deposit/tests/api/test_collection_post_atom.py
138

some recent discussion about testing the full error message being not DRY occur recently.
So, in that regards, the following assertion should be enough:

assert "Cannot create origin" in response.content.decode()
This revision is now accepted and ready to land.Nov 27 2020, 1:53 PM
swh/deposit/tests/api/test_collection_post_atom.py
138

No, that would not be a good test, because the error could be "Cannot create origin because example.org is reserved".

I could test both "Cannot create origin" and "it must start with" are in the error, but that's barely different from my test...

Build has FAILED

Patch application report for D4612 (id=16403)

Could not rebase; Attempt merge onto b7eac04c53...

Updating b7eac04c..a8d9f22e
Fast-forward
 docs/metadata.rst                                  |  30 +++-
 docs/specs/metadata_example.xml                    |   3 +
 docs/specs/spec-loading.rst                        |   3 +-
 docs/specs/swh.xsd                                 |   6 +
 docs/user-manual.rst                               |   8 +-
 swh/deposit/api/common.py                          |  54 +++++--
 swh/deposit/tests/api/test_collection.py           |  35 +++--
 swh/deposit/tests/api/test_collection_post_atom.py | 163 +++++++++++++++++----
 .../api/test_deposit_private_read_metadata.py      |   5 +-
 swh/deposit/tests/api/test_deposit_update.py       |  98 ++++++++++++-
 swh/deposit/tests/conftest.py                      |   5 +-
 swh/deposit/tests/data/atom/codemeta-sample.xml    |   6 +
 .../tests/data/atom/entry-data-no-origin-url.xml   |  26 ++++
 swh/deposit/tests/data/atom/entry-data0.xml        |   9 +-
 swh/deposit/tests/data/atom/entry-data2.xml        |   9 +-
 .../tests/data/atom/entry-only-create-origin.xml   |  10 ++
 swh/deposit/tests/data/atom/error-with-decimal.xml |  10 +-
 ...-with-external-identifier-and-create-origin.xml |  14 ++
 .../data/atom/error-with-external-identifier.xml   |   3 +-
 .../error-with-reference-and-create-origin.xml     |  16 ++
 swh/deposit/tests/data/atom/metadata.xml           |  10 +-
 21 files changed, 453 insertions(+), 70 deletions(-)
 create mode 100644 swh/deposit/tests/data/atom/entry-data-no-origin-url.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-only-create-origin.xml
 create mode 100644 swh/deposit/tests/data/atom/error-with-external-identifier-and-create-origin.xml
 create mode 100644 swh/deposit/tests/data/atom/error-with-reference-and-create-origin.xml
Changes applied before test
commit a8d9f22ea76d197dfec8d2911276712389886f1c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:14:26 2020 +0100

    Check origin urls start with the user's provider_url
    
    To prevent any user from creating any origin url.

commit 6fa45ca5fde899caf836f8db7ed99a4624c01971
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:06:33 2020 +0100

    Add tag <swh:create_origin>, to replace the Slug header.
    
    Computing origin urls based on provider url + slug is brittle.
    
    Additionally, it meant basing parenting relationship on reusing
    values of the slug across deposit, which is clearly not in the
    spirit of the SWORD specification.

Link to build: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/462/
See console output for more information: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/462/console

Build is green

Patch application report for D4612 (id=16410)

Could not rebase; Attempt merge onto b7eac04c53...

Updating b7eac04c..3ad4d18d
Fast-forward
 docs/metadata.rst                                  |  30 +++-
 docs/specs/metadata_example.xml                    |   3 +
 docs/specs/spec-loading.rst                        |   3 +-
 docs/specs/swh.xsd                                 |   6 +
 docs/user-manual.rst                               |   8 +-
 swh/deposit/api/common.py                          |  56 +++++--
 swh/deposit/tests/api/test_collection.py           |  35 +++--
 swh/deposit/tests/api/test_collection_post_atom.py | 163 +++++++++++++++++----
 .../api/test_deposit_private_read_metadata.py      |   5 +-
 swh/deposit/tests/api/test_deposit_update.py       |  98 ++++++++++++-
 swh/deposit/tests/conftest.py                      |   5 +-
 swh/deposit/tests/data/atom/codemeta-sample.xml    |   6 +
 .../tests/data/atom/entry-data-no-origin-url.xml   |  26 ++++
 .../data/atom/entry-data-with-origin-reference.xml |  13 ++
 swh/deposit/tests/data/atom/entry-data0.xml        |   9 +-
 swh/deposit/tests/data/atom/entry-data2.xml        |   9 +-
 .../tests/data/atom/entry-only-create-origin.xml   |  10 ++
 swh/deposit/tests/data/atom/error-with-decimal.xml |  10 +-
 ...-with-external-identifier-and-create-origin.xml |  14 ++
 .../data/atom/error-with-external-identifier.xml   |   3 +-
 .../error-with-reference-and-create-origin.xml     |  16 ++
 swh/deposit/tests/data/atom/metadata.xml           |  10 +-
 22 files changed, 468 insertions(+), 70 deletions(-)
 create mode 100644 swh/deposit/tests/data/atom/entry-data-no-origin-url.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-origin-reference.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-only-create-origin.xml
 create mode 100644 swh/deposit/tests/data/atom/error-with-external-identifier-and-create-origin.xml
 create mode 100644 swh/deposit/tests/data/atom/error-with-reference-and-create-origin.xml
Changes applied before test
commit 3ad4d18d9b787f63c679f77f4dabcf814174e891
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:14:26 2020 +0100

    Check origin urls start with the user's provider_url
    
    To prevent any user from creating any origin url.

commit 7bcfe1da9774ca986b0ca1cd0fd2c01444d48400
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:06:33 2020 +0100

    Add tag <swh:create_origin>, to replace the Slug header.
    
    Computing origin urls based on provider url + slug is brittle.
    
    Additionally, it meant basing parenting relationship on reusing
    values of the slug across deposit, which is clearly not in the
    spirit of the SWORD specification.

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/465/ for more details.

Build is green

Patch application report for D4612 (id=16477)

Could not rebase; Attempt merge onto b7eac04c53...

Updating b7eac04c..b75ebd1a
Fast-forward
 docs/metadata.rst                                  |  30 +++-
 docs/specs/metadata_example.xml                    |   3 +
 docs/specs/spec-loading.rst                        |   3 +-
 docs/specs/swh.xsd                                 |   6 +
 docs/user-manual.rst                               |   8 +-
 swh/deposit/api/common.py                          |  56 +++++--
 swh/deposit/tests/api/test_collection.py           |  35 +++--
 swh/deposit/tests/api/test_collection_post_atom.py | 163 +++++++++++++++++----
 .../api/test_deposit_private_read_metadata.py      |   5 +-
 swh/deposit/tests/api/test_deposit_update.py       |  98 ++++++++++++-
 swh/deposit/tests/conftest.py                      |   5 +-
 swh/deposit/tests/data/atom/codemeta-sample.xml    |   6 +
 .../tests/data/atom/entry-data-no-origin-url.xml   |  26 ++++
 .../data/atom/entry-data-with-origin-reference.xml |  13 ++
 swh/deposit/tests/data/atom/entry-data0.xml        |   9 +-
 swh/deposit/tests/data/atom/entry-data2.xml        |   9 +-
 .../tests/data/atom/entry-only-create-origin.xml   |  10 ++
 swh/deposit/tests/data/atom/error-with-decimal.xml |  10 +-
 ...-with-external-identifier-and-create-origin.xml |  14 ++
 .../data/atom/error-with-external-identifier.xml   |   3 +-
 .../error-with-reference-and-create-origin.xml     |  16 ++
 swh/deposit/tests/data/atom/metadata.xml           |  10 +-
 22 files changed, 468 insertions(+), 70 deletions(-)
 create mode 100644 swh/deposit/tests/data/atom/entry-data-no-origin-url.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-origin-reference.xml
 create mode 100644 swh/deposit/tests/data/atom/entry-only-create-origin.xml
 create mode 100644 swh/deposit/tests/data/atom/error-with-external-identifier-and-create-origin.xml
 create mode 100644 swh/deposit/tests/data/atom/error-with-reference-and-create-origin.xml
Changes applied before test
commit b75ebd1a74ae6b07bfa299c9aab74ef5d48258df
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:14:26 2020 +0100

    Check origin urls start with the user's provider_url
    
    To prevent any user from creating any origin url.

commit 441cfe10a4b636ab3ed0311ceeb8e780a90a7e4f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 25 19:06:33 2020 +0100

    Add tag <swh:create_origin>, to replace the Slug header.
    
    Computing origin urls based on provider url + slug is brittle.
    
    Additionally, it meant basing parenting relationship on reusing
    values of the slug across deposit, which is clearly not in the
    spirit of the SWORD specification.

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/467/ for more details.