Page MenuHomeSoftware Heritage

cli: Warn when metadata-only deposit without metadata provenance
ClosedPublic

Authored by ardumont on Feb 23 2022, 10:24 AM.

Diff Detail

Repository
rDDEP Push deposit
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 27062
Build 42321: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 42320: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D7227 (id=26190)

Could not rebase; Attempt merge onto 55ae87b13c...

Updating 55ae87b1..61b81464
Fast-forward
 swh/deposit/api/private/deposit_list.py            | 23 ++++-
 swh/deposit/api/utils.py                           |  1 +
 swh/deposit/cli/client.py                          | 12 ++-
 swh/deposit/models.py                              |  8 ++
 swh/deposit/tests/api/test_deposit_private_list.py | 98 ++++++++++++++--------
 swh/deposit/tests/cli/test_client.py               | 27 +++++-
 .../data/atom/entry-data-with-swhid-no-prov.xml    | 13 +++
 .../tests/data/atom/entry-data-with-swhid.xml      |  4 +
 8 files changed, 144 insertions(+), 42 deletions(-)
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-swhid-no-prov.xml
Changes applied before test
commit 61b81464560b44ed3ed557ac5952329a15631cd1
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Feb 23 10:23:48 2022 +0100

    cli: Warn when metadata-only deposit without metadata provenance
    
    Related to T3677

commit acc02d1bc0dacad75e939d76def1e3e30a3b0aaf
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Feb 22 18:41:45 2022 +0100

    deposit-list: Allow deposit listing with their raw metadata if any
    
    This now lists the deposit with their associated raw metadata if any is present. This
    will allow adaptations in the moderation view [1] to display the metadata provenance
    url (provided it's parsed out of the raw metadata).
    
    [1] The moderation view consumes this internal api.
    
    Related to T3677

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/733/ for more details.

swh/deposit/tests/data/atom/entry-data-with-swhid.xml
8

Nice :-)

14

I would use an example where the domain of the client is the domain of this metadata-provenance url.
Also, there is the question of verifying the authority over the url.
@vlorentz : do you think this is needed like in create-origin?

Build is green

Patch application report for D7227 (id=26196)

Could not rebase; Attempt merge onto 94f9aa2228...

Updating 94f9aa22..07dc3fbe
Fast-forward
 swh/deposit/api/private/deposit_list.py            | 23 ++++-
 swh/deposit/api/utils.py                           |  1 +
 swh/deposit/cli/client.py                          | 12 ++-
 swh/deposit/models.py                              |  8 ++
 swh/deposit/tests/api/test_deposit_private_list.py | 98 ++++++++++++++--------
 swh/deposit/tests/cli/test_client.py               | 27 +++++-
 .../data/atom/entry-data-with-swhid-no-prov.xml    | 13 +++
 .../tests/data/atom/entry-data-with-swhid.xml      |  4 +
 8 files changed, 144 insertions(+), 42 deletions(-)
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-swhid-no-prov.xml
Changes applied before test
commit 07dc3fbe81a1b5567ef0ae95fb71f4f1782b8bcd
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Feb 23 10:23:48 2022 +0100

    cli: Warn when metadata-only deposit without metadata provenance
    
    Related to T3677

commit 5a7d59cb2cc7ddb5bae9ac30f8a31beb78c8dd5d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Feb 22 18:41:45 2022 +0100

    deposit-list: Allow deposit listing with their raw metadata if any
    
    This now lists the deposit with their associated raw metadata if any is present. This
    will allow adaptations in the moderation view [1] to display the metadata provenance
    url (provided it's parsed out of the raw metadata).
    
    [1] The moderation view consumes this internal api.
    
    Related to T3677

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/737/ for more details.

swh/deposit/tests/data/atom/entry-data-with-swhid.xml
14

URL prefix rather than domain, but yes I agree

swh/deposit/tests/data/atom/entry-data-with-swhid.xml
14

i'm all ears about what that should look like (please ;)

swh/deposit/tests/cli/test_client.py
862

@moranegg, there is the full log message which mentions the metadata-only already (see the code).

Here i only check that the error message is mostly about metadata provenance. I'm not checking the full error log message (to avoid duplicating code in the test).

Adapt according to review: update url to something more sensible.

Build has FAILED

Patch application report for D7227 (id=26209)

Rebasing onto 5a7d59cb2c...

Current branch diff-target is up to date.
Changes applied before test
commit f9be97546ef95ec14ec2558b97853f9da67e8b31
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Feb 23 10:23:48 2022 +0100

    cli: Warn when metadata-only deposit without metadata provenance
    
    Related to T3677

Link to build: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/741/
See console output for more information: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/741/console

Build is green

Patch application report for D7227 (id=26219)

Could not rebase; Attempt merge onto 5a7d59cb2c...

Updating 5a7d59cb..97d61568
Fast-forward
 swh/deposit/api/common.py                          |  39 ++---
 swh/deposit/api/edit.py                            |   7 +-
 swh/deposit/api/private/deposit_read.py            |   2 -
 swh/deposit/cli/client.py                          |  14 +-
 swh/deposit/client.py                              |  78 +++++++---
 swh/deposit/config.py                              |   1 -
 swh/deposit/parsers.py                             |  10 +-
 swh/deposit/tests/api/conftest.py                  |   3 +-
 .../tests/api/test_collection_add_to_origin.py     |   7 +-
 swh/deposit/tests/api/test_collection_list.py      |  47 +++---
 swh/deposit/tests/api/test_collection_post_atom.py |  71 +++++----
 .../tests/api/test_collection_post_binary.py       |  52 +++++--
 .../tests/api/test_collection_post_multipart.py    |  39 +++--
 .../tests/api/test_collection_reuse_slug.py        |  31 ++--
 swh/deposit/tests/api/test_delete.py               |   8 +-
 .../tests/api/test_deposit_private_check.py        |   7 +-
 .../api/test_deposit_private_read_metadata.py      |   7 -
 swh/deposit/tests/api/test_deposit_schedule.py     |  20 ++-
 swh/deposit/tests/api/test_deposit_state.py        |  64 +++++---
 swh/deposit/tests/api/test_deposit_update_atom.py  |  64 ++++----
 .../tests/api/test_deposit_update_binary.py        |  19 +--
 swh/deposit/tests/api/test_get_file.py             |  22 +--
 swh/deposit/tests/api/test_parsers.py              |  97 ++++--------
 swh/deposit/tests/cli/test_client.py               | 171 +++++++++++++++------
 swh/deposit/tests/conftest.py                      |  14 +-
 .../data/atom/entry-data-with-swhid-no-prov.xml    |  13 ++
 .../tests/data/atom/entry-data-with-swhid.xml      |   4 +
 swh/deposit/tests/test_client_module.py            |   2 +-
 swh/deposit/utils.py                               |  14 --
 29 files changed, 537 insertions(+), 390 deletions(-)
 create mode 100644 swh/deposit/tests/data/atom/entry-data-with-swhid-no-prov.xml
Changes applied before test
commit 97d615683655d3c056edf54b61bcddbe0415e71d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Feb 23 10:23:48 2022 +0100

    cli: Warn when metadata-only deposit without metadata provenance
    
    Related to T3677

commit 83a910b962d9580de746c3bc1e6ca5c0b8c5bae1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Feb 23 19:00:09 2022 +0100

    Replace xmltodict with ElementTree to handle SWORD documents
    
    xmltodict was already on the way out for the deposit, and the latest
    libexpat security update broke it entirely when dealing with namespaces,
    which means we cannot use it until this is addressed.
    
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006317
    
    Functional changes of this commit:
    
    1. No more writes to the 'metadata' jsonb column in the DB (as it
       strongly depends on xmltodict)
    2. ServiceDocumentDepositClient always outputs a list of collections,
       instead of None/dict/List[dict] depending on the number of
       collections (artefact of using xmltodict, which is replaced by proper
       parsing)

commit 949a6b4b2ee53403bdd46448c6eccb4f28105145
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Feb 23 13:44:07 2022 +0100

    Remove metadata_dict from the API
    
    No one uses that, and it's redundant, as we provide the original XML

commit 1182c1573fbf002f155b3f747f83555d0778ddf7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Feb 23 13:43:33 2022 +0100

    Remove unnecessary use of BytesIO

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/747/ for more details.

Build is green

Patch application report for D7227 (id=26227)

Rebasing onto cf92b5f2fe...

Current branch diff-target is up to date.
Changes applied before test
commit 163df3eefdac9b4b3d5ec33e3a11486fad02f682
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Feb 23 10:23:48 2022 +0100

    cli: Warn when metadata-only deposit without metadata provenance
    
    Related to T3677

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/751/ for more details.

swh/deposit/tests/data/atom/entry-data-with-swhid.xml
14

In another diff please.
(also it's not that simple cli side, we don't have the DepositClient to have the provider url for one...)

This revision is now accepted and ready to land.Feb 24 2022, 11:11 AM
ardumont added inline comments.
swh/deposit/tests/data/atom/entry-data-with-swhid.xml
14

That'd be D7238 which does the check server side (when doing the actual metadata-only deposit).