Page MenuHomeSoftware Heritage

Allow metadata-only deposit client side
ClosedPublic

Authored by ardumont on Dec 14 2020, 11:04 PM.

Details

Summary

This opens a swh deposit metadata-only cli [1]

This allows the users to post to their collection a metadata-only deposit.

The users provide the atom entry metadata file with --metadata flag. This file
can already contain the swhid to which we will attach the metadata. Or the user
can provide the swhid though the --swhid flag (internally, we will amend the
metadata file with the swhid consistently with what the server expect).

If no swhid is provided at all, the cli will raise and explain the problem.

[1]

 $ swh deposit metadata-only --help
Usage: swh deposit metadata-only [OPTIONS]

  Deposit metadata only upload



Options:
  --url TEXT                      (Optional) Deposit server api endpoint. By
                                  default,
                                  https://deposit.softwareheritage.org/1

  --username TEXT                 (Mandatory) User's name  [required]
  --password TEXT                 (Mandatory) User's associated password
                                  [required]

  --swhid TEXT                    SWHID
  --metadata PATH                 Path to xml metadata file  [required]
  -f, --format [logging|yaml|json]
                                  Output format results.
  -h, --help                      Show this message and exit.

Related to T2886

Test Plan

tox

Diff Detail

Repository
rDDEP Push deposit
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 17938
Build 27714: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 27713: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D4737 (id=16778)

Could not rebase; Attempt merge onto 73ee587df9...

Updating 73ee587d..a25f57c3
Fast-forward
 swh/deposit/api/common.py                          |   3 +-
 swh/deposit/cli/client.py                          |  69 +++++++++
 swh/deposit/client.py                              |  31 ++++
 swh/deposit/parsers.py                             | 100 -------------
 swh/deposit/tests/api/test_parsers.py              | 115 +--------------
 swh/deposit/tests/cli/test_client.py               | 163 +++++++++++++++++++++
 .../1_servicedocument                              |  26 ++++
 .../data/https_deposit.test.metadataonly/1_test    |  12 ++
 swh/deposit/tests/test_utils.py                    | 111 ++++++++++++++
 swh/deposit/utils.py                               | 110 +++++++++++++-
 10 files changed, 521 insertions(+), 219 deletions(-)
 create mode 100644 swh/deposit/tests/data/https_deposit.test.metadataonly/1_servicedocument
 create mode 100644 swh/deposit/tests/data/https_deposit.test.metadataonly/1_test
Changes applied before test
commit a25f57c3a6411816890c54b42a940a5b56e38f27
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Dec 14 22:05:17 2020 +0100

    Allow metadata-only deposit client side
    
    Related to T2886

commit 36f0b632966bf94ff2d4b9880e1ca89c6b14a68e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Dec 14 17:43:36 2020 +0100

    Move parse_swh_reference to swh.deposit.utils namespace
    
    So it can be reused in other non django part of the deposit.
    
    Related to T2886

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/482/ for more details.

swh/deposit/cli/client.py
549

This will be dealt with in another diff [1]

[1] D4738

ardumont added a subscriber: vlorentz.
ardumont added inline comments.
swh/deposit/cli/client.py
495

I really wish we stop doing that redundancy parameter dance in each subcommand...
And push it within the scope of the deposit command.
And even open a --config-path (or --config-file) as other commands do...

Thing is, i don't know how to do that without breaking the current interface the users
know...

Is deprecating the old ways (--url --username --password) and start pushing for the new ones ok?
Any better idea?

@vlorentz ^

538

Is that reasonable at all?

(instead of mutating the user's file, i can also copy it in temporary location, i still don't know if the overall approach is sound though)

vlorentz added inline comments.
swh/deposit/cli/client.py
495

Yes. You canalso use a decorator, to deduplicate:

def authenticator_decorator(f):
    f = click.option("--username", required=True, help="(Mandatory) User's name")(f)
    f = click.option(
        "--password", required=True, help="(Mandatory) User's associated password"
    )(f)
    return f
swh/deposit/tests/cli/test_client.py
659

open(receipt_path, "r")

But I would rather not parse the config file in the test, that's just duplicating the client code; so keep the hardcoded expected_deposit_status.

677

why does a metadata-only deposit have a SWHID?

707

what is the point of doing that?

This revision now requires changes to proceed.Dec 15 2020, 10:23 AM
swh/deposit/cli/client.py
495

Thanks, that's a more dry step.

I still wish we stop duplicating those flags though.

swh/deposit/tests/cli/test_client.py
659

lol, thanks!

677

well, that's the input the user gave.
We still have to reference it?

Also that's what the server does.

707

It's a bit tedious and potentially error-prone to write a correct (with the right namespace and all [1]):

<swh:deposit>
  <swh:reference>
    <swh:object swhid="{swhid}" />
  </swh:reference>
</swh:deposit>

within your xml.

[1] which i forgot to add in this very diff...

swh/deposit/tests/cli/test_client.py
677

I don't think the server should return it.

It's supposed to be the SWHID of the deposit itself...

707

I don't see what makes it harder than having codemeta metadata, it's just one copy-paste away

swh/deposit/tests/cli/test_client.py
677

I don't think the server should return it.

The server needs fixing then... in another diff but i'm not sure about that, see next point.

It's supposed to be the SWHID of the deposit itself...

I'm half-convinced.
How do we compute that though, we don't have any way to do that, do we?

707

ok, so to conclude no --swhidthen, right?

swh/deposit/tests/cli/test_client.py
677

How do we compute that though, we don't have any way to do that, do we?

It's returned on "GET State-IRI" after the deposit is loaded.

ardumont edited the summary of this revision. (Show Details)

Adapt according to review:

  • drop --swhid flag (no need, let the user import correct metadata file)
  • do not return the deposit_swhid as output for the metadata-only deposit
  • clean up comments

Build is green

Patch application report for D4737 (id=16784)

Could not rebase; Attempt merge onto 73ee587df9...

Updating 73ee587d..d5dd18c0
Fast-forward
 swh/deposit/api/common.py                          |   3 +-
 swh/deposit/cli/client.py                          |  89 +++++++++++++---
 swh/deposit/client.py                              |  30 ++++++
 swh/deposit/parsers.py                             | 100 ------------------
 swh/deposit/tests/api/test_parsers.py              | 115 +--------------------
 swh/deposit/tests/cli/test_client.py               | 101 ++++++++++++++++++
 .../1_servicedocument                              |  26 +++++
 .../data/https_deposit.test.metadataonly/1_test    |  12 +++
 swh/deposit/tests/test_utils.py                    | 111 ++++++++++++++++++++
 swh/deposit/utils.py                               | 110 +++++++++++++++++++-
 10 files changed, 462 insertions(+), 235 deletions(-)
 create mode 100644 swh/deposit/tests/data/https_deposit.test.metadataonly/1_servicedocument
 create mode 100644 swh/deposit/tests/data/https_deposit.test.metadataonly/1_test
Changes applied before test
commit d5dd18c09ec8cac885c369a5a8d4812f734e1397
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Dec 14 22:05:17 2020 +0100

    Allow metadata-only deposit client side
    
    Related to T2886

commit 1bc55903126e3e1f847cbb29e2e4fd20d84401c1
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Dec 14 23:14:01 2020 +0100

    Trap and report exceptions in a unified way within the cli
    
    Related to T2886

commit 36f0b632966bf94ff2d4b9880e1ca89c6b14a68e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Dec 14 17:43:36 2020 +0100

    Move parse_swh_reference to swh.deposit.utils namespace
    
    So it can be reused in other non django part of the deposit.
    
    Related to T2886

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/485/ for more details.

This revision is now accepted and ready to land.Dec 15 2020, 11:47 AM

Drop no longer parse_xml adaptation change

Build is green

Patch application report for D4737 (id=16788)

Rebasing onto 1bc5590312...

Current branch diff-target is up to date.
Changes applied before test
commit d1b17353e1ee534ba9d23c94750d07fb8542c21a
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Dec 14 22:05:17 2020 +0100

    Allow metadata-only deposit client side
    
    Related to T2886

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/486/ for more details.

This revision was automatically updated to reflect the committed changes.
swh/deposit/cli/client.py
495

As a first step D4748.