Page MenuHomeSoftware Heritage

deposit_client: Allow deposit metadata update on completed deposit
ClosedPublic

Authored by ardumont on Oct 12 2020, 8:13 AM.

Details

Summary

This opens the --swhid to the swh upload cli.

If the update is ok, this returns the unchanged status summary of the deposit.
Otherwise, for example, update a non-complete deposit (status not "done"), this returns an explicit error.

Scenario around those cases are explicited in the diff.

Sample cli:

swh deposit upload \
  --url https://deposit.internal.staging.swh.network/ \
  --username swh \
  --password $(swhpass ls operations/deposit.softwareheritage.org/http-auth/swh | head -1) \
  --metadata "./deposit-update.xml" \ # [1]
  --deposit-id 605 \
  --swhid swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea

[1] deposit-update.xml

<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0">
	<name>test-project</name>
	<author>ardumont</author>
	<codemeta:identifier>38857a56-bab9-46f6-9087-6f91f3e1be89</codemeta:identifier>
</entry>

This cli sample is the equivalent of P815 (which uses curl).

Related to T2538

Test Plan

tox

Using this code and the staging deposit server, it is happy with the following:

Oct 12 11:03:21 deposit python3[181685]: 2020-10-12 11:03:21 [181685] gunicorn.access:INFO 127.0.0.1 - swh [12/Oct/2020:11:03:21 +0000] "PUT /1/swh/605/metadata/ HTTP/1.1" 204 0 "-" "python-requests/2.20.1"

Diff Detail

Repository
rDDEP Push deposit
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 16150
Build 24846: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 24845: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D4228 (id=14917)

Could not rebase; Attempt merge onto b18cd89d4c...

Updating b18cd89d..d65ca90e
Fast-forward
 swh/deposit/cli/client.py                          |  38 +-
 swh/deposit/client.py                              | 189 ++++++--
 swh/deposit/tests/cli/test_client.py               | 528 +++++++++++++++------
 .../https_deposit.test.metadata/1_test_666_media   |   1 +
 .../https_deposit.test.status/1_servicedocument    |  26 +
 .../https_deposit.test.status/1_test_1033_status   |  10 +
 .../1_servicedocument                              |  26 +
 .../1_test_123_metadata                            |  10 +
 .../1_test_123_status                              |  10 +
 9 files changed, 624 insertions(+), 214 deletions(-)
 create mode 120000 swh/deposit/tests/data/https_deposit.test.metadata/1_test_666_media
 create mode 100644 swh/deposit/tests/data/https_deposit.test.status/1_servicedocument
 create mode 100644 swh/deposit/tests/data/https_deposit.test.status/1_test_1033_status
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_servicedocument
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_metadata
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_status
Changes applied before test
commit d65ca90e1523e966c92a95be09c6572db87bc61b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 08:11:30 2020 +0200

    deposit_client: Allow deposit metadata update on completed deposit
    
    Related to T2538

commit 0fe94348f60dcf2c748b95f072e502bf4d4eeab9
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sun Oct 11 12:13:04 2020 +0200

    test_client: Move redundant tests setup into fixtures

commit 2150833f440605c5c48b9fef0369f2ccd752f046
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Oct 10 08:13:58 2020 +0200

    test_client: Explicit the possible format outputs

commit 8e99386fa17d22b0ec1eb0fc4a806b233027deca
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Oct 9 16:23:17 2020 +0200

    deposit.client: Improve cli error messages and add missing coverage
    
    This adds the missing checks on:
    - no actionable command
    - missing --deposit-id when specifying the --replace flag
    - some more incompatible checks command scenario

commit 419c1b26d0452bc9965ed01c29ce6c1ab5c50831
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Oct 9 16:10:21 2020 +0200

    test_client: Add a step to update the partial deposit with archive
    
    This should actually cover all the missing cases now.

commit 1a782449064a4228af12ad768862611cc5a49a69
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Oct 9 13:48:49 2020 +0200

    test_client: Add scenario on deposit status cli
    
    This also checks the 3 different format outputs which were not so far.

commit 404f2728ad0f85789b0827025eafb1c8b6124c05
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Oct 9 12:17:58 2020 +0200

    test_client: Refactor and simplify tests for a better readability
    
    This:
    - Align all tests to use json format so the output check is simpler
    - Simplify the failure cases to check the caplog record
    - Drop dead code
    - Rename tests with a `test_cli_` prefix so it's simpler to run separately from
    the rest

Link to build: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/207/
See console output for more information: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/207/console

  • Fix type error
  • Add scenario on refused metadata update when the deposit status is wrong

Build is green

Patch application report for D4228 (id=14929)

Could not rebase; Attempt merge onto 419c1b26d0...

Updating 419c1b26..efca35a2
Fast-forward
 swh/deposit/cli/client.py                          |  38 +-
 swh/deposit/client.py                              | 214 +++++++----
 swh/deposit/tests/cli/test_client.py               | 401 ++++++++++++++++-----
 .../1_servicedocument                              |  26 ++
 .../1_test_123_metadata                            |  10 +
 .../1_test_123_status                              |  10 +
 .../1_test_321_status                              |   8 +
 7 files changed, 521 insertions(+), 186 deletions(-)
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_servicedocument
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_metadata
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_status
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_321_status
Changes applied before test
commit efca35a227bd749acd2c1b9a970cedf8b60b4348
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 08:11:30 2020 +0200

    deposit_client: Allow deposit metadata update on completed deposit
    
    Related to T2538

commit 0fe94348f60dcf2c748b95f072e502bf4d4eeab9
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sun Oct 11 12:13:04 2020 +0200

    test_client: Move redundant tests setup into fixtures

commit 2150833f440605c5c48b9fef0369f2ccd752f046
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Oct 10 08:13:58 2020 +0200

    test_client: Explicit the possible format outputs

commit 8e99386fa17d22b0ec1eb0fc4a806b233027deca
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Oct 9 16:23:17 2020 +0200

    deposit.client: Improve cli error messages and add missing coverage
    
    This adds the missing checks on:
    - no actionable command
    - missing --deposit-id when specifying the --replace flag
    - some more incompatible checks command scenario

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/208/ for more details.

ardumont edited the summary of this revision. (Show Details)
ardumont edited the test plan for this revision. (Show Details)
  • Adapt
  • Update the diff description with a sample cli

Build is green

Patch application report for D4228 (id=14933)

Could not rebase; Attempt merge onto 419c1b26d0...

Updating 419c1b26..866ec641
Fast-forward
 swh/deposit/cli/client.py                          |  38 +-
 swh/deposit/client.py                              | 218 +++++++----
 swh/deposit/tests/cli/test_client.py               | 401 ++++++++++++++++-----
 .../1_servicedocument                              |  26 ++
 .../1_test_123_metadata                            |  10 +
 .../1_test_123_status                              |  10 +
 .../1_test_321_status                              |   8 +
 7 files changed, 525 insertions(+), 186 deletions(-)
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_servicedocument
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_metadata
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_status
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_321_status
Changes applied before test
commit 866ec64139b375adb93352414751cffd4d756ea2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 08:11:30 2020 +0200

    deposit_client: Allow deposit metadata update on completed deposit
    
    Related to T2538

commit 0fe94348f60dcf2c748b95f072e502bf4d4eeab9
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sun Oct 11 12:13:04 2020 +0200

    test_client: Move redundant tests setup into fixtures

commit 2150833f440605c5c48b9fef0369f2ccd752f046
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Oct 10 08:13:58 2020 +0200

    test_client: Explicit the possible format outputs

commit 8e99386fa17d22b0ec1eb0fc4a806b233027deca
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Oct 9 16:23:17 2020 +0200

    deposit.client: Improve cli error messages and add missing coverage
    
    This adds the missing checks on:
    - no actionable command
    - missing --deposit-id when specifying the --replace flag
    - some more incompatible checks command scenario

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/209/ for more details.

Rename header HTTP_X_CHECK_SWHID into X_CHECK_SWHID

Build is green

Patch application report for D4228 (id=14934)

Could not rebase; Attempt merge onto 419c1b26d0...

Updating 419c1b26..866ec641
Fast-forward
 swh/deposit/cli/client.py                          |  38 +-
 swh/deposit/client.py                              | 218 +++++++----
 swh/deposit/tests/cli/test_client.py               | 401 ++++++++++++++++-----
 .../1_servicedocument                              |  26 ++
 .../1_test_123_metadata                            |  10 +
 .../1_test_123_status                              |  10 +
 .../1_test_321_status                              |   8 +
 7 files changed, 525 insertions(+), 186 deletions(-)
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_servicedocument
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_metadata
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_status
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_321_status
Changes applied before test
commit 866ec64139b375adb93352414751cffd4d756ea2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 08:11:30 2020 +0200

    deposit_client: Allow deposit metadata update on completed deposit
    
    Related to T2538

commit 0fe94348f60dcf2c748b95f072e502bf4d4eeab9
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sun Oct 11 12:13:04 2020 +0200

    test_client: Move redundant tests setup into fixtures

commit 2150833f440605c5c48b9fef0369f2ccd752f046
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Oct 10 08:13:58 2020 +0200

    test_client: Explicit the possible format outputs

commit 8e99386fa17d22b0ec1eb0fc4a806b233027deca
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Oct 9 16:23:17 2020 +0200

    deposit.client: Improve cli error messages and add missing coverage
    
    This adds the missing checks on:
    - no actionable command
    - missing --deposit-id when specifying the --replace flag
    - some more incompatible checks command scenario

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/210/ for more details.

ardumont edited the test plan for this revision. (Show Details)

What are all these "compute_information" methods?

swh/deposit/client.py
24–33

sorry but that's not a great function name/docstring.

computes what information? based on what file? and used for what?

Note that I'm trying to take back control on existing codes that has never been
touched for a while... It's sadly not that clear and i'm merely trying to
actually open the code so the swhid scenario can be integrated into it.

Also using our now new standard of using types (and better tests with pytest
fixtures).

Here I had to touch those because i'm adding types to make sense of it all... [1]

I think when the coverage is decent enough (i think we are near), we'll be able
to rewrite it entirely with simpler code instead of that sad chain of
inheritance.

What are all these "compute_information" methods?

As explained earlier, the original compute_information implementation was in
the base class. As it's in the bad location and mostly overriden, I thought it
best to remove the base implementation and declare it appropriately where it
needs to.

Typically, each scenario will now have its own implementation correctly
defined when it needs to:

  • *ArchiveDepositClient will have its implementation providing the filepath argument with its kwargs[archive_path]
  • *MetadataDepositClient will do the same with its kwargs[metadata_path]
  • *MultipartDepositClient will do it with both kwargs[metadata_path] and kwargs[archive_path]

[1] It took me way too much time to open that --swhid flag because of that code...

swh/deposit/client.py
24–33

sorry but that's not a great function name/docstring.

i don't think self_compute_information, the old name was great either...
At least, unified in there tries to convey something more...

Note the existing code...

I moved it around because I had typing issues i tried to make sense of...
~> It got extracted because it was annoying when defined in the class.

computes what information?

Out of the input params of the method (which is not exactly the same, unfortunately), it tries to compute a unified information dict which will serve as param input for the subsequent methods called.

Those information are details in the Returns statement. The dict with keys method in the docstring?

based on what file?

either the archive_path or the metadata_path passed along by the client using this code.

and used for what?

already answered in my previous point.

462

those compute_information methods are coming from this implementation which got removed because:

  1. it's not clear at this position,
  2. it will be overridden most of the time.

So now, it's removed and other subclasses that needs it properly implement it.

Build has FAILED

Patch application report for D4228 (id=14957)

Could not rebase; Attempt merge onto 419c1b26d0...

Updating 419c1b26..4a6d8633
Fast-forward
 swh/deposit/cli/client.py                          |  38 +-
 swh/deposit/client.py                              | 218 +++++++----
 swh/deposit/tests/cli/test_client.py               | 409 ++++++++++++++++-----
 .../1_servicedocument                              |  26 ++
 .../1_test_123_metadata                            |  10 +
 .../1_test_123_status                              |  10 +
 .../1_test_321_status                              |   8 +
 7 files changed, 530 insertions(+), 189 deletions(-)
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_servicedocument
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_metadata
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_status
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_321_status
Changes applied before test
commit 4a6d8633a5f81b0501af944f4472f69b6dca939e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 08:11:30 2020 +0200

    deposit_client: Allow deposit metadata update on completed deposit
    
    Related to T2538

commit 2b164ac19bbfc6902dac19c17eea2dddb139d75c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sun Oct 11 12:13:04 2020 +0200

    test_client: Move redundant tests setup into fixtures

commit 69e527dc239ad9f07d371d124af7b857aba138cc
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Oct 10 08:13:58 2020 +0200

    test_client: Explicit the possible format outputs

commit 8e99386fa17d22b0ec1eb0fc4a806b233027deca
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Oct 9 16:23:17 2020 +0200

    deposit.client: Improve cli error messages and add missing coverage
    
    This adds the missing checks on:
    - no actionable command
    - missing --deposit-id when specifying the --replace flag
    - some more incompatible checks command scenario

Link to build: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/218/
See console output for more information: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/218/console

Build is green

Patch application report for D4228 (id=14972)

Could not rebase; Attempt merge onto 0e0b342d9b...

Updating 0e0b342d..bf01939d
Fast-forward
 swh/deposit/cli/client.py                          |  11 ++
 swh/deposit/client.py                              | 218 ++++++++++++++-------
 swh/deposit/tests/cli/test_client.py               | 192 +++++++++++++-----
 .../1_servicedocument                              |  26 +++
 .../1_test_123_metadata                            |  10 +
 .../1_test_123_status                              |  10 +
 .../1_test_321_status                              |   8 +
 7 files changed, 345 insertions(+), 130 deletions(-)
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_servicedocument
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_metadata
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_status
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_321_status
Changes applied before test
commit bf01939d145e8b82eb7f4bc323d76387fdfb7949
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 08:11:30 2020 +0200

    deposit_client: Allow deposit metadata update on completed deposit
    
    Related to T2538

commit 255d689569ffa5302cc90ad508eb966b5a8a7f8c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sun Oct 11 12:13:04 2020 +0200

    test_client: Move redundant tests setup into fixtures

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/222/ for more details.

Yeah sorry, that wasn't a comment on your changes, but more on the existing state

This revision is now accepted and ready to land.Oct 13 2020, 9:36 AM

Yeah sorry

heh, don't worry ;)

I have mixed feelings on this code as well. I don't like it much but it does
the job. I tried to be dry and it became that ugly piece of entangled inherited
classes.

that wasn't a comment on your changes, but more on the existing state

yeah, gathered as much.

Hopefully, when i'm getting more confident on the coverage of the
swh.deposit.client module, the one with loaded inheritance chain, we will be
able to rewrite it entirely (and this without having to touch the tests).

In that coverage matter, btw, we are almost there. It's too bad that jenkins,
for some reason, in that diff, does not show it for that module (because it
passes there ¯\_(ツ)_/¯).