deposit.api.checks: Update api checks to allow more alternate fields
Abandoned · Public

Authored by ardumont on Mon, Oct 12, 2:05 PM.

Details

Summary

The API checks now require at least:

  • an author or codemeta:author field
  • one of title, name, codemeta:title, or codemeta:name

This makes it possible to use the metadata generated by the deposit client cli.
That metadata uses codemeta:author and codemeta:name, which would otherwise
fail the check in the metadata update scenario.

Test Plan

tox

Diff Detail

Repository: rDDEP Push deposit
Branch: master
Lint: No Linters Available
Unit: No Unit Test Coverage
Build Status: Buildable 16213
  Build 24946: Phabricator diff pipeline on jenkins
  Build 24945: arc lint + arc unit

Event Timeline

ardumont created this revision. Mon, Oct 12, 2:05 PM

Build has FAILED

Patch application report for D4232 (id=14937)

Could not rebase; Attempt merge onto 419c1b26d0...

Updating 419c1b26..46c88acc
Fast-forward
 swh/deposit/api/checks.py                          |  33 +-
 swh/deposit/cli/client.py                          |  38 +-
 swh/deposit/client.py                              | 218 +++++++----
 swh/deposit/tests/api/test_checks.py               | 166 ++++++---
 swh/deposit/tests/cli/test_client.py               | 401 ++++++++++++++++-----
 .../1_servicedocument                              |  26 ++
 .../1_test_123_metadata                            |  10 +
 .../1_test_123_status                              |  10 +
 .../1_test_321_status                              |   8 +
 9 files changed, 651 insertions(+), 259 deletions(-)
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_servicedocument
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_metadata
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_status
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_321_status
Changes applied before test
commit 46c88acc359a7789dd6885404dda8b66946dc5a1
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 13:59:09 2020 +0200

    deposit.api.checks: Update api checks to allow more alternate fields
    
    at least:
    - author or codemeta:author field must be present
    - title, name, codemeta:title, or codemeta:name must be present
    
    This actually allows to use the generated metadata from the deposit client cli.
    The actual metadata generated by the cli uses codemeta:author and
    codemeta:name, which would fail the check this in the metadata update scenario.

commit 8861413d52e2e7719b5a70b0bbc9f7876a68891b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 13:49:55 2020 +0200

    test_checks: Use pytest.mark.parametrize
    
    This will allow to improve the existing code and add some more sample without
    the need to craft new tests.

commit 866ec64139b375adb93352414751cffd4d756ea2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 08:11:30 2020 +0200

    deposit_client: Allow deposit metadata update on completed deposit
    
    Related to T2538

commit 0fe94348f60dcf2c748b95f072e502bf4d4eeab9
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sun Oct 11 12:13:04 2020 +0200

    test_client: Move redundant tests setup into fixtures

commit 2150833f440605c5c48b9fef0369f2ccd752f046
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Oct 10 08:13:58 2020 +0200

    test_client: Explicit the possible format outputs

commit 8e99386fa17d22b0ec1eb0fc4a806b233027deca
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Oct 9 16:23:17 2020 +0200

    deposit.client: Improve cli error messages and add missing coverage
    
    This adds the missing checks on:
    - no actionable command
    - missing --deposit-id when specifying the --replace flag
    - some more incompatible checks command scenario

Link to build: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/212/
See console output for more information: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/212/console

ardumont updated this revision to Diff 14939. Mon, Oct 12, 3:01 PM
  • Fix remaining test
  • Update docstrings
  • Drop no longer needed code

Build has FAILED

Patch application report for D4232 (id=14939)

Could not rebase; Attempt merge onto 419c1b26d0...

Updating 419c1b26..53495f2d
Fast-forward
 swh/deposit/api/checks.py                          |  40 +-
 swh/deposit/cli/client.py                          |  38 +-
 swh/deposit/client.py                              | 218 +++++++----
 swh/deposit/tests/api/test_checks.py               | 166 ++++++---
 .../tests/api/test_deposit_private_check.py        |  18 +-
 swh/deposit/tests/cli/test_client.py               | 401 ++++++++++++++++-----
 .../1_servicedocument                              |  26 ++
 .../1_test_123_metadata                            |  10 +
 .../1_test_123_status                              |  10 +
 .../1_test_321_status                              |   8 +
 10 files changed, 664 insertions(+), 271 deletions(-)
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_servicedocument
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_metadata
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_status
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_321_status
Changes applied before test
commit 53495f2d49df2218f5641a9b169f789e2c81efff
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 13:59:09 2020 +0200

    deposit.api.checks: Update api checks to allow more alternate fields
    
    at least, this makes the following fields able to be filled with either:
    - author or codemeta:author
    - title, name, codemeta:title, or codemeta:name
    
    This, for example, actually allows to use the generated metadata from the
    deposit client cli.
    
    The actual metadata generated by the cli uses `codemeta:author` and
    `codemeta:name`. Prior to this commit, those metadata would fail the previous
    check in the case of the metadata update scenario.

commit 8861413d52e2e7719b5a70b0bbc9f7876a68891b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 13:49:55 2020 +0200

    test_checks: Use pytest.mark.parametrize
    
    This will allow to improve the existing code and add some more sample without
    the need to craft new tests.

commit 866ec64139b375adb93352414751cffd4d756ea2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 08:11:30 2020 +0200

    deposit_client: Allow deposit metadata update on completed deposit
    
    Related to T2538

commit 0fe94348f60dcf2c748b95f072e502bf4d4eeab9
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sun Oct 11 12:13:04 2020 +0200

    test_client: Move redundant tests setup into fixtures

commit 2150833f440605c5c48b9fef0369f2ccd752f046
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Oct 10 08:13:58 2020 +0200

    test_client: Explicit the possible format outputs

commit 8e99386fa17d22b0ec1eb0fc4a806b233027deca
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Oct 9 16:23:17 2020 +0200

    deposit.client: Improve cli error messages and add missing coverage
    
    This adds the missing checks on:
    - no actionable command
    - missing --deposit-id when specifying the --replace flag
    - some more incompatible checks command scenario

Link to build: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/213/
See console output for more information: https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/213/console

ardumont updated this revision to Diff 14942. Mon, Oct 12, 3:12 PM

Fix last test i missed

Build is green

Patch application report for D4232 (id=14942)

Could not rebase; Attempt merge onto 419c1b26d0...

Updating 419c1b26..b913dad0
Fast-forward
 swh/deposit/api/checks.py                          |  40 +-
 swh/deposit/cli/client.py                          |  38 +-
 swh/deposit/client.py                              | 218 +++++++----
 swh/deposit/tests/api/test_checks.py               | 166 ++++++---
 .../tests/api/test_deposit_private_check.py        |  18 +-
 swh/deposit/tests/api/test_deposit_update.py       |   4 +-
 swh/deposit/tests/cli/test_client.py               | 401 ++++++++++++++++-----
 .../1_servicedocument                              |  26 ++
 .../1_test_123_metadata                            |  10 +
 .../1_test_123_status                              |  10 +
 .../1_test_321_status                              |   8 +
 11 files changed, 666 insertions(+), 273 deletions(-)
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_servicedocument
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_metadata
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_status
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_321_status
Changes applied before test
commit b913dad0c244f135f7840ae4fe8d156d85ed455c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 13:59:09 2020 +0200

    deposit.api.checks: Update api checks to allow more alternate fields
    
    at least, this makes the following fields able to be filled with either:
    - author or codemeta:author
    - title, name, codemeta:title, or codemeta:name
    
    This, for example, actually allows to use the generated metadata from the
    deposit client cli.
    
    The actual metadata generated by the cli uses `codemeta:author` and
    `codemeta:name`. Prior to this commit, those metadata would fail the previous
    check in the case of the metadata update scenario.

commit 8861413d52e2e7719b5a70b0bbc9f7876a68891b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 13:49:55 2020 +0200

    test_checks: Use pytest.mark.parametrize
    
    This will allow to improve the existing code and add some more sample without
    the need to craft new tests.

commit 866ec64139b375adb93352414751cffd4d756ea2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 08:11:30 2020 +0200

    deposit_client: Allow deposit metadata update on completed deposit
    
    Related to T2538

commit 0fe94348f60dcf2c748b95f072e502bf4d4eeab9
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sun Oct 11 12:13:04 2020 +0200

    test_client: Move redundant tests setup into fixtures

commit 2150833f440605c5c48b9fef0369f2ccd752f046
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Oct 10 08:13:58 2020 +0200

    test_client: Explicit the possible format outputs

commit 8e99386fa17d22b0ec1eb0fc4a806b233027deca
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Oct 9 16:23:17 2020 +0200

    deposit.client: Improve cli error messages and add missing coverage
    
    This adds the missing checks on:
    - no actionable command
    - missing --deposit-id when specifying the --replace flag
    - some more incompatible checks command scenario

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/214/ for more details.

ardumont updated this revision to Diff 14943. Mon, Oct 12, 3:16 PM

Simplify code again

Build is green

Patch application report for D4232 (id=14943)

Could not rebase; Attempt merge onto 419c1b26d0...

Updating 419c1b26..9e1b2c98
Fast-forward
 swh/deposit/api/checks.py                          |  41 +--
 swh/deposit/cli/client.py                          |  38 +-
 swh/deposit/client.py                              | 218 +++++++----
 swh/deposit/tests/api/test_checks.py               | 166 ++++++---
 .../tests/api/test_deposit_private_check.py        |  18 +-
 swh/deposit/tests/api/test_deposit_update.py       |   4 +-
 swh/deposit/tests/cli/test_client.py               | 401 ++++++++++++++++-----
 .../1_servicedocument                              |  26 ++
 .../1_test_123_metadata                            |  10 +
 .../1_test_123_status                              |  10 +
 .../1_test_321_status                              |   8 +
 11 files changed, 664 insertions(+), 276 deletions(-)
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_servicedocument
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_metadata
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_status
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_321_status
Changes applied before test
commit 9e1b2c9817b79c43ad6efe661c361765b4a51ac7
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 13:59:09 2020 +0200

    deposit.api.checks: Update api checks to allow more alternate fields
    
    at least, this makes the following fields able to be filled with either:
    - author or codemeta:author
    - title, name, codemeta:title, or codemeta:name
    
    This, for example, actually allows to use the generated metadata from the
    deposit client cli.
    
    The actual metadata generated by the cli uses `codemeta:author` and
    `codemeta:name`. Prior to this commit, those metadata would fail the previous
    check in the case of the metadata update scenario.

commit 8861413d52e2e7719b5a70b0bbc9f7876a68891b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 13:49:55 2020 +0200

    test_checks: Use pytest.mark.parametrize
    
    This will allow to improve the existing code and add some more sample without
    the need to craft new tests.

commit 866ec64139b375adb93352414751cffd4d756ea2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 08:11:30 2020 +0200

    deposit_client: Allow deposit metadata update on completed deposit
    
    Related to T2538

commit 0fe94348f60dcf2c748b95f072e502bf4d4eeab9
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sun Oct 11 12:13:04 2020 +0200

    test_client: Move redundant tests setup into fixtures

commit 2150833f440605c5c48b9fef0369f2ccd752f046
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Oct 10 08:13:58 2020 +0200

    test_client: Explicit the possible format outputs

commit 8e99386fa17d22b0ec1eb0fc4a806b233027deca
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Oct 9 16:23:17 2020 +0200

    deposit.client: Improve cli error messages and add missing coverage
    
    This adds the missing checks on:
    - no actionable command
    - missing --deposit-id when specifying the --replace flag
    - some more incompatible checks command scenario

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/215/ for more details.

ardumont added inline comments.
swh/deposit/api/checks.py
31–33

@moranegg see, "everything comes to *them* who wait" ;)

I don't remember when you said to me, "it'd be nice to rename those" or some such

ardumont edited the summary of this revision. Mon, Oct 12, 3:22 PM

That's a breaking change, right? If so, why do we allow all these alternatives instead of making just one of each mandatory?

ardumont added a comment. Edited Mon, Oct 12, 5:10 PM

That's a breaking change, right?

I don't think it's a breaking change. If anything, it will allow more metadata
to pass, provided clients now send codemeta values.

before, failure if one of the following is not respected:

  • mandatory: author
  • one of the following is mandatory: name or title

now, failure if one of the following is not respected:

  • one of the following is mandatory: author or codemeta:author
  • one of the following is mandatory: name, codemeta:name, title or codemeta:title
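
The before/now rules above can be sketched in Python as follows. This is only an illustration under assumptions: the names OLD_RULES, NEW_RULES and missing_fields are hypothetical and are not taken from swh/deposit/api/checks.py, and the metadata is assumed to be a flat dict keyed by the field names used in this thread.

```python
from typing import Dict, List, Sequence, Tuple

# Each tuple is a group of alternatives: at least one field of the group
# must be present. Names are hypothetical, not the real swh.deposit code.
OLD_RULES: List[Tuple[str, ...]] = [
    ("author",),
    ("name", "title"),
]
NEW_RULES: List[Tuple[str, ...]] = [
    ("author", "codemeta:author"),
    ("name", "codemeta:name", "title", "codemeta:title"),
]


def missing_fields(
    metadata: Dict[str, str], rules: Sequence[Tuple[str, ...]]
) -> List[Tuple[str, ...]]:
    """Return the groups for which no alternative has a non-empty value."""
    return [
        group
        for group in rules
        if not any(metadata.get(field) for field in group)
    ]


# Metadata shaped like what the thread says the cli generates
# (the values themselves are made up for the example).
cli_metadata = {"codemeta:author": "Jane Doe", "codemeta:name": "my-software"}

print(missing_fields(cli_metadata, OLD_RULES))
print(missing_fields(cli_metadata, NEW_RULES))
```

Under these assumptions, metadata carrying only codemeta:author and codemeta:name fails both groups of the old rules and passes the new ones.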

Build is green

Patch application report for D4232 (id=14973)

Could not rebase; Attempt merge onto 0e0b342d9b...

Updating 0e0b342d..95e3b765
Fast-forward
 swh/deposit/api/checks.py                          |  41 ++--
 swh/deposit/cli/client.py                          |  11 ++
 swh/deposit/client.py                              | 218 ++++++++++++++-------
 swh/deposit/tests/api/test_checks.py               |  74 ++++++-
 .../tests/api/test_deposit_private_check.py        |  18 +-
 swh/deposit/tests/api/test_deposit_update.py       |   4 +-
 swh/deposit/tests/cli/test_client.py               | 192 +++++++++++++-----
 .../1_servicedocument                              |  26 +++
 .../1_test_123_metadata                            |  10 +
 .../1_test_123_status                              |  10 +
 .../1_test_321_status                              |   8 +
 11 files changed, 444 insertions(+), 168 deletions(-)
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_servicedocument
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_metadata
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_status
 create mode 100644 swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_321_status
Changes applied before test
commit 95e3b765311b146b4f15d717bcfb6b1c98838be1
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 13:59:09 2020 +0200

    deposit.api.checks: Update api checks to allow more alternate fields
    
    at least, this makes the following fields able to be filled with either:
    - author or codemeta:author
    - title, name, codemeta:title, or codemeta:name
    
    This, for example, actually allows to use the generated metadata from the
    deposit client cli.
    
    The actual metadata generated by the cli uses `codemeta:author` and
    `codemeta:name`. Prior to this commit, those metadata would fail the previous
    check in the case of the metadata update scenario.

commit bf01939d145e8b82eb7f4bc323d76387fdfb7949
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 08:11:30 2020 +0200

    deposit_client: Allow deposit metadata update on completed deposit
    
    Related to T2538

commit 255d689569ffa5302cc90ad508eb966b5a8a7f8c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sun Oct 11 12:13:04 2020 +0200

    test_client: Move redundant tests setup into fixtures

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/223/ for more details.

before, failure if one of the following is not respected:

  • mandatory: author
  • one of the following is mandatory: name or title

now, failure if one of the following is not respected:

  • one of the following is mandatory: author or codemeta:author
  • one of the following is mandatory: name, codemeta:name, title or codemeta:title

Why do we want that? It increases complexity that readers of the metadata (deposit, indexers, possibly webapp, ...) have to deal with

Shouldn't we update the deposit client to use the right fields instead?

ardumont added a comment. Edited Tue, Oct 13, 10:42 AM

Why do we want that? It increases complexity that readers of the metadata (deposit, indexers, possibly webapp, ...) have to deal with

Well, for one, i found it strange to use unqualified fields. But that's old
source code, prior to us starting using more and more codemeta ¯\_(ツ)_/¯.

I don't really see the reading complexity... We don't really have any control
over what's sent by clients today so you may have already those
codemeta:author, codemeta:title and other fields entries today (and apparently
those readers are not complaining ;).

Those potential fields are just not part of the current checks, which is what
this diff tries to allow. This would be more consistent (we do push clients to
use codemeta...) and also consistent with what the deposit client already
generates.

Shouldn't we update the deposit client to use the right fields instead?

yes, that could be another possible implementation (with apparently less
impact, well less friction at least :).

Note that D1419 introduced those generated fields as codemeta:name and
codemeta:author. But the referenced task T1650 mentioned plain
name and author... So that implementation would properly realign the
code with that task (if that task still makes sense today, that is).

@moranegg any thoughts on this diff (and discussion) ^

Cheers,

I agree with @ardumont that it's not a breaking change, it is something I asked for.
I agree with @vlorentz that having more options makes it more complicated to explain in the docs.

The bottom line here is that we want to keep checking that a software deposit has at least one author and a title,
while giving the client different ways to provide them (with AtomPub or with CodeMeta).

Let me review our documentation before accepting this diff.
@vlorentz maybe you have a different idea here on how metadata should be checked?

swh/deposit/api/checks.py
31–33

:-)

ardumont edited the test plan for this revision. Tue, Oct 13, 10:51 AM
ardumont updated this revision to Diff 14988. Tue, Oct 13, 11:37 AM
ardumont edited the test plan for this revision.

Rebase

Build is green

Patch application report for D4232 (id=14988)

Rebasing onto a8e6b830bb...

Current branch diff-target is up to date.
Changes applied before test
commit e5522766157421cd2b83b075c5ae349c701ee19e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Oct 12 13:59:09 2020 +0200

    deposit.api.checks: Update api checks to allow more alternate fields
    
    at least, this makes the following fields able to be filled with either:
    - author or codemeta:author
    - title, name, codemeta:title, or codemeta:name
    
    This, for example, actually allows to use the generated metadata from the
    deposit client cli.
    
    The actual metadata generated by the cli uses `codemeta:author` and
    `codemeta:name`. Prior to this commit, those metadata would fail the previous
    check in the case of the metadata update scenario.

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/224/ for more details.

Now that I have taken the time to see what we did and how this diff is changing things, I can see it has the potential of breaking the deposit.
The question is: how does this diff serve the cli (it feels like that is its goal)?

I would feel more comfortable not changing the algorithm checking metadata at this moment, if this is not necessary for the cli.

swh/deposit/api/checks.py
16

why did this distinction change?
because all fields are now alternate?

31–33

codemeta:title doesn't exist;
it was title for AtomPub and name for CodeMeta.

I'm a bit on the fence here, because we used these properties in a more generic way without the namespace, even though the algorithm identified name in codemeta:name.

Also, the overall idea in the future will be not to keep codemeta in xml format.

vlorentz added a comment. Edited Tue, Oct 13, 5:09 PM

Why do we want that? It increases complexity that readers of the metadata (deposit, indexers, possibly webapp, ...) have to deal with

Well, for one, i found it strange to use unqualified fields. But that's old
source code, prior to us starting using more and more codemeta ¯\_(ツ)_/¯.

They are not unqualified, they are in the Atom XMLNS in the XML document the client sends us.

I don't really see the reading complexity... We don't really have any control
over what's sent by clients today so you may have already those
codemeta:author, codemeta:title and other fields entries today (and apparently
those readers are not complaining ;).

Yes, but just because we have {https://doi.org/10.5063/schema/codemeta-2.0}author doesn't mean we can't also have {http://www.w3.org/2005/Atom}author.

And btw, it's incorrect to name them codemeta:author or author in the error messages, because the client may send us an XML document like this:

<?xml version="1.0"?>
<at:entry xmlns:at="http://www.w3.org/2005/Atom"
          xmlns:cm="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0">
  <at:title>Foo</at:title>
  <cm:version>bar</cm:version>
</at:entry>

which is semantically indistinguishable from:

<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0">
  <title>Foo</title>
  <codemeta:version>bar</codemeta:version>
</entry>

so we shouldn't make assumptions about how they name their namespaces.
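
A minimal sketch of that point, using only Python's standard xml.etree.ElementTree rather than the deposit's own parser: once parsed, element tags are qualified by namespace URI, so the prefixes chosen by the client disappear. The helper name qualified_children is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two documents from the comment above; only the prefix names differ.
PREFIXED = """<?xml version="1.0"?>
<at:entry xmlns:at="http://www.w3.org/2005/Atom"
          xmlns:cm="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0">
  <at:title>Foo</at:title>
  <cm:version>bar</cm:version>
</at:entry>
"""

DEFAULT_NS = """<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0">
  <title>Foo</title>
  <codemeta:version>bar</codemeta:version>
</entry>
"""


def qualified_children(xml_text: str) -> dict:
    """Map each child's namespace-qualified tag ({uri}local) to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


prefixed = qualified_children(PREFIXED)
default = qualified_children(DEFAULT_NS)

# Both documents yield the same URI-qualified keys, whatever the prefixes.
print(prefixed == default)
print(sorted(prefixed))
```

Any check or error message keyed on a literal prefix such as codemeta: therefore relies on a naming choice the client is free not to make.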

Yes, but just because we have
{https://doi.org/10.5063/schema/codemeta-2.0}author doesn't mean we can't
also have {http://www.w3.org/2005/Atom}author.

ok

And btw, it's incorrect to name them codemeta:author or author in the error
messages, because the client may send us an XML document like this:

<?xml version="1.0"?>
<at:entry xmlns:at="http://www.w3.org/2005/Atom"
          xmlns:cm="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0">
  <at:title>Foo</at:title>
  <cm:version>bar</cm:version>
</at:entry>

which is semantically indistinguishable from:

<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0">
  <title>Foo</title>
  <codemeta:version>bar</codemeta:version>
</entry>

ok

But if they do the first, won't it get rejected?

ah no. Well, yes it will get rejected because it's missing the author field.
But yeah, ok, it will be parsed correctly and is indistinguishable as you say:

$ DJANGO_SETTINGS_MODULE="swh.deposit.settings.testing" ipython
...
In [7]: data1 = """<?xml version="1.0"?>
   ...: <entry xmlns="http://www.w3.org/2005/Atom"
   ...:        xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0">
   ...:   <title>Foo</title>
   ...:   <codemeta:version>bar</codemeta:version>
   ...: </entry>
   ...: """
   ...:
   ...: data2 = """<?xml version="1.0"?>
   ...: <at:entry xmlns:at="http://www.w3.org/2005/Atom"
   ...:           xmlns:cm="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0">
   ...:   <at:title>Foo</at:title>
   ...:   <cm:version>bar</cm:version>
   ...: </at:entry>
   ...: """
   ...:
   ...: from swh.deposit.parsers import parse_xml
   ...: data_dict = dict(parse_xml(data1))
   ...: data_dict2 = dict(parse_xml(data2))
   ...:
   ...: data_dict
   ...: data_dict == data_dict2
   ...:
Out[7]: True

In [8]: data_dict
Out[8]: {'title': 'Foo', 'codemeta:version': 'bar'}

so we shouldn't make assumptions about how they name their namespaces.

You are not wrong there. The error I keep on making.

Ok plan forward is dropping this diff and adapting the deposit client to stop
generating wrong metadata then, correct?

swh/deposit/api/checks.py
16

they were all mandatory fields, including the ones below.
But the author field did not have any alternative to look for, so it was dealt with differently (with a dedicated message).

31–33

oh ok, so codemeta:title can be removed from the alternative fields since it makes no sense.

ardumont abandoned this revision. Tue, Oct 13, 5:34 PM

Ok plan forward is dropping this diff and adapting the deposit client to stop
generating wrong metadata then, correct?

Opened T2701 to decide what to do.