Page MenuHomeSoftware Heritage

identifiers: Make invalid persistent identifier parsing raise error
AbandonedPublic

Authored by ardumont on Jun 20 2018, 10:24 AM.

Details

Reviewers
anlambert
zack
Group Reviewers
Reviewers
Summary

So far we accept any kinds of persistent identifier, including invalid
ones. We should raise instead.

Related T1104

Before, for example:

$ python3
In [1]: from swh.model.identifiers import parse_persistent_identifier
In [2]: parse_persistent_identifier('foo')
Out[2]: {'namespace': 'foo', 'scheme_version': {}}

Now:

$ python3
Python 3.6.6rc1 (default, Jun 13 2018, 06:59:48)
[GCC 8.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from swh.model.identifiers import parse_persistent_identifier
>>> parse_persistent_identifier('foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-model/swh/model/identifiers.py", line 677, in parse_persistent_identifier
    'Wrong format: There should be 4 mandatory parameters')
swh.model.identifiers.SWHMalformedIdentifierException: Wrong format: There should be 4 mandatory parameters
Test Plan

Tests should still pass

Diff Detail

Repository
rDMOD Data model
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 1260
Build 1604: arc lint + arc unit

Event Timeline

swh/model/tests/test_identifiers.py
865–866

Do we also make those raise? Think so.

This optional part is not checked and the actual code silently skips failure.

swh/model/tests/test_identifiers.py
865–866

Yes I think so. The optional parts must be in the form 'key=value'. I handled malformed cases but did not raise any exception in the associated code.

  • identifiers: Raise when invalid contextual data in persistent id
ardumont added inline comments.
swh/model/tests/test_identifiers.py
865–866

sold.

As per discussion with anlambert, converge towards the existing swh-web utils code [1]

[1] https://forge.softwareheritage.org/source/swh-web/browse/master/swh/web/common/utils.py$254

  • identifiers: Also validate the hash is correct
  • identifiers: Fix wrong type in docstring
  • identifiers: Permit to pass directly the object's id to persistent_identifier api

As per discussion with anlambert, converge towards the existing swh-web utils code [1]

Clarifying, the webapp's utils code is a 'duplicate' that will possibly become an adapter of the api here.
Probably delegating the parsing to the swh.model function, and possibly transforming any exceptions raised during parsing into the needed swh.web ones.
So that, everything continues working as expected in the swh.web module.

swh/model/identifiers.py
646

@anlambert Following our irc discussion, I found a middle ground. We can now pass directly the identifier instead of a dict.
That will help for the client using this api (as they might not be aware of what's the right key for the identifier).
Still, i kept the dict resolution as our internal representation uses it, for example.

  • swh.model.cli: Catch specific exception during identifiers check
swh/model/cli.py
22

And todo fixed!

swh/model/identifiers.py
646

That's great! It is really more convenient to work directly with ids in swh-web instead of creating a dict with only one entry. Thanks!

This revision is now accepted and ready to land.Jun 21 2018, 11:20 AM
zack requested changes to this revision.Jun 21 2018, 11:29 AM
zack added inline comments.
swh/model/identifiers.py
666

naming, the hardest thing ever in programming!

can we please call this exception just "InvalidIdentifier" ?

rationale:

  • "invalid" is a more clear and direct term than "malformed"
  • "SWH" is redundant, it's our code and it's clear from the module path
  • "Exception" is redundant, the class derives from that class anyway
This revision now requires changes to proceed.Jun 21 2018, 11:29 AM
  • identifiers: Validate that inputs are correct
  • identifiers: Reuse ValidationError exception + update docstring
swh/model/identifiers.py
666

Did not see this prior to update the diff.

naming, the hardest thing ever in programming!

could not agree more!

rationale:

Love your rationale, thx.

can we please call this exception just "InvalidIdentifier" ?

I just removed this class altogether and use directly the main ValidationError class (used in model).

My rationale was that we are checking for validation of input, be that type, identifier, persistent identifier.
When failing those, we raise a ValidationError with the error detailed in the message.
Seems reasonable enough since we fail a validation.

swh/model/identifiers.py
666

I just removed this class altogether and use directly the main ValidationError class (used in model).

even better :-)

All concerns have been addressed.
This is already merged and improved through:

bd22f277c309
f2422d65c2b4
dfb128e9210b
b6073e27611e
79d57ac7b788
7779c573e9c7
0d3a05141b81
8a0cc22a73f9