parse_persistent_identifier() should raise a parsing exception on invalid identifiers
Closed, Migrated; Edits Locked
Authored By

	zack
	Jun 16 2018, 4:19 PM

Description

In [1]: from swh.model.identifiers import parse_persistent_identifier
In [2]: parse_persistent_identifier('foo')
Out[2]: {'namespace': 'foo', 'scheme_version': {}}

Revisions and Commits

rDMOD Data model
	rDMODbd22f277c309 identifiers: Reuse ValidationError exception + update docstring

Related Objects

Mentioned In: R65:0e9c50fd9d9e: swh.web.common.utils: Adapt persistent identifier computations
rDWAPPS0e9c50fd9d9e: swh.web.common.utils: Adapt persistent identifier computations
rDMODf2422d65c2b4: identifiers: Validate that inputs are correct
rDMODdfb128e9210b: swh.model.cli: Catch specific exception during identifiers check
rDMODb6073e27611e: identifiers: Permit to pass directly the object's id
rDMOD0d3a05141b81: identifiers: Raise when invalid contextual data in persistent id
rDMOD8a0cc22a73f9: identifiers: Make invalid persistent identifier parsing raise error
rDMOD7779c573e9c7: identifiers: Also validate the hash is correct
D348: swh.web.common.utils: Adapt persistent identifier computations
T1112: swh.model.identifier: Improve persistent identifier representation
D346: identifiers: Make invalid persistent identifier parsing raise error
Mentioned Here: T1112: swh.model.identifier: Improve persistent identifier representation

Event Timeline

@ardumont I'm tentatively assigning this to you as I think the code is yours, but feel free to reassign as needed!

ardumont mentioned this in D346: identifiers: Make invalid persistent identifier parsing raise error. Jun 20 2018, 10:24 AM

Now, now, for me, the code is the team's code. So it's "mine" because I'm part of the team.
But, indeed, I initially authored it.

Regarding that code, I simply added checks so that invalid identifiers raise a parsing error (in accordance with the current task).
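For readers following along, here is a minimal sketch of what such checks could look like. It is not the actual swh.model code: the function name parse_pid_sketch, the local ValidationError class, and the assumed identifier layout (swh:<version>:<type>:<40 hex chars>, optionally followed by ;key=value contextual data) are all illustrative assumptions.

```python
import re


class ValidationError(Exception):
    """Local stand-in for the parsing exception (hypothetical, for this sketch)."""


# Assumed layout: swh:<version>:<type>:<40 lowercase hex chars>.
# The set of object types below is an assumption made for illustration.
_PID_RE = re.compile(
    r"(?P<namespace>swh):(?P<scheme_version>\d+):"
    r"(?P<object_type>cnt|dir|rev|rel|snp):(?P<object_id>[0-9a-f]{40})"
)


def parse_pid_sketch(pid: str) -> dict:
    """Parse an identifier, raising instead of returning a half-filled dict."""
    # Split the core identifier from optional ';key=value' contextual data.
    core, _, qualifiers = pid.partition(";")
    match = _PID_RE.fullmatch(core)
    if match is None:
        raise ValidationError(f"invalid persistent identifier: {pid!r}")

    parsed = match.groupdict()
    parsed["scheme_version"] = int(parsed["scheme_version"])

    metadata = {}
    if qualifiers:
        for item in qualifiers.split(";"):
            key, sep, value = item.partition("=")
            if not key or not sep:
                raise ValidationError(f"invalid contextual data: {item!r}")
            metadata[key] = value
    parsed["metadata"] = metadata
    return parsed
```

With this sketch, parse_pid_sketch('foo') raises ValidationError rather than returning a dict whose namespace is 'foo'.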

I recall some remarks about the persistent identifier representation being too simple or something.

I don't see what's wrong with that simple representation, since:

the actual representation is also simple enough
we don't have types anyway
everyone can manipulate a dict

Thanks for the enlightenment.

Note: I'm adding @anlambert to the loop as he is the one manipulating those the most.
He also helped improve the current implementation, so he might be interested in the discussion.

In T1104#20616, @ardumont wrote:

I recall some remarks about the persistent identifier representation being too simple or something.

I don't know what's wrong with that simple representation as:

everyone can manipulate a dict

(a bit orthogonal to this specific task, but while we're at it)

here's what's wrong with dicts, in general:

you cannot type (in the sense of type checking) them more precisely than "it's a dict", and if you end up having dicts everywhere (which is our case) that's not very useful
they're prone to typo-based errors in their keys, both when accessing and when updating them
they're mutable, so when you pass them around you just have to pray client code will not modify them
accessing them requires hashing the key, which has a cost (often negligible, but still)

They're useful in a bunch of situations, like when the set of keys you use is open-ended (e.g., all the "metadata" situations we have in the archive).
But when the list of fields is fixed, which is the case for persistent identifiers, named tuples could be much better, and they address all the above issues.
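To make the contrast concrete, here is a small sketch of a fixed-field representation. The PersistentId class and its field names are hypothetical, chosen only to illustrate the points above; they are not taken from swh.model.

```python
from typing import NamedTuple


class PersistentId(NamedTuple):
    """Hypothetical fixed-field representation of a parsed identifier."""

    namespace: str
    scheme_version: int
    object_type: str
    object_id: str


pid = PersistentId("swh", 1, "cnt", "a" * 40)

# With a dict, a typo in a key silently creates a new entry.
as_dict = {"namespace": "swh", "object_type": "cnt"}
as_dict["object_typ"] = "dir"  # no error, the bug goes unnoticed

# With a named tuple, a typo in a field name fails loudly.
try:
    pid.object_typ
    typo_detected = False
except AttributeError:
    typo_detected = True

# Fields cannot be reassigned, so client code cannot modify the value in place.
try:
    pid.namespace = "other"
    immutable = False
except AttributeError:
    immutable = True

# _replace returns a modified copy and leaves the original untouched.
newer = pid._replace(scheme_version=2)
```

The annotated fields also give a type checker something more precise to work with than "it's a dict".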

ardumont mentioned this in T1112: swh.model.identifier: Improve persistent identifier representation. Jun 20 2018, 12:32 PM

Ok, thanks!

I suppose that by named tuples you mean those [1], which I need to read up on.
In the meantime, I created T1112 about improving that representation.

[1] https://docs.python.org/3/library/collections.html#collections.namedtuple

ardumont mentioned this in D348: swh.web.common.utils: Adapt persistent identifier computations. Jun 21 2018, 1:29 PM

ardumont closed this task as Resolved by committing rDMODbd22f277c309: identifiers: Reuse ValidationError exception + update docstring. Jun 21 2018, 1:45 PM

ardumont mentioned this in rDMOD8a0cc22a73f9: identifiers: Make invalid persistent identifier parsing raise error.

ardumont mentioned this in rDMOD0d3a05141b81: identifiers: Raise when invalid contextual data in persistent id.

ardumont mentioned this in rDMOD7779c573e9c7: identifiers: Also validate the hash is correct.

ardumont mentioned this in rDMODb6073e27611e: identifiers: Permit to pass directly the object's id.

ardumont mentioned this in rDMODdfb128e9210b: swh.model.cli: Catch specific exception during identifiers check.

ardumont mentioned this in rDMODf2422d65c2b4: identifiers: Validate that inputs are correct.

ardumont added a commit: rDMODbd22f277c309: identifiers: Reuse ValidationError exception + update docstring.

ardumont mentioned this in rDWAPPS0e9c50fd9d9e: swh.web.common.utils: Adapt persistent identifier computations.

ardumont mentioned this in R65:0e9c50fd9d9e: swh.web.common.utils: Adapt persistent identifier computations. Feb 23 2019, 1:57 AM