In [1]: from swh.model.identifiers import parse_persistent_identifier In [2]: parse_persistent_identifier('foo') Out[2]: {'namespace': 'foo', 'scheme_version': {}}
Description
Revisions and Commits
Related Objects
- Mentioned In
- R65:0e9c50fd9d9e: swh.web.common.utils: Adapt persistent identifier computations
rDWAPPS0e9c50fd9d9e: swh.web.common.utils: Adapt persistent identifier computations
rDMODf2422d65c2b4: identifiers: Validate that inputs are correct
rDMODdfb128e9210b: swh.model.cli: Catch specific exception during identifiers check
rDMODb6073e27611e: identifiers: Permit to pass directly the object's id
rDMOD0d3a05141b81: identifiers: Raise when invalid contextual data in persistent id
rDMOD8a0cc22a73f9: identifiers: Make invalid persistent identifier parsing raise error
rDMOD7779c573e9c7: identifiers: Also validate the hash is correct
D348: swh.web.common.utils: Adapt persistent identifier computations
T1112: swh.model.identifier: Improve persistent identifier representation
D346: identifiers: Make invalid persistent identifier parsing raise error - Mentioned Here
- T1112: swh.model.identifier: Improve persistent identifier representation
Event Timeline
@ardumont I'm tentatively assigning this to you as I think the code is yours, but feel free to reassign as needed!
@ardumont I'm tentatively assigning this to you as I think the code is yours, but feel free to reassign as needed!
Now, now, for the me, the code is the team's code. So it's "mine" because i'm part of the team.
But, indeed, I initially authored it.
Regarding that code, i simply added the checks to make parsing errors raise (in accord with the current task).
I recall some remarks about the persistent identifier representation being too simple or something.
I don't know what's wrong with that simple representation as:
- the actual representation is also simple enough
- we don't have type anyway
- everyone can manipulate dict
Thanks for the enlightenment.
Note: I'm adding @anlambert in the loop as he is the one manipulating those the most.
He also helped improve the current implementation. So he might be interested by the discussion.
(a bit orthogonal to this specific task, but while we're at it)
here's what's wrong with dicts, in general:
- you cannot type (in the sense of type checking) them more precisely than "it's a dict", and if you end up having dicts everywhere (which is our case) that's not very useful
- they're prone to typo-based errors in their keys, both when accessing and when updating them
- they're mutable, so when you pass them around you just have to pray client code will not modify them
- accessing them requires hashing the key, which has a cost (often negligible, but still)
They're useful in a bunch of situations, like when the set of keys you use it's not close-ended (e.g., all the "metadata" situations we have in the archive).
But when the list of fields is fixed, which is the case for persistent identifiers, named tuples could be much better, and they address all the above issues.
Ok, thanks!
I suppose that by named tuples, you are talking about those [1] which i need to read further.
In the mean time, i created the T1112 about improving such representation.
[1] https://docs.python.org/3/library/collections.html#collections.namedtuple