Description
Related Objects
- Mentioned In
  - T2767: Make the Slug header optional for the deposit server
  - T2757: the Slug header should not be mandatory
  - T2740: Update user documentation of deposit
- Mentioned Here
  - T2860: Change parameter `slug` in swh-deposit cli to `create_origin`
  - T2391: Document `--create-origin` on the swh-deposit client for origin creation
Event Timeline
I'm not sure it is feasible technically
It is.
The main change that needs to occur for this is server side, not cli side.
It's the server which expects the slug to be given (as an http request header).
The cli's --slug option is just turned into that header when talking to the server.
Now, there is also a change required on the cli side. If --slug is not provided and there is an
external_id entry in the metadata xml file, then there is no need to generate the slug (currently
the cli generates a uuid if no slug is provided).
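To make the cli-side rule above concrete, here is a minimal sketch. It is not the actual swh-deposit client code: the function names (`find_external_identifier`, `resolve_slug`, `build_headers`) are hypothetical, and it assumes the identifier lives in an element whose local name is `external_identifier`, whatever its namespace.

```python
import uuid
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def find_external_identifier(metadata_xml: str) -> Optional[str]:
    """Return the text of the first `external_identifier` element, if any.

    Assumption: the element's local name is `external_identifier`;
    any XML namespace prefix is ignored.
    """
    root = ET.fromstring(metadata_xml)
    for element in root.iter():
        # Tags look like "{namespace}name" when namespaced; keep only "name".
        local_name = element.tag.rsplit("}", 1)[-1]
        text = (element.text or "").strip()
        if local_name == "external_identifier" and text:
            return text
    return None


def resolve_slug(cli_slug: Optional[str], metadata_xml: str) -> Optional[str]:
    """Decide which slug (if any) the client should send."""
    if cli_slug:
        # An explicit --slug always wins.
        return cli_slug
    if find_external_identifier(metadata_xml) is not None:
        # The metadata already identifies the deposit: do not generate a slug.
        return None
    # Current fallback behaviour described in the thread: generate a uuid.
    return str(uuid.uuid4())


def build_headers(slug: Optional[str]) -> Dict[str, str]:
    """Only send the Slug header when there is a slug to send."""
    return {"Slug": slug} if slug else {}
```

With this shape, the server would have to accept requests without a Slug header, which is the server-side change mentioned above.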
Can you verify if there is a check related to the coherence between the slug and external_identifier?
I do think that the change is client side, where the cli should extract the id from the metadata file and send it as the Slug header.
Is that possible?
Also, we are thinking of removing external_identifier altogether and using only the atom:id identifier in the metadata file. (I'll update here when this discussion is finalized on IRC)
also note that making the slug a MUST (server-side) is not valid w.r.t. the specs ("The client MAY supply a Slug header")
So this should not be handled client-side (the generation of the slug in swh/deposit/cli/client.py)
I also think this external_identifier should go away, the spec is rich (aka complicated) enough without us adding more layers :-)
also note that making the slug a MUST (server-side) is not valid w.r.t. the
specs ("The client MAY supply a Slug header")
yes, this part is not compliant with the sword spec (it was done so the deposit
could start being developed, there was no api update part at first...).
We need to push forward your proposal of the external_identifier (or slug
if your 2nd proposal below comes to be) within the metadata deposited by
clients then.
Because we need some way of discriminating between deposit requests
(of type metadata or archive) that create new partial deposits
without any prior history and deposit requests that create a new
deposit for the same "origin".
As far as I remember, the mandatory slug in the http headers is what
allows this.
To create a deposit, you post to /<collection>/ (<- there is no deposit id in
there). And we are using the external id to join correctly that information [1]
Having said that, I think I finally figured out what needs changing then (yeah)...
We need to allow providing the previous deposit id [2] as the historic deposit
id with the post collection api call. That way we can actually know it's a new
deposit for an existing origin (because we want that, it is part of the deposit
requirements).
[1] Implementation-wise, the synthetic revisions created in the archive
reference the previous synthetic revision.
[2] or a swhid? I think a core swhid won't work (a clash could be possible on
directory swhids). A swhid with context could work, though!
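As a rough illustration of the proposal above (passing the previous deposit id with the post collection call), here is a purely hypothetical in-memory sketch. `Deposit`, `DepositRegistry` and `previous_deposit_id` are made-up names for this example, not the deposit server's actual models or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Deposit:
    deposit_id: int
    collection: str
    # None means a brand new deposit without any prior history.
    parent_id: Optional[int]


class DepositRegistry:
    """Toy stand-in for the server-side storage of deposits."""

    def __init__(self) -> None:
        self._deposits: Dict[int, Deposit] = {}
        self._next_id = 1

    def create(
        self, collection: str, previous_deposit_id: Optional[int] = None
    ) -> Deposit:
        # A deposit for an existing origin must reference a known deposit.
        if (
            previous_deposit_id is not None
            and previous_deposit_id not in self._deposits
        ):
            raise KeyError(f"unknown previous deposit: {previous_deposit_id}")
        deposit = Deposit(self._next_id, collection, previous_deposit_id)
        self._deposits[deposit.deposit_id] = deposit
        self._next_id += 1
        return deposit

    def history(self, deposit_id: int) -> List[int]:
        """Walk back through parents, newest first."""
        chain: List[int] = []
        current: Optional[int] = deposit_id
        while current is not None:
            chain.append(current)
            current = self._deposits[current].parent_id
        return chain
```

The parent link plays the role the mandatory slug plays today: it tells the server whether a request starts a new history or continues an existing one.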
I can't help but think that adding full-fledged tests around deposit scenarios
would help with this.
I also think this external_identifier should go away, the spec is rich (aka
complicated) enough without us adding more layers :-)
yes, then we need to unify everywhere on the slug term.
Closing this with the recent changes in the protocol:
https://docs.softwareheritage.org/devel/swh-deposit/specs/protocol-reference.html
see also: T2860