After this morning's meeting with @vlorentz and @ardumont:
We will keep the metadata-only deposit spec, using a separate namespace swh for which we still need to write the schema (not sure we already have one).
This way, the xml with metadata has a section where the identified artifact is mentioned:
Reference a snapshot, revision or release:
With ${type} in {snp (snapshot), rev (revision), rel (release) }: <swh:deposit> <swh:reference> <swh:object id="swh:1:${type}:aaaaaaaaaaaaaa..."/> </swh:reference> </swh:deposit>
We need to add directory and content to the list of types.
The possibility to deposit metadata on an origin should be implemented as well, but it is not suited to institutional repositories (e.g. HAL).
Reference an origin:
<swh:deposit> <swh:reference> <swh:origin url="https://github.com/user/repo"/> </swh:reference> </swh:deposit>
This spec fits the POST of a new deposit in SWORD and is described in the SWORD v2 documentation (6.3.3. Creating a Resource with an Atom Entry).
@vlorentz can you please review the naming and the choice of the tag with or without the attribute (e.g. id, url)?
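As a rough illustration of the two reference shapes above, here is a minimal Python sketch that builds the Atom entry body for such a POST. It is only a sketch under assumptions: the swh namespace URI is a placeholder (the schema is not written yet), the helper name build_entry is hypothetical, and the attribute names follow the draft above (id for objects, url for origins).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Placeholder URI (assumption): the swh schema has not been written yet.
SWH_NS = "https://example.org/swh-deposit"

ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("swh", SWH_NS)


def build_entry(swhid=None, origin_url=None):
    """Return an Atom entry (as a string) referencing one object or one origin."""
    if (swhid is None) == (origin_url is None):
        raise ValueError("give exactly one of swhid or origin_url")
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    deposit = ET.SubElement(entry, f"{{{SWH_NS}}}deposit")
    reference = ET.SubElement(deposit, f"{{{SWH_NS}}}reference")
    if swhid is not None:
        # Draft naming from the spec above: <swh:object id="..."/>
        ET.SubElement(reference, f"{{{SWH_NS}}}object", {"id": swhid})
    else:
        ET.SubElement(reference, f"{{{SWH_NS}}}origin", {"url": origin_url})
    return ET.tostring(entry, encoding="unicode")


print(build_entry(origin_url="https://github.com/user/repo"))
```

The function refuses a deposit that names both an object and an origin, which is one possible reading of the spec, not a decision taken in this task.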
I think we would want to "mention" SWHIDs there, by replacing <swh:object id=" with either <swh:swhid id=" or <swh:object swhid=" (weak preference for the latter)
Additionally, should the SWHID be a core SWHID, or do we allow context? In the latter case, what do we do if there's a line context?
I don't recall what the conclusion was about the proposal of <swh:swhid>$actual_swid</swh:swhid>, which I found simpler and clearer.
(I have no clue if that proposal is irrelevant or not)
I guess a question that could also help answer that would be "Do we intend to add other attributes to <swh:object>?"
We didn't conclude anything, I said I'd think about it ;)
Since it's a simple text value, it should be an attribute, IMO. No point in allowing content in that tag
I see we have three or four options:
Option A1: value of swhid in attribute id
<swhid id='swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b; origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git; visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9; anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0; path=/Examples/SimpleFarm/simplefarm.ml; lines=9-15'/>
Option A2: value of swhid in attribute swhid
<object swhid='swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b; origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git; visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9; anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0; path=/Examples/SimpleFarm/simplefarm.ml; lines=9-15'/>
Option B: value of swhid in element
<swhid> swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b; origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git; visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9; anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0; path=/Examples/SimpleFarm/simplefarm.ml; lines=9-15 </swhid>
Option C: value of swhid split into separate elements
<swhid>
  <core>swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b</core>
  <origin_ctxt>https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git</origin_ctxt>
  <visit_ctxt>swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9</visit_ctxt>
  <anchor_ctxt>swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0</anchor_ctxt>
  <path_ctxt>/Examples/SimpleFarm/simplefarm.ml</path_ctxt>
  <fragment_qualifier>9-15</fragment_qualifier>
</swhid>
I don't have a preference, but I do think that we don't want clients to dismember the SWHID into Option C.
So if we say that the burden of understanding the context is on our side, we should go with A or B.
@vlorentz is right when saying that the element is only text and is not a complex element (where other elements are included).
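To make "the burden is on our side" concrete, here is a minimal sketch of what the server could do with the single string from option A or B. The function name split_swhid is hypothetical, and the sketch assumes that qualifiers are separated by ";" (tolerating the spaces used in the examples above) and that any ";" inside an origin URL is percent-encoded by the client.

```python
import re

# Core SWHID: scheme, version 1, object type, 40 hex digits.
CORE_RE = re.compile(r"^swh:1:(snp|rel|rev|dir|cnt):[0-9a-f]{40}$")


def split_swhid(text):
    """Split a possibly qualified SWHID into (core, qualifiers dict)."""
    parts = [part.strip() for part in text.strip().split(";")]
    core, raw_qualifiers = parts[0], parts[1:]
    if not CORE_RE.match(core):
        raise ValueError(f"not a core SWHID: {core!r}")
    qualifiers = {}
    for item in raw_qualifiers:
        if not item:
            continue  # tolerate a trailing ";"
        # Split on the first "=" only, so values may contain "=".
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed qualifier: {item!r}")
        qualifiers[key] = value
    return core, qualifiers


core, qualifiers = split_swhid(
    "swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b; "
    "origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git; "
    "path=/Examples/SimpleFarm/simplefarm.ml; lines=9-15"
)
print(core)
print(qualifiers)
```

With something like this on the receiving end, clients never have to break the identifier apart themselves, which is the concern raised against option C. What to do with a lines qualifier is still an open question in this thread; the sketch only exposes it.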
@ardumont is right when saying that the use of only an element looks clearer, but we should use that only if there is a reason to include more elements in the identified object
So the questions are:
- do we think we will need that outside of the scenario we have seen yesterday (metadata-only deposit)?
- and do we think that in the long term option C may have a "raison d'être"?
- with which schema will the evolution to JSON-LD be easier?
We can use option A1, which allows extending to option C in the future if the need arises (but I doubt it will)
Actually, I prefer A2, to make the distinction between origins (identified by a URL, <swh:origin url=...) and objects (identified by a SWHID, <swh:object swhid='...)
yes, described this way, A2 is more appealing ;)
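For illustration, a minimal sketch of how a reader of the A2 shape could tell the two kinds of reference apart. The namespace URI is a placeholder (the schema is still to be written) and read_reference is a hypothetical helper, not existing deposit code.

```python
import xml.etree.ElementTree as ET

# Placeholder URI (assumption): the swh schema has not been written yet.
SWH_NS = "https://example.org/swh-deposit"


def read_reference(xml_text):
    """Return ("origin", url) or ("object", swhid) from a deposit document."""
    root = ET.fromstring(xml_text)
    reference = root.find(f".//{{{SWH_NS}}}reference")
    if reference is None:
        raise ValueError("no swh:reference element found")
    children = list(reference)
    if len(children) != 1:
        raise ValueError("expected exactly one referenced item")
    child = children[0]
    if child.tag == f"{{{SWH_NS}}}origin":
        kind, value = "origin", child.get("url")
    elif child.tag == f"{{{SWH_NS}}}object":
        kind, value = "object", child.get("swhid")
    else:
        raise ValueError(f"unexpected element: {child.tag}")
    if value is None:
        raise ValueError(f"missing attribute on swh:{kind}")
    return kind, value


doc = (
    '<swh:deposit xmlns:swh="https://example.org/swh-deposit">'
    "<swh:reference>"
    '<swh:object swhid="swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0"/>'
    "</swh:reference>"
    "</swh:deposit>"
)
print(read_reference(doc))
```

Because the element name carries the kind and the attribute name carries the identifier type, the reader does not have to inspect the value to know whether it holds a URL or a SWHID.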