diff --git a/docs/persistent-identifiers.rst b/docs/persistent-identifiers.rst
--- a/docs/persistent-identifiers.rst
+++ b/docs/persistent-identifiers.rst
@@ -135,15 +135,16 @@
 Contextual information
 ======================

-It is often useful to complement persistent identifiers with **contextual
-information** about where the identified object has been found as well as which
-specific parts of it are of interest. To that end it is possible, via a
-dedicated syntax, to extend persistent identifiers with the following pieces of
-information:
-
-* the **software origin** where an object has been found/observed
-* the **line number(s)** of interest, usually within a content object
+The Software Heritage persistent identifiers described above are *intrinsic identifiers*, as they are computed from the designated object itself, and it is often useful to provide *contextual information* about a particular
+occurrence of the object, like the origin from where the object has been found.
+To this end, persistent identifiers can be equipped with **qualifiers** that
+contain this *contextual information*. Qualifiers come in different kinds :
+* origin
+* visit
+* anchor
+* path
+* lines

 Syntax
 ------
@@ -153,32 +154,64 @@

 .. code-block:: bnf

-  ::= [] []
-  ::= ";" "lines" "=" ["-" ]
+  ::= [ ]
+  := [ ]
+  ::= | | | |
   ::= ";" "origin" "="
+  ::= ";" "visit" "="
+  ::= ";" "anchor" "="
+  ::= ";" "path" "="
+  ::= ";" "lines" "=" ["-" ]
   ::=
+  ::= (* RFC 3986 compliant URLs *)
+  ::= (* RFC 3986 compliant absolute file path, percent-encoded *)
+Here ```` is a percent-encoded version of the ```` in `Section 3.3 of RFC 3986 `_

 Semantics
 ---------

-``;`` is used as separator between persistent identifiers and additional
-optional contextual information. Each piece of contextual information is
+``;`` is used as separator between persistent identifiers and the
+optional contextual information qualifiers.
+Each contextual information qualifier is
 specified as a key/value pair, using ``=`` as a separator.
 The following piece of contextual information are supported:

-* line numbers: it is possible to specify a single line number or a line range,
-  separating two numbers with ``-``. Note that line numbers are purely
-  indicative and are not meant to be stable, as in some degenerate cases
-  (e.g., text files which mix different types of line terminators) it is
-  impossible to resolve them unambiguously.
-
-* software origin: where a given object has been found or observed in the wild,
-  as the URI that was used by Software Heritage to ingest the object into the
-  archive
-
+* **origin** : the *software origin* where an object has been found or observed in the wild,
+  as a URI;
+* **visit** : persistent identifier of a *snapshot* corresponding to a specific *visit* of a repository containing the designated object;
+* **anchor** : a *designated node* in the Merkle DAG relative to which a *path to the object* is specified,
+  as a persistent identifier of a directory, a revision, a release or a snapshot;
+* **path** : the *absolute file path*, from the *root directory* associated to the *anchor node*, to the object;
+  when the anchor denotes a directory or a revision, and almost always when it's a release,
+  the root directory is uniquely determined; when the anchor denotes a snapshot, the root
+  directory is the one associated to the branch pointed to by the ``HEAD`` symbolic reference,
+  and undefined if such a reference is missing;
+* **lines** : *line number(s)* of interest, usually within a content object
+
+We recommend equipping identifiers meant to be shared with as many qualifiers as
+possible. Redundant information should be omitted: for example, if the *visit*
+is present, and the *path* is relative to the snapshot indicated there, then the
+*anchor* qualifier is superfluous.
+
+Example
+-------
+
+The following `fully qualified identifier `_
+denotes the lines 9 to 15 of a file content that
+can be found at absolute path ``/Examples/SimpleFarm/simplefarm.ml`` from the root directory
+of the revision ``swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0`` that is contained
+in the snapshot ``swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9`` taken from
+the origin ``https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git``.
+
+.. code-block:: url
+
+  swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;
+  anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0;
+  path=/Examples/SimpleFarm/simplefarm.ml;
+  visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9;
+  origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;
+  lines=9-15


 Resolution
 ==========
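
Note on the patch above: the qualifier syntax it documents (``;`` between the identifier and each qualifier, ``=`` between key and value) is simple enough to check with a few lines of Python. The sketch below is only an illustration and not part of the patch or of any Software Heritage library. The function name `parse_qualified_identifier`, the regular expression for the core identifier, and the choice to return `lines` as a tuple are all assumptions made for this example. It also assumes that qualifier values contain no literal `;`, and that the line breaks in the documented example are there for display only.

```python
import re
from typing import Dict, Tuple
from urllib.parse import unquote

# Assumed shape of a core identifier, inferred from the example in the patch:
# swh:1:<object type>:<40 lowercase hex digits>
CORE_RE = re.compile(r"^swh:1:(snp|rel|rev|dir|cnt):[0-9a-f]{40}$")

# Qualifier keys listed in the patch.
KNOWN_QUALIFIERS = {"origin", "visit", "anchor", "path", "lines"}


def parse_qualified_identifier(text: str) -> Tuple[str, Dict[str, object]]:
    """Split a qualified identifier into (core identifier, qualifier dict)."""
    # The documented example wraps over several lines; drop that whitespace.
    compact = "".join(text.split())
    core, *raw_qualifiers = compact.split(";")
    if not CORE_RE.match(core):
        raise ValueError(f"not a core identifier: {core!r}")

    qualifiers: Dict[str, object] = {}
    for item in raw_qualifiers:
        # Split on the first "=" only, so URLs containing "=" stay intact.
        key, sep, value = item.partition("=")
        if not sep or key not in KNOWN_QUALIFIERS:
            raise ValueError(f"unsupported qualifier: {item!r}")
        if key in ("visit", "anchor"):
            # Both are themselves persistent identifiers.
            if not CORE_RE.match(value):
                raise ValueError(f"{key} is not an identifier: {value!r}")
            qualifiers[key] = value
        elif key == "path":
            # Paths are percent-encoded in the identifier; decode for display.
            qualifiers[key] = unquote(value)
        elif key == "lines":
            # Either a single line number or a "start-end" range.
            start, _, end = value.partition("-")
            qualifiers[key] = (int(start), int(end) if end else None)
        else:
            qualifiers[key] = value
    return core, qualifiers


EXAMPLE = """
swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;
  anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0;
  path=/Examples/SimpleFarm/simplefarm.ml;
  visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9;
  origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;
  lines=9-15
"""

core, qualifiers = parse_qualified_identifier(EXAMPLE)
```

Applied to the example from the patch, `core` is the `cnt` identifier and `qualifiers["lines"]` is the pair `(9, 15)`. A single line number such as `lines=9` yields `(9, None)`. The sketch does not check the redundancy recommendation (for instance, an `anchor` that duplicates information already given by `visit`).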