diff --git a/docs/persistent-identifiers.rst b/docs/persistent-identifiers.rst
--- a/docs/persistent-identifiers.rst
+++ b/docs/persistent-identifiers.rst
@@ -4,22 +4,29 @@
SoftWare Heritage persistent IDentifiers (SWHIDs)
=================================================
-**version 1.3, last modified 2020-04-28**
+**version 1.4, last modified 2020-04-30**
+
+.. contents::
+ :local:
+ :depth: 2
Overview
========
-You can point to objects present in the Software Heritage archive by the means
-of **SoftWare Heritage persistent IDentifiers**, or **SWHIDs** for short, that
-are guaranteed to remain stable (persistent) over time. Their syntax, meaning,
-and usage is described below. Note that they are identifiers and not URLs, even
-though URL-based resolvers for SWHIDs are also available.
+You can point to objects present in the `Software Heritage
+`_ `archive
+`_ by means of **SoftWare Heritage
+persistent IDentifiers**, or **SWHIDs** for short, which are guaranteed to
+remain stable (persistent) over time. Their syntax, meaning, and usage are
+described below. Note that they are identifiers and not URLs, even though
+URL-based `resolvers`_ for SWHIDs are also available.
-A SWHID consists of two separate parts, a *core identifier* that can point to
-any software artifact (or "object") available in the Software Heritage archive,
-and an *optional list of qualifiers* that allows to specify the context where
-the object is meant to be seen, or point to a subpart of the object itself.
+A SWHID consists of two separate parts: a mandatory *core identifier* that can
+point to any software artifact (or "object") available in the Software Heritage
+archive, and an optional list of *qualifiers* that allow specifying the
+context where the object is meant to be seen and pointing to a subpart of the
+object itself.
Objects come in different types:
@@ -33,7 +40,8 @@
is embedded in its SWHID as described below. The intrinsic identifiers embedded
in SWHIDs are strong cryptographic hashes computed on the entire set of object
properties. Together, these identifiers form a `Merkle structure
-`_, specifically a Merkle DAG.
+`_, specifically a Merkle `DAG
+`_.
See the :ref:`Software Heritage data model ` for an overview of
object types and how they are linked together. See
@@ -42,23 +50,24 @@
The optional qualifiers are of two kinds:
-* *context qualifiers* carry information about the context where a given
- object is meant to be seen; this is particularly important, as the same object
- can be reached in the Merkle graph following different *paths* from different
- nodes (or *anchors*), and it may have been retrieved from different *origins*,
- that may evolve between different *visits*,
-* *fragment qualifiers* allow to pinpoint specific subparts of an object
+* **context qualifiers:** carry information about the context where a given
+ object is meant to be seen. This is particularly important, as the same
+ object can be reached in the Merkle graph following different *paths*
+ starting from different nodes (or *anchors*), and it may have been retrieved
+ from different *origins*, which may evolve between different *visits*
+* **fragment qualifiers:** allow pinpointing specific subparts of an object
Syntax
-------
+======
-Syntactically, SWHIDs are generated by the ```` entry point of the
-grammar:
+Syntactically, SWHIDs are generated by the ```` entry point in the
+following grammar:
.. code-block:: bnf
- ::= [ ] ;
+ ::= [ ] ;
+
::= "swh" ":" ":" ":" ;
::= "1" ;
::=
@@ -71,7 +80,8 @@
::= 40 * ; (* intrinsic object id, as hex-encoded SHA1 *)
::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
::= | "a" | "b" | "c" | "d" | "e" | "f" ;
- := [ ] ;
+
+ ::= ";" [ ] ;
::=
|
@@ -82,14 +92,14 @@
|
|
;
- ::= ";" "origin" "=" ;
- ::= ";" "visit" "=" ;
- ::= ";" "anchor" "=" ;
- ::= ";" "path" "=" ;
- ::= ";" "lines" "=" ["-" ] ;
+ ::= "origin" "=" ;
+ ::= "visit" "=" ;
+ ::= "anchor" "=" ;
+ ::= "path" "=" ;
+ ::= "lines" "=" ["-" ] ;
::= + ;
- ::= (* RFC 3986 compliant URLs, percent-escaped *)
- ::= (* RFC 3986 compliant absolute file path, percent-escaped *)
+ ::= (* RFC 3987 IRI *)
+ ::= (* RFC 3987 absolute path *)
Where:
@@ -105,17 +115,18 @@
Semantics
----------
+=========
+
Core identifiers
-~~~~~~~~~~~~~~~~
+----------------
-``:`` is used as separator between the logical parts of core identifiers. The ``swh``
-prefix makes explicit that these identifiers are related to *SoftWare
+``:`` is used as separator between the logical parts of core identifiers. The
+``swh`` prefix makes explicit that these identifiers are related to *SoftWare
Heritage*. ``1`` (````) is the current version of this
-identifier *scheme*; future editions will use higher version numbers, possibly
-breaking backward compatibility (but without breaking the resolvability of
-SWHIDs that conform to previous versions of the scheme).
+identifier *scheme*. Future editions will use higher version numbers, possibly
+breaking backward compatibility, but without breaking the resolvability of
+SWHIDs that conform to previous versions of the scheme.
A SWHID points to a single object, whose type is explicitly captured by
````:
@@ -151,23 +162,27 @@
quotes), a space, the length of the content as decimal digits, a NULL byte,
and the actual content of the file.
+
Qualifiers
-~~~~~~~~~~
+----------
``;`` is used as separator between the core identifier and the optional
-qualifiers, and optional qualifiers. Each qualifier is specified as a
+qualifiers, as well as between qualifiers. Each qualifier is specified as a
key/value pair, using ``=`` as a separator.
The following *context qualifiers* are available:
-* **origin** : the *software origin* where an object has been found or observed
+* **origin:** the *software origin* where an object has been found or observed
in the wild, as an URI;
-* **visit** : the core identifier of a *snapshot* corresponding to a specific
+
+* **visit:** the core identifier of a *snapshot* corresponding to a specific
*visit* of a repository containing the designated object;
-* **anchor** : a *designated node* in the Merkle DAG relative to which a *path
+
+* **anchor:** a *designated node* in the Merkle DAG relative to which a *path
to the object* is specified, as the core identifier of a directory, a
revision, a release or a snapshot;
-* **path** : the *absolute file path*, from the *root directory* associated to
+
+* **path:** the *absolute file path*, from the *root directory* associated to
the *anchor node*, to the object; when the anchor denotes a directory or a
revision, and almost always when it's a release, the root directory is
uniquely determined; when the anchor denotes a snapshot, the root directory
@@ -176,7 +191,7 @@
The following *fragment qualifier* is available:
-* **lines** : *line number(s)* of interest, usually within a content object
+* **lines:** *line number(s)* of interest, usually within a content object
We recommend to equip identifiers meant to be shared with as many qualifiers as
possible. While qualifiers may be listed in any order, it is good practice to
@@ -186,44 +201,69 @@
there, then the *anchor* qualifier is superfluous; similarly, if the *path* is
empty, it may be omitted.
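
The structure described above (a core identifier followed by ``;``-separated ``key=value`` qualifiers) can be illustrated with a minimal parsing sketch. It is not the reference implementation: the function name ``parse_swhid`` is hypothetical, the object type tags are taken from the examples in this document, and percent-decoding only the ``path`` value is an assumption made for illustration.

```python
import re
from urllib.parse import unquote

# Core identifier: "swh", scheme version 1, an object type tag, 40 hex digits.
# The type tags are those used in this document's examples.
CORE_RE = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})$")
KNOWN_QUALIFIERS = {"origin", "visit", "anchor", "path", "lines"}


def parse_swhid(swhid):
    """Split a SWHID into (object_type, object_id, qualifiers).

    Hypothetical helper: validates the core identifier, then reads each
    qualifier as a key/value pair separated by the first "=".
    """
    core, *raw_qualifiers = swhid.split(";")
    match = CORE_RE.match(core)
    if match is None:
        raise ValueError(f"invalid core identifier: {core!r}")
    qualifiers = {}
    for item in raw_qualifiers:
        key, sep, value = item.partition("=")
        if not sep or key not in KNOWN_QUALIFIERS:
            raise ValueError(f"invalid qualifier: {item!r}")
        # Assumption: only the path value is percent-decoded here.
        qualifiers[key] = unquote(value) if key == "path" else value
    return match.group(1), match.group(2), qualifiers
```

Splitting on ``;`` before decoding matters: a literal ``;`` inside a path must be percent-escaped (``%3B``), so it cannot be mistaken for a qualifier separator.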
+
+Interoperability
+================
+
+
+URI scheme
+----------
+
+The ``swh`` URI scheme is registered at IANA for SWHIDs. The present document
+constitutes the scheme specification for that URI scheme.
+
+
Git compatibility
-~~~~~~~~~~~~~~~~~
+-----------------
SWHIDs for contents, directories, revisions, and releases are, at present,
compatible with the `Git `_ way of `computing identifiers
`_ for its objects.
The ```` part of a SWHID for a content object is the Git blob
identifier of any file with the same content; for a revision it is the Git
-commit identifier for the same revision, etc. This is not the case for snapshot
-identifiers, as Git does not have a corresponding object type.
+commit identifier for the same revision, etc. This is not the case for
+snapshot identifiers, as Git does not have a corresponding object type.
Note that Git compatibility is incidental and is not guaranteed to be
maintained in future versions of this scheme (or Git).
Examples
---------
+========
+
Core identifiers
-~~~~~~~~~~~~~~~~
+----------------
* ``swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2`` points to the content
of a file containing the full text of the GPL3 license
+
* ``swh:1:dir:d198bc9d7a6bcf6db04f476d29314f157507d505`` points to a directory
containing the source code of the Darktable photography application as it was
at some point on 4 May 2017
+
* ``swh:1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d`` points to a commit in
the development history of Darktable, dated 16 January 2017, that added
undo/redo support for masks
+
* ``swh:1:rel:22ece559cc7cc2364edc5e5593d63ae8bd229f9f`` points to Darktable
release 2.3.0, dated 24 December 2016
+
* ``swh:1:snp:c7c108084bc0bf3d81436bf980b46e98bd338453`` points to a snapshot
of the entire Darktable Git repository taken on 4 May 2017 from GitHub
+
Identifiers with qualifiers
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+---------------------------
-* The following `fully qualified SWHID `_ denotes the lines 9 to 15 of a file content that can be found at absolute path ``/Examples/SimpleFarm/simplefarm.ml`` from the root directory of the revision ``swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0`` that is contained in the snapshot ``swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9`` taken from the origin ``https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git``
+* The following `SWHID
+ `_
+ denotes lines 9 to 15 of the file content that can be found at absolute
+ path ``/Examples/SimpleFarm/simplefarm.ml`` from the root directory of the
+ revision ``swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0``, which is
+ contained in the snapshot
+ ``swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9`` taken from the origin
+ ``https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git``:
.. code-block:: url
@@ -234,8 +274,9 @@
path=/Examples/SimpleFarm/simplefarm.ml;
lines=9-15
-
-* This is an example of `a fully qualified SWHID with a percent escaped file path `_
+* Here is an example of a `SWHID
+ `_
+ with a file path that requires percent-escaping:
.. code-block:: url
@@ -246,11 +287,23 @@
path=/html/semantics/document-metadata/the-meta-element/pragma-directives/attr-meta-http-equiv-refresh/support/x%3Burl=foo/
-Computing and resolving SWHIDs
-==============================
+Implementation
+==============
+
+
+Computing
+---------
+
+An important property of any SWHID is that its core identifier is *intrinsic*:
+it can be *computed from the object itself*, without having to rely on any
+third party. One implementation that computes SWHIDs locally is the
+`swh identify `_
+tool, available from the `swh.model `_
+Python package under the GPL license.
-An important property of SWHIDs is that a core identifier is *intrinsic*: it can
-be *computed from the object itself* using the `swh-identify `_ utility, or equivalently using standard git tools.
+SWHIDs are also automatically computed by Software Heritage for all archived
+objects as part of its archival activity, and can be looked up via the project
+`Web interface `_.
This has various practical implications:
@@ -259,19 +312,26 @@
just compute the core identifier from the artefact itself, and check that it
is the same as the core identifier part of the SWHID
-* the core identifier of a software artifact can be computed *before* its archival on
- Software Heritage
+* the core identifier of a software artifact can be computed *before* its
+ archival on Software Heritage
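
As a rough illustration of these implications for content objects, the sketch below computes a core identifier with the Python standard library only and checks a file's bytes against a given SWHID. It relies on the Git blob compatibility described earlier (SHA1 over ``blob``, a space, the decimal length, a NULL byte, and the content); the function names ``content_swhid`` and ``matches_swhid`` are hypothetical and are not part of ``swh.model``.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Compute the core SWHID of a content object from its raw bytes.

    Uses the Git blob hashing convention: SHA1 over the header
    b"blob <length>\\0" followed by the content itself.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def matches_swhid(data: bytes, swhid: str) -> bool:
    """Check that data corresponds to the core identifier part of swhid.

    Qualifiers (everything after the first ";") are ignored, since only
    the core identifier is intrinsic to the object.
    """
    core = swhid.split(";", 1)[0]
    return core == content_swhid(data)
```

Because nothing here contacts the archive, the same check works for content that has not been archived yet.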
+
Resolvers
---------
-SWHIDs can be resolved using the Software Heritage Web application (see
-:py:mod:`swh.web`). In particular, the **root endpoint** ``/`` can be given a
-SWHID and will lead to the browsing page of the corresponding object, like
-this: ``https://archive.softwareheritage.org/``.
-A **dedicated** ``/resolve`` **endpoint** of the HTTP API is also available to
-explicitly request SWHID resolution; see: :http:get:`/api/1/resolve/(swh_id)/`.
+Software Heritage resolver
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+SWHIDs can be resolved using the Software Heritage `Web interface
+`_. In particular, the **root endpoint**
+``/`` can be given a SWHID and will lead to the browsing page of the
+corresponding object, like this:
+``https://archive.softwareheritage.org/``.
+
+A **dedicated** ``/resolve`` **endpoint** of the Software Heritage `Web API
+`_ is also available to
+programmatically resolve SWHIDs; see: :http:get:`/api/1/resolve/(swh_id)/`.
Examples:
@@ -283,10 +343,11 @@
* ``_
* ``_
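
The two resolution routes described above can be sketched as plain URL construction. This is only an illustration: the helper names are hypothetical, no network request is made, and whether the browsing URL needs additional escaping or a trailing slash is left as an assumption.

```python
# Base URL of the Software Heritage archive, as given in this document.
BASE_URL = "https://archive.softwareheritage.org"


def browse_url(swhid: str) -> str:
    """URL handled by the root endpoint, leading to the browsing page."""
    return f"{BASE_URL}/{swhid}"


def resolve_api_url(swhid: str) -> str:
    """URL of the dedicated /api/1/resolve/(swh_id)/ Web API endpoint."""
    return f"{BASE_URL}/api/1/resolve/{swhid}/"
```

The first form is meant for people following a link; the second is the one to call programmatically.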
-External resolvers
-~~~~~~~~~~~~~~~~~~
-The following **independent resolvers** support resolution of SWHIDs:
+Third-party resolvers
+~~~~~~~~~~~~~~~~~~~~~
+
+The following **third-party resolvers** support SWHID resolution:
* `Identifiers.org `_; see:
``_ (registry identifier `MIR:00000655
@@ -294,6 +355,10 @@
* `Name-to-Thing (N2T) `_
+Note that resolution via Identifiers.org currently only supports *core
+identifiers* due to `syntactic incompatibilities with qualifiers
+`_.
+
Examples:
* ``_
@@ -304,8 +369,6 @@
* ``_
* ``_
-Note that resolution via Identifiers.org currently only supports *core identifiers* due to `syntactic incompatibilities with qualifiers `_.
-
References
==========