D3108.id.diff

diff --git a/docs/persistent-identifiers.rst b/docs/persistent-identifiers.rst
--- a/docs/persistent-identifiers.rst
+++ b/docs/persistent-identifiers.rst
@@ -4,22 +4,29 @@
SoftWare Heritage persistent IDentifiers (SWHIDs)
=================================================
-**version 1.3, last modified 2020-04-28**
+**version 1.4, last modified 2020-04-30**
+
+.. contents::
+ :local:
+ :depth: 2
Overview
========
-You can point to objects present in the Software Heritage archive by the means
-of **SoftWare Heritage persistent IDentifiers**, or **SWHIDs** for short, that
-are guaranteed to remain stable (persistent) over time. Their syntax, meaning,
-and usage is described below. Note that they are identifiers and not URLs, even
-though URL-based resolvers for SWHIDs are also available.
+You can point to objects present in the `Software Heritage
+<https://www.softwareheritage.org/>`_ `archive
+<https://archive.softwareheritage.org/>`_ by the means of **SoftWare Heritage
+persistent IDentifiers**, or **SWHIDs** for short, that are guaranteed to
+remain stable (persistent) over time. Their syntax, meaning, and usage are
+described below. Note that they are identifiers and not URLs, even though
+URL-based `resolvers`_ for SWHIDs are also available.
-A SWHID consists of two separate parts, a *core identifier* that can point to
-any software artifact (or "object") available in the Software Heritage archive,
-and an *optional list of qualifiers* that allows to specify the context where
-the object is meant to be seen, or point to a subpart of the object itself.
+A SWHID consists of two separate parts, a mandatory *core identifier* that can
+point to any software artifact (or "object") available in the Software Heritage
+archive, and an optional list of *qualifiers* that specify the context where
+the object is meant to be seen and can point to a subpart of the object
+itself.
Objects come in different types:
@@ -33,7 +40,8 @@
is embedded in its SWHID as described below. The intrinsic identifiers embedded
in SWHIDs are strong cryptographic hashes computed on the entire set of object
properties. Together, these identifiers form a `Merkle structure
-<https://en.wikipedia.org/wiki/Merkle_tree>`_, specifically a Merkle DAG.
+<https://en.wikipedia.org/wiki/Merkle_tree>`_, specifically a Merkle `DAG
+<https://en.wikipedia.org/wiki/Directed_acyclic_graph>`_.
See the :ref:`Software Heritage data model <data-model>` for an overview of
object types and how they are linked together. See
@@ -42,23 +50,24 @@
The optional qualifiers are of two kinds:
-* *context qualifiers* carry information about the context where a given
- object is meant to be seen; this is particularly important, as the same object
- can be reached in the Merkle graph following different *paths* from different
- nodes (or *anchors*), and it may have been retrieved from different *origins*,
- that may evolve between different *visits*,
-* *fragment qualifiers* allow to pinpoint specific subparts of an object
+* **context qualifiers:** carry information about the context where a given
+ object is meant to be seen. This is particularly important, as the same
+ object can be reached in the Merkle graph following different *paths*
+ starting from different nodes (or *anchors*), and it may have been retrieved
+ from different *origins*, that may evolve between different *visits*
+* **fragment qualifiers:** pinpoint specific subparts of an object
Syntax
-------
+======
-Syntactically, SWHIDs are generated by the ``<identifier>`` entry point of the
-grammar:
+Syntactically, SWHIDs are generated by the ``<identifier>`` entry point in the
+following grammar:
.. code-block:: bnf
- <identifier> ::= <identifier_core> [ <qualifierlist> ] ;
+ <identifier> ::= <identifier_core> [ <qualifiers> ] ;
+
<identifier_core> ::= "swh" ":" <scheme_version> ":" <object_type> ":" <object_id> ;
<scheme_version> ::= "1" ;
<object_type> ::=
@@ -71,7 +80,8 @@
<object_id> ::= 40 * <hex_digit> ; (* intrinsic object id, as hex-encoded SHA1 *)
<dec_digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
<hex_digit> ::= <dec_digit> | "a" | "b" | "c" | "d" | "e" | "f" ;
- <qualifierlist> := <qualifier> [ <qualifierlist> ] ;
+
+ <qualifiers> ::= ";" <qualifier> [ <qualifiers> ] ;
<qualifier> ::=
<context_qualifier>
| <fragment_qualifier>
@@ -82,14 +92,14 @@
| <anchor_ctxt>
| <path_ctxt>
;
- <origin_ctxt> ::= ";" "origin" "=" <url_escaped> ;
- <visit_ctxt> ::= ";" "visit" "=" <identifier_core> ;
- <anchor_ctxt> ::= ";" "anchor" "=" <identifier_core> ;
- <path_ctxt> ::= ";" "path" "=" <path_absolute_escaped> ;
- <fragment_qualifier> ::= ";" "lines" "=" <line_number> ["-" <line_number>] ;
+ <origin_ctxt> ::= "origin" "=" <url_escaped> ;
+ <visit_ctxt> ::= "visit" "=" <identifier_core> ;
+ <anchor_ctxt> ::= "anchor" "=" <identifier_core> ;
+ <path_ctxt> ::= "path" "=" <path_absolute_escaped> ;
+ <fragment_qualifier> ::= "lines" "=" <line_number> ["-" <line_number>] ;
<line_number> ::= <dec_digit> + ;
- <url_escaped> ::= (* RFC 3986 compliant URLs, percent-escaped *)
- <path_absolute_escaped> ::= (* RFC 3986 compliant absolute file path, percent-escaped *)
+ <url_escaped> ::= (* RFC 3987 IRI *)
+ <path_absolute_escaped> ::= (* RFC 3987 absolute path *)
Where:
@@ -105,17 +115,18 @@
Semantics
----------
+=========
+
Core identifiers
-~~~~~~~~~~~~~~~~
+----------------
-``:`` is used as separator between the logical parts of core identifiers. The ``swh``
-prefix makes explicit that these identifiers are related to *SoftWare
+``:`` is used as separator between the logical parts of core identifiers. The
+``swh`` prefix makes explicit that these identifiers are related to *SoftWare
Heritage*. ``1`` (``<scheme_version>``) is the current version of this
-identifier *scheme*; future editions will use higher version numbers, possibly
-breaking backward compatibility (but without breaking the resolvability of
-SWHIDs that conform to previous versions of the scheme).
+identifier *scheme*. Future editions will use higher version numbers, possibly
+breaking backward compatibility, but without breaking the resolvability of
+SWHIDs that conform to previous versions of the scheme.
A SWHID points to a single object, whose type is explicitly captured by
``<object_type>``:
@@ -151,23 +162,27 @@
quotes), a space, the length of the content as decimal digits, a NULL byte,
and the actual content of the file.
+
Qualifiers
-~~~~~~~~~~
+----------
``;`` is used as separator between the core identifier and the optional
-qualifiers, and optional qualifiers. Each qualifier is specified as a
+qualifiers, as well as between qualifiers. Each qualifier is specified as a
key/value pair, using ``=`` as a separator.
The following *context qualifiers* are available:
-* **origin** : the *software origin* where an object has been found or observed
+* **origin:** the *software origin* where an object has been found or observed
 in the wild, as a URI;
-* **visit** : the core identifier of a *snapshot* corresponding to a specific
+
+* **visit:** the core identifier of a *snapshot* corresponding to a specific
*visit* of a repository containing the designated object;
-* **anchor** : a *designated node* in the Merkle DAG relative to which a *path
+
+* **anchor:** a *designated node* in the Merkle DAG relative to which a *path
to the object* is specified, as the core identifier of a directory, a
revision, a release or a snapshot;
-* **path** : the *absolute file path*, from the *root directory* associated to
+
+* **path:** the *absolute file path*, from the *root directory* associated to
the *anchor node*, to the object; when the anchor denotes a directory or a
revision, and almost always when it's a release, the root directory is
uniquely determined; when the anchor denotes a snapshot, the root directory
@@ -176,7 +191,7 @@
The following *fragment qualifier* is available:
-* **lines** : *line number(s)* of interest, usually within a content object
+* **lines:** *line number(s)* of interest, usually within a content object
We recommend to equip identifiers meant to be shared with as many qualifiers as
possible. While qualifiers may be listed in any order, it is good practice to
@@ -186,44 +201,69 @@
there, then the *anchor* qualifier is superfluous; similarly, if the *path* is
empty, it may be omitted.
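
The split between core identifier and qualifiers described above can be
sketched in Python as follows. This is a minimal illustration, not the
reference parser: the function name ``parse_swhid`` and the error handling are
assumptions, and qualifier values are returned as-is, without percent-decoding
or further validation.

```python
import re

# Core identifier: "swh", scheme version 1, an object type, and a
# 40-character lowercase hex object id, separated by ":".
CORE_RE = re.compile(r"swh:1:(snp|rel|rev|dir|cnt):[0-9a-f]{40}")

# Qualifier keys named in this document.
KNOWN_QUALIFIERS = ("origin", "visit", "anchor", "path", "lines")


def parse_swhid(swhid):
    """Split a SWHID into (core identifier, dict of qualifiers)."""
    core, *raw_qualifiers = swhid.split(";")
    if not CORE_RE.fullmatch(core):
        raise ValueError(f"invalid core identifier: {core!r}")
    qualifiers = {}
    for raw in raw_qualifiers:
        # Split on the first "=" only: values may themselves contain "=".
        key, sep, value = raw.partition("=")
        if not sep or key not in KNOWN_QUALIFIERS:
            raise ValueError(f"invalid qualifier: {raw!r}")
        qualifiers[key] = value
    return core, qualifiers


core, qualifiers = parse_swhid(
    "swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b"
    ";origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git"
    ";lines=9-15"
)
```

Splitting on ``;`` first is safe only because a literal ``;`` inside a
qualifier value has to be percent-escaped.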
+
+Interoperability
+================
+
+
+URI scheme
+----------
+
+The ``swh`` URI scheme is registered at IANA for SWHIDs. The present document
+constitutes the specification for this URI scheme.
+
+
Git compatibility
-~~~~~~~~~~~~~~~~~
+-----------------
SWHIDs for contents, directories, revisions, and releases are, at present,
compatible with the `Git <https://git-scm.com/>`_ way of `computing identifiers
<https://git-scm.com/book/en/v2/Git-Internals-Git-Objects>`_ for its objects.
The ``<object_id>`` part of a SWHID for a content object is the Git blob
identifier of any file with the same content; for a revision it is the Git
-commit identifier for the same revision, etc. This is not the case for snapshot
-identifiers, as Git does not have a corresponding object type.
+commit identifier for the same revision, etc. This is not the case for
+snapshot identifiers, as Git does not have a corresponding object type.
Note that Git compatibility is incidental and is not guaranteed to be
maintained in future versions of this scheme (or Git).
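
As an illustration of this compatibility, the core identifier of a content
object can be sketched with nothing but a SHA1 implementation, by hashing the
same salted byte sequence that Git uses for blobs. The function name
``content_swhid`` is an assumption made for this example; the ``swh identify``
tool remains the reference way to compute SWHIDs.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Compute the core SWHID of a content object from its raw bytes."""
    # Git blob header: "blob", a space, the length in decimal, a NULL byte.
    header = b"blob %d\0" % len(data)
    object_id = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{object_id}"


# The empty file hashes to the well-known Git empty blob identifier.
print(content_swhid(b""))
# swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```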
Examples
---------
+========
+
Core identifiers
-~~~~~~~~~~~~~~~~
+----------------
* ``swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2`` points to the content
of a file containing the full text of the GPL3 license
+
* ``swh:1:dir:d198bc9d7a6bcf6db04f476d29314f157507d505`` points to a directory
containing the source code of the Darktable photography application as it was
at some point on 4 May 2017
+
* ``swh:1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d`` points to a commit in
the development history of Darktable, dated 16 January 2017, that added
 undo/redo support for masks
+
* ``swh:1:rel:22ece559cc7cc2364edc5e5593d63ae8bd229f9f`` points to Darktable
release 2.3.0, dated 24 December 2016
+
* ``swh:1:snp:c7c108084bc0bf3d81436bf980b46e98bd338453`` points to a snapshot
of the entire Darktable Git repository taken on 4 May 2017 from GitHub
+
Identifiers with qualifiers
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+---------------------------
-* The following `fully qualified SWHID <https://archive.softwareheritage.org/swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;;origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9;anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0;path=/Examples/SimpleFarm/simplefarm.ml;lines=9-15>`_ denotes the lines 9 to 15 of a file content that can be found at absolute path ``/Examples/SimpleFarm/simplefarm.ml`` from the root directory of the revision ``swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0`` that is contained in the snapshot ``swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9`` taken from the origin ``https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git``
+* The following `SWHID
+ <https://archive.softwareheritage.org/swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;;origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9;anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0;path=/Examples/SimpleFarm/simplefarm.ml;lines=9-15>`_
+ denotes the lines 9 to 15 of a file content that can be found at absolute
+ path ``/Examples/SimpleFarm/simplefarm.ml`` from the root directory of the
+ revision ``swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0`` that is
+ contained in the snapshot
+ ``swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9`` taken from the origin
+ ``https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git``:
.. code-block:: url
@@ -234,8 +274,9 @@
path=/Examples/SimpleFarm/simplefarm.ml;
lines=9-15
-
-* This is an example of `a fully qualified SWHID with a percent escaped file path <https://archive.softwareheritage.org/swh:1:cnt:f10371aa7b8ccabca8479196d6cd640676fd4a04;origin=https://github.com/web-platform-tests/wpt;visit=swh:1:snp:b37d435721bbd450624165f334724e3585346499;anchor=swh:1:rev:259d0612af038d14f2cd889a14a3adb6c9e96d96;path=/html/semantics/document-metadata/the-meta-element/pragma-directives/attr-meta-http-equiv-refresh/support/x%3Burl=foo/>`_
+* Here is an example of a `SWHID
+ <https://archive.softwareheritage.org/swh:1:cnt:f10371aa7b8ccabca8479196d6cd640676fd4a04;origin=https://github.com/web-platform-tests/wpt;visit=swh:1:snp:b37d435721bbd450624165f334724e3585346499;anchor=swh:1:rev:259d0612af038d14f2cd889a14a3adb6c9e96d96;path=/html/semantics/document-metadata/the-meta-element/pragma-directives/attr-meta-http-equiv-refresh/support/x%3Burl=foo/>`_
+ with a file path that requires percent-escaping:
.. code-block:: url
@@ -246,11 +287,23 @@
path=/html/semantics/document-metadata/the-meta-element/pragma-directives/attr-meta-http-equiv-refresh/support/x%3Burl=foo/
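
A minimal sketch of how such a path could be percent-escaped before being used
as the value of a ``path`` qualifier, using Python's standard
``urllib.parse.quote``. The helper name ``escape_path`` and the set of
characters left unescaped (``/`` and ``=``) are assumptions inferred from the
example above. What matters is that ``;`` is escaped, since it separates
qualifiers.

```python
from urllib.parse import quote, unquote


def escape_path(path: str) -> str:
    """Percent-escape a file path for use in a ``path`` qualifier."""
    # Keep "/" and "=" readable; ";" and "%" (among others) get escaped.
    return quote(path, safe="/=")


escaped = escape_path("/support/x;url=foo/")
print(escaped)  # /support/x%3Burl=foo/

# Unescaping recovers the original path.
assert unquote(escaped) == "/support/x;url=foo/"
```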
-Computing and resolving SWHIDs
-==============================
+Implementation
+==============
+
+
+Computing
+---------
+
+An important property of any SWHID is that its core identifier is *intrinsic*:
+it can be *computed from the object itself*, without having to rely on any
+third party. An implementation of SWHIDs that does so locally is the
+`swh identify <https://docs.softwareheritage.org/devel/swh-model/cli.html>`_
+tool, available from the `swh.model <https://pypi.org/project/swh.model/>`_
+Python package under the GPL license.
-An important property of SWHIDs is that a core identifier is *intrinsic*: it can
-be *computed from the object itself* using the `swh-identify <https://docs.softwareheritage.org/devel/swh-model/cli.html>`_ utility, or equivalently using standard git tools.
+SWHIDs are also automatically computed by Software Heritage for all archived
+objects as part of its archival activity, and can be looked up via the project
+`Web interface <https://archive.softwareheritage.org>`_.
This has various practical implications:
@@ -259,19 +312,26 @@
just compute the core identifier from the artefact itself, and check that it
 is the same as the core identifier part of the SWHID
-* the core identifier of a software artifact can be computed *before* its archival on
- Software Heritage
+* the core identifier of a software artifact can be computed *before* its
+ archival on Software Heritage
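
The integrity check mentioned in the list above can be sketched as follows for
content objects. Both function names are hypothetical, and the sketch only
covers the ``cnt`` object type, whose identifier is a SHA1 over the Git blob
encoding of the file. Other object types need the corresponding
serialization, which ``swh identify`` implements.

```python
import hashlib


def content_core_id(data: bytes) -> str:
    """Core SWHID of a content object (same hashing as a Git blob)."""
    header = b"blob %d\0" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def matches_swhid(swhid: str, data: bytes) -> bool:
    """Check that ``data`` is the content designated by ``swhid``."""
    # Qualifiers carry context only; the check uses the core identifier.
    core = swhid.split(";", 1)[0]
    return core == content_core_id(data)


data = b"example artifact\n"
swhid = content_core_id(data) + ";lines=1"
print(matches_swhid(swhid, data))         # True
print(matches_swhid(swhid, data + b"!"))  # False
```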
+
Resolvers
---------
-SWHIDs can be resolved using the Software Heritage Web application (see
-:py:mod:`swh.web`). In particular, the **root endpoint** ``/`` can be given a
-SWHID and will lead to the browsing page of the corresponding object, like
-this: ``https://archive.softwareheritage.org/<identifier>``.
-A **dedicated** ``/resolve`` **endpoint** of the HTTP API is also available to
-explicitly request SWHID resolution; see: :http:get:`/api/1/resolve/(swh_id)/`.
+Software Heritage resolver
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+SWHIDs can be resolved using the Software Heritage `Web interface
+<https://archive.softwareheritage.org>`_. In particular, the **root endpoint**
+``/`` can be given a SWHID and will lead to the browsing page of the
+corresponding object, like this:
+``https://archive.softwareheritage.org/<identifier>``.
+
+A **dedicated** ``/resolve`` **endpoint** of the Software Heritage `Web API
+<https://archive.softwareheritage.org/api/>`_ is also available to
+programmatically resolve SWHIDs; see: :http:get:`/api/1/resolve/(swh_id)/`.
Examples:
@@ -283,10 +343,11 @@
* `<https://archive.softwareheritage.org/swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;;origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9;anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0;path=/Examples/SimpleFarm/simplefarm.ml;lines=9-15>`_
* `<https://archive.softwareheritage.org/swh:1:cnt:f10371aa7b8ccabca8479196d6cd640676fd4a04;origin=https://github.com/web-platform-tests/wpt;visit=swh:1:snp:b37d435721bbd450624165f334724e3585346499;anchor=swh:1:rev:259d0612af038d14f2cd889a14a3adb6c9e96d96;path=/html/semantics/document-metadata/the-meta-element/pragma-directives/attr-meta-http-equiv-refresh/support/x%3Burl=foo/>`_
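
The two resolution URLs described above can be built with plain string
formatting. This sketch only constructs the URLs and performs no network
request. The helper names are assumptions, and the SWHID is expected to be
already well-formed, with qualifier values percent-escaped where needed.

```python
ARCHIVE_BASE = "https://archive.softwareheritage.org"


def browse_url(swhid: str) -> str:
    """URL of the browsing page, via the root endpoint ``/``."""
    return f"{ARCHIVE_BASE}/{swhid}"


def api_resolve_url(swhid: str) -> str:
    """URL of the dedicated Web API resolution endpoint."""
    return f"{ARCHIVE_BASE}/api/1/resolve/{swhid}/"


swhid = "swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2"
print(browse_url(swhid))
print(api_resolve_url(swhid))
```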
-External resolvers
-~~~~~~~~~~~~~~~~~~
-The following **independent resolvers** support resolution of SWHIDs:
+Third-party resolvers
+~~~~~~~~~~~~~~~~~~~~~
+
+The following **third-party resolvers** support SWHID resolution:
* `Identifiers.org <https://identifiers.org>`_; see:
`<http://identifiers.org/swh/>`_ (registry identifier `MIR:00000655
@@ -294,6 +355,10 @@
* `Name-to-Thing (N2T) <https://n2t.net/>`_
+Note that resolution via Identifiers.org currently only supports *core
+identifiers* due to `syntactic incompatibilities with qualifiers
+<http://identifiers.org/documentation#custom_requests>`_.
+
Examples:
* `<https://identifiers.org/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2>`_
@@ -304,8 +369,6 @@
* `<https://n2t.net/swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;;origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9;anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0;path=/Examples/SimpleFarm/simplefarm.ml;lines=9-15>`_
* `<https://n2t.net/swh:1:cnt:f10371aa7b8ccabca8479196d6cd640676fd4a04;origin=https://github.com/web-platform-tests/wpt;visit=swh:1:snp:b37d435721bbd450624165f334724e3585346499;anchor=swh:1:rev:259d0612af038d14f2cd889a14a3adb6c9e96d96;path=/html/semantics/document-metadata/the-meta-element/pragma-directives/attr-meta-http-equiv-refresh/support/x%3Burl=foo/>`_
-Note that resolution via Identifiers.org currently only supports *core identifiers* due to `syntactic incompatibilities with qualifiers <http://identifiers.org/documentation#custom_requests>`_.
-
References
==========