diff --git a/docs/data-model.rst b/docs/data-model.rst
index fc1639d..e531b1a 100644
--- a/docs/data-model.rst
+++ b/docs/data-model.rst
@@ -1,257 +1,259 @@
+:orphan:
+
 .. _data-model:
 
 Data model
 ==========
 
 .. note:: The text below is adapted from §7 of the article `Software Heritage:
   Why and How to Preserve Software Source Code
   <https://hal.archives-ouvertes.fr/hal-01590958/>`_ (in proceedings of `iPRES
   2017 <https://ipres2017.jp/>`_, 14th International Conference on Digital
   Preservation, by Roberto Di Cosmo and Stefano Zacchiroli), which also
   provides a more general description of Software Heritage for the digital
   preservation research community.
 
 In any archival project the choice of the underlying data model—at the logical
 level, independently from how data is actually stored on physical media—is
 paramount. The data model adopted by Software Heritage to represent the
 information that it collects is centered around the notion of *software
 artifact*, described below.
 
 According to our principles, we must store with every software artifact full
 information on where it has been found (its provenance). This provenance is
 also captured in our data model, so we start by providing some basic
 information on its nature.
 
 
 Source code hosting places
 --------------------------
 
 Currently, Software Heritage uses a curated list of source code hosting
 places to crawl. The most common entries we expect to place in such a list are
 popular collaborative development forges (e.g., GitHub, Bitbucket), package
 manager repositories that host source packages (e.g., CPAN, npm), and FOSS
 distributions (e.g., Fedora, FreeBSD). But we may of course also allow more
 niche entries, such as URLs of personal or institutional project collections
 not hosted on major forges.
 
 While currently entirely manual, the curation of such a list might easily be
 semi-automatic, with entries suggested by fellow archivists and/or concerned
 users that want to notify Software Heritage of the need of archiving specific
 pieces of endangered source code. This approach is entirely compatible with
 Web-wide crawling approaches: crawlers capable of detecting the presence of
 source code might enrich the list. In both cases the list will remain curated,
 with (semi-automated) review processes that will need to pass before a hosting
 place starts to be used.
 
 
 Software artifacts
 ------------------
 
 Once the hosting places are known, they need to be visited periodically in
 order to add missing software artifacts to the archive. Which software
 artifacts will be found there?
 
 In general, each software distribution mechanism hosts multiple releases of a
 given software at any given time. For VCS (Version Control Systems), this is
 the natural behaviour; for software packages, while a single version of a
 package is just a snapshot of the corresponding software product, one can often
 retrieve both current and past versions of the package from its distribution
 site.
 
 By reviewing and generalizing existing VCS and source package formats, we have
 identified the following recurrent artifacts as commonly found at source code
 hosting places. They form the basic ingredients of the Software Heritage
 archive. As the terminology varies quite a bit from technology to technology,
 we provide below both the canonical name used in Software Heritage and popular
 synonyms.
 
 **contents** (AKA "blobs")
   the raw content of (source code) files as a sequence of bytes, without file
   names or any other metadata.  File contents are often recurrent, e.g., across
   different versions of the same software, different directories of the same
   project, or different projects all together.
 
 **directories**
   a list of named directory entries, each pointing to other artifacts,
   usually file contents or sub-directories. Directory entries are also
   associated with arbitrary metadata, which varies with technologies but
   usually includes permission bits, modification timestamps, etc.
 
 **revisions** (AKA "commits")
   software development within a specific project is essentially a time-indexed
   series of copies of a single "root" directory that contains the entire
   project source code. Software evolves when a developer modifies the content
   of one or more files in that directory and records their changes.
 
   Each recorded copy of the root directory is known as a "revision". It points
   to a fully-determined directory and is equipped with arbitrary metadata. Some
   of those are added manually by the developer (e.g., commit message), others
   are automatically synthesized (timestamps, preceding commit(s), etc).
 
 **releases** (AKA "tags")
   some revisions are more equal than others and get selected by developers as
   denoting important project milestones known as "releases". Each release
   points to the last commit in project history corresponding to the release and
   might carry arbitrary metadata—e.g., release name and version, release
   message, cryptographic signatures, etc.
 
 
 Additionally, the following crawling-related information is stored as
 provenance information in the Software Heritage archive:
 
 **origins**
   code "hosting places" as previously described are usually large platforms
   that host several unrelated software projects. For software provenance
   purposes it is important to be more specific than that.
 
   Software origins are fine-grained references to where source code artifacts
   archived by Software Heritage have been retrieved from. They take the form of
   ``(type, url)`` pairs, where ``url`` is a canonical URL (e.g., the address at
   which one can ``git clone`` a repository or download a source tarball) and
   ``type`` the kind of software origin (e.g., git, svn, or dsc for Debian
   source packages).
 
 ..
    **projects**
      as commonly intended are more abstract entities that precise software
      origins. Projects relate together several development resources, including
      websites, issue trackers, mailing lists, as well as software origins as
      intended by Software Heritage.
 
      The debate around the most apt ontologies to capture project-related
      information for software hasn't settled yet, but the place projects will take
      in the Software Heritage archive is fairly clear. Projects are abstract
      entities, which will be arbitrarily nestable in a versioned
      project/sub-project hierarchy, and that can be associated to arbitrary
      metadata as well as origins where their source code can be found.
 
 **snapshots**
   any kind of software origin offers multiple pointers to the "current" state
   of a development project. In the case of VCS this is reflected by branches
   (e.g., master, development, but also so called feature branches dedicated to
   extending the software in a specific direction); in the case of package
   distributions by notions such as suites that correspond to different maturity
   levels of individual packages (e.g., stable, development, etc.).
 
   A "snapshot" of a given software origin records all entry points found there
   and where each of them was pointing at the time. For example, a snapshot
   object might track the commit the master branch was pointing to at the time
   of the snapshot, as well as the most recent release of a given package in
   the stable suite of a FOSS distribution.
 
 **visits**
   link software origins together with snapshots. Every time an origin is
   consulted a new visit object is created, recording when (according to the
   Software Heritage clock) the visit happened and the full snapshot of the
   state of the software origin at the time.
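 
 As a minimal sketch of how the provenance objects above relate to each other,
 the following Python fragment models origins, snapshots, and visits as plain
 data classes. All class and field names, the example URL, and the branch
 target are illustrative assumptions, not the actual Software Heritage schema.
 
 .. code-block:: python
 
   from dataclasses import dataclass
   from datetime import datetime, timezone
   from typing import Dict
 
   @dataclass(frozen=True)
   class Origin:
       type: str  # kind of origin, e.g. "git", "svn", "dsc"
       url: str   # canonical URL the artifacts were retrieved from
 
   @dataclass(frozen=True)
   class Snapshot:
       # entry point name (e.g. a branch) -> identifier of the pointed object
       branches: Dict[str, str]
 
   @dataclass(frozen=True)
   class Visit:
       origin: Origin
       date: datetime      # when the visit happened, per the archive clock
       snapshot: Snapshot  # full state of the origin at that time
 
   # Hypothetical origin and placeholder branch target, for illustration only.
   origin = Origin(type="git", url="https://example.org/hello.git")
   snapshot = Snapshot(branches={"master": "placeholder-revision-id"})
   visit = Visit(origin, datetime.now(timezone.utc), snapshot)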
 
 
 Data structure
 --------------
 
 .. _swh-merkle-dag:
 .. figure:: images/swh-merkle-dag.svg
    :width: 1024px
    :align: center
 
    Software Heritage archive as a Merkle DAG, augmented with crawling
    information (click to zoom).
 
 
 With all the bits of what we want to archive in place, the next question is how
 to organize them, i.e., which logical data structure to adopt for their
 storage. A key observation for this decision is that source code artifacts are
 massively duplicated. This is so for several reasons:
 
 * code hosting diaspora (i.e., project development moving to the most
   recent/cool collaborative development technology over time);
 * copy/paste (AKA "vendoring") of parts or entire external FOSS software
   components into other software products;
 * large overlap between revisions of the same project: usually only a very
   small number of files/directories is modified by a single commit;
 * emergence of DVCS (distributed version control systems), which natively work
   by replicating entire repository copies around. GitHub-style pull requests
   are the pinnacle of this, as they result in creating an additional repository
   copy at each change done by a new developer;
 * migration from one VCS to another—e.g., migrations from Subversion to Git,
   which are really popular these days—resulting in additional copies, but in a
   different distribution format, of the very same development histories.
 
 These trends seem to be neither stopping nor slowing down, and it is reasonable
 to expect that they will be even more prominent in the future, due to the
 decreasing costs of storage and bandwidth.
 
 For this reason we argue that any sustainable storage layout for archiving
 source code in the very long term should support deduplication, so that the
 cost of storing a source code artifact encountered more than once is paid only
 once. For storage efficiency, deduplication should be supported for all the
 software artifacts we have discussed, namely: file contents, directories,
 revisions, releases, snapshots.
 
 Realizing that principle, the Software Heritage archive is conceptually a
 single (big) `Merkle Directed Acyclic Graph (DAG)
 <https://en.wikipedia.org/wiki/Merkle_tree>`_, as depicted in Figure
 :ref:`Software Heritage Merkle DAG <swh-merkle-dag>`. In such a graph each of
 the artifacts we have described, from file contents up to entire snapshots,
 corresponds to a node.  Edges between nodes emerge naturally:
 directory entries point to other directories or file contents; revisions point
 to directories and previous revisions, releases point to revisions, snapshots
 point to revisions and releases. Additionally, each node contains all metadata
 that are specific to the node itself rather than to pointed nodes; e.g., commit
 messages, timestamps, or file names. Note that the structure is really a DAG,
 and not a tree, due to the fact that the line of revision nodes might be
 forked and merged back.
 
 ..
    directory: fff3cc22cb40f71d26f736c082326e77de0b7692
    parent: e4feb05112588741b4764739d6da756c357e1f37
    author: Stefano Zacchiroli <zack@upsilon.cc>
    date: 1443617461 +0200
    committer: Stefano Zacchiroli <zack@upsilon.cc>
    commiter_date: 1443617461 +0200
    message:
      objstorage: fix tempfile race when adding objects
 
      Before this change, two workers adding the same
      object will end up racing to write <SHA1>.tmp.
      [...]
 
      revisionid: 64a783216c1ec69dcb267449c0bbf5e54f7c4d6d
      A revision node in the Software Heritage DAG
 
 In a Merkle structure each node is identified by an intrinsic identifier
 computed as a cryptographic hash of the node content. In the case of Software
 Heritage, identifiers are computed taking into account both node-specific
 metadata and the identifiers of child nodes.
 
 Consider the revision node in the picture whose identifier starts with
 `c7640e08d..`. It points to a directory (identifier starting with
 `45f0c078..`), which has also been archived. That directory contains a full
 copy, at a specific point in time, of a software component—in the example the
 `Hello World <https://forge.softwareheritage.org/source/helloworld/>`_ software
 component available on our forge. The revision node also points to the
 preceding revision node (`43ef7dcd..`) in the project development history.
 Finally, the node contains revision-specific metadata, such as the author and
 committer of the given change, its timestamps, and the message entered by the
 author at commit time.
 
 The identifier of the revision node itself (`c7640e08d..`) is computed as a
 cryptographic hash of a (canonical representation of) all the information shown
 in the figure. A change in any of them (metadata and/or pointed nodes) would result
 in an entirely different node identifier. All other types of nodes in the
 Software Heritage archive behave similarly.
 
 The Software Heritage archive inherits useful properties from the underlying
 Merkle structure. In particular, deduplication is built-in. Any software
 artifact encountered in the wild gets added to the archive only if a
 corresponding node with a matching intrinsic identifier is not already
 available in the graph: file contents, commits, entire directories, and project
 snapshots are all deduplicated, incurring storage costs only once.
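 
 The following toy sketch illustrates how intrinsic identifiers give
 deduplication for free. It hashes a JSON serialization of each node, which is
 an assumption made for brevity and not the canonical representation used by
 Software Heritage; ``node_id`` and ``Archive`` are hypothetical names.
 
 .. code-block:: python
 
   import hashlib
   import json
 
   def node_id(kind, metadata, children):
       """Toy intrinsic identifier: hash of node metadata plus child ids."""
       canonical = json.dumps(
           {"kind": kind, "metadata": metadata, "children": children},
           sort_keys=True,
       ).encode("utf-8")
       return hashlib.sha1(canonical).hexdigest()
 
   class Archive:
       """Toy object store keyed by intrinsic identifier."""
 
       def __init__(self):
           self.nodes = {}
 
       def add(self, kind, metadata, children=()):
           ident = node_id(kind, metadata, list(children))
           # A node is stored only if its identifier is not already known.
           self.nodes.setdefault(ident, (kind, metadata, tuple(children)))
           return ident
 
   archive = Archive()
   readme = archive.add("content", {"data": "hello\n"})
   dir_v1 = archive.add("directory", {"names": ["README"]}, [readme])
   dir_copy = archive.add("directory", {"names": ["README"]}, [readme])
   assert dir_v1 == dir_copy and len(archive.nodes) == 2  # stored once
 
   rev_a = archive.add("revision", {"message": "init"}, [dir_v1])
   rev_b = archive.add("revision", {"message": "initial"}, [dir_v1])
   assert rev_a != rev_b  # a metadata change yields a different identifier
 
 Both revisions share the same directory node, so the directory and its
 content are stored once no matter how many revisions point to them.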
 
 Furthermore, as a side effect of this data model choice, the entire development
 history of all the source code archived in Software Heritage, which aims to
 match all published source code in the world, is available as a unified whole,
 making emergent structures such as code reuse across different projects or
 software origins readily available. Further reinforcing the Software Heritage
 use cases, this object could become a veritable "map of the stars" of our
 entire software commons.
diff --git a/docs/persistent-identifiers.rst b/docs/persistent-identifiers.rst
index a588e9d..b0651d0 100644
--- a/docs/persistent-identifiers.rst
+++ b/docs/persistent-identifiers.rst
@@ -1,239 +1,239 @@
+:orphan:
+
 .. _persistent-identifiers:
 
 Persistent identifiers
 ======================
 
 You can point to objects present in the Software Heritage archive by the means
 of **persistent identifiers** that are guaranteed to remain stable (persistent)
 over time. Their syntax, meaning, and usage are described below. Note that they
 are identifiers and not URLs, even though a URL-based resolver for Software
 Heritage persistent identifiers is also provided.
 
 A persistent identifier can point to any software artifact (or "object")
 available in the Software Heritage archive. Objects come in different types,
 most notably:
 
 * contents
 * directories
 * revisions
 * releases
 * snapshots
 
 Each object is identified by an intrinsic, type-specific object identifier that
 is embedded in its persistent identifier as described below. Object identifiers
 are strong cryptographic hashes computed on the entire set of object properties
 to form a `Merkle structure <https://en.wikipedia.org/wiki/Merkle_tree>`_.
 
 See :ref:`data-model` for an overview of object types and how they are linked
 together. See :py:mod:`swh.model.identifiers` for details on how intrinsic
 object identifiers are computed.
 
 
 Syntax
 ------
 
 Syntactically, persistent identifiers are generated by the ``<identifier>``
 entry point of the grammar:
 
 .. code-block:: bnf
 
   <identifier> ::= "swh" ":" <scheme_version> ":" <object_type> ":" <object_id> ;
   <scheme_version> ::= "1" ;
   <object_type> ::=
       "snp"  (* snapshot *)
     | "rel"  (* release *)
     | "rev"  (* revision *)
     | "dir"  (* directory *)
     | "cnt"  (* content *)
     ;
   <object_id> ::= 40 * <hex_digit> ;  (* intrinsic object id, as hex-encoded SHA1 *)
   <dec_digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
   <hex_digit> ::= <dec_digit> | "a" | "b" | "c" | "d" | "e" | "f" ;
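 
 A minimal sketch of a validator for this grammar, written as a Python regular
 expression. The names ``SWHID_RE`` and ``parse_identifier`` are illustrative
 and are not part of any Software Heritage API.
 
 .. code-block:: python
 
   import re
 
   # Regular expression mirroring the <identifier> grammar rule.
   SWHID_RE = re.compile(
       "swh:(?P<version>1):(?P<type>snp|rel|rev|dir|cnt):(?P<id>[0-9a-f]{40})"
   )
 
   def parse_identifier(text):
       """Return (scheme_version, object_type, object_id) or raise ValueError."""
       match = SWHID_RE.fullmatch(text)
       if match is None:
           raise ValueError(f"not a valid persistent identifier: {text!r}")
       return int(match["version"]), match["type"], match["id"]
 
 Using ``fullmatch`` rejects trailing characters, and the character class
 accepts only lowercase hexadecimal digits, as the grammar requires.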
 
 
 Semantics
 ---------
 
 ``:`` is used as separator between the logical parts of identifiers. The
 ``swh`` prefix makes explicit that these identifiers are related to *SoftWare
 Heritage*. ``1`` (``<scheme_version>``) is the current version of this
 identifier *scheme*; future editions will use higher version numbers, possibly
 breaking backward compatibility (but without breaking the resolvability of
 identifiers that conform to previous versions of the scheme).
 
 A persistent identifier points to a single object, whose type is explicitly
 captured by ``<object_type>``:
 
 * ``snp`` identifiers point to **snapshots**,
 * ``rel`` to **releases**,
 * ``rev`` to **revisions**,
 * ``dir`` to **directories**,
 * ``cnt`` to **contents**.
 
 The actual object pointed to is identified by the intrinsic identifier
 ``<object_id>``, which is a hex-encoded (using lowercase ASCII characters) SHA1
 computed on the content and metadata of the object itself, as follows:
 
 * for **snapshots**, intrinsic identifiers are computed as per
   :py:func:`swh.model.identifiers.snapshot_identifier`
 
 * for **releases**, as per
   :py:func:`swh.model.identifiers.release_identifier`
 
 * for **revisions**, as per
   :py:func:`swh.model.identifiers.revision_identifier`
 
 * for **directories**, as per
   :py:func:`swh.model.identifiers.directory_identifier`
 
 * for **contents**, the intrinsic identifier is the ``sha1_git`` hash among
   the multiple hashes returned by
   :py:func:`swh.model.identifiers.content_identifier`, i.e., the SHA1 of a byte
   sequence obtained by juxtaposing the ASCII string ``"blob"`` (without
   quotes), a space, the length of the content in bytes as decimal digits, a
   NUL byte, and the actual content of the file.
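 
 The recipe for contents can be sketched with the Python standard library
 alone. The helper names below are hypothetical, and the sketch does not call
 :py:mod:`swh.model.identifiers`; it only reproduces the byte sequence
 described above.
 
 .. code-block:: python
 
   import hashlib
 
   def content_sha1_git(data: bytes) -> str:
       """SHA1 of "blob", a space, the decimal length, a NUL byte, the data."""
       header = b"blob " + str(len(data)).encode("ascii") + b"\0"
       return hashlib.sha1(header + data).hexdigest()
 
   def content_persistent_id(data: bytes) -> str:
       """Persistent identifier (scheme version 1) for a file content."""
       return "swh:1:cnt:" + content_sha1_git(data)
 
   # The empty content hashes to the well-known identifier of an empty blob.
   assert content_sha1_git(b"") == "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"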
 
 
 Git compatibility
 ~~~~~~~~~~~~~~~~~
 
 Intrinsic object identifiers for contents, directories, revisions, and releases
 are, at present, compatible with the `Git <https://git-scm.com/>`_ way of
 `computing identifiers
 <https://git-scm.com/book/en/v2/Git-Internals-Git-Objects>`_ for its objects.
 A Software Heritage content identifier will be identical to a Git blob
 identifier of any file with the same content, a Software Heritage revision
 identifier will be identical to the corresponding Git commit identifier, etc.
 This is not the case for snapshot identifiers as Git doesn't have a
 corresponding object type.
 
 Note that Git compatibility is incidental and is not guaranteed to be
 maintained in future versions of this scheme (or Git).
 
 
 Examples
 --------
 
 * ``swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2`` points to the content
   of a file containing the full text of the GPL3 license
 * ``swh:1:dir:d198bc9d7a6bcf6db04f476d29314f157507d505`` points to a directory
   containing the source code of the Darktable photography application as it was
   at some point on 4 May 2017
 * ``swh:1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d`` points to a commit in
   the development history of Darktable, dated 16 January 2017, that added
   undo/redo support for masks
 * ``swh:1:rel:22ece559cc7cc2364edc5e5593d63ae8bd229f9f`` points to Darktable
   release 2.3.0, dated 24 December 2016
 * ``swh:1:snp:c7c108084bc0bf3d81436bf980b46e98bd338453`` points to a snapshot
   of the entire Darktable Git repository taken on 4 May 2017 from GitHub
 
 
 Contextual information
 ======================
 
 It is often useful to complement persistent identifiers with **contextual
 information** about where the identified object has been found as well as which
 specific parts of it are of interest. To that end it is possible, via a
 dedicated syntax, to extend persistent identifiers with the following pieces of
 information:
 
 * the **software origin** where an object has been found/observed
 * the **line number(s)** of interest, usually within a content object
 
 
 Syntax
 ------
 
 The full syntax to complement identifiers with contextual information is given
 by the ``<identifier_with_context>`` entry point of the grammar:
 
 .. code-block:: bnf
 
   <identifier_with_context> ::= <identifier> [<lines_ctxt>] [<origin_ctxt>] ;
   <lines_ctxt> ::= ";" "lines" "=" <line_number> ["-" <line_number>] ;
   <origin_ctxt> ::= ";" "origin" "=" <url> ;
   <line_number> ::= <dec_digit> + ;
   <url> ::= (* RFC 3986 compliant URLs *) ;
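 
 As a sketch of how this extended grammar might be parsed, the following
 Python function splits an identifier from its optional line and origin
 context. The function name and the returned tuple layout are assumptions, and
 the origin is accepted as any non-empty string rather than being validated
 against RFC 3986.
 
 .. code-block:: python
 
   import re
 
   CONTEXT_RE = re.compile(
       "(?P<identifier>swh:1:(?:snp|rel|rev|dir|cnt):[0-9a-f]{40})"
       "(?:;lines=(?P<first>[0-9]+)(?:-(?P<last>[0-9]+))?)?"
       "(?:;origin=(?P<origin>.+))?"
   )
 
   def parse_with_context(text):
       """Return (identifier, (first_line, last_line) or None, origin or None)."""
       match = CONTEXT_RE.fullmatch(text)
       if match is None:
           raise ValueError(f"cannot parse identifier with context: {text!r}")
       lines = None
       if match["first"] is not None:
           first = int(match["first"])
           # A single line number is treated as a one-line range.
           last = int(match["last"]) if match["last"] is not None else first
           lines = (first, last)
       return match["identifier"], lines, match["origin"]
 
 Because the origin comes last in the grammar, everything after ``;origin=``
 is taken as the URL, including any ``;`` or ``=`` characters it contains.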
 
 
 Semantics
 ---------
 
 ``;`` is used as separator between persistent identifiers and additional
 optional contextual information. Each piece of contextual information is
 specified as a key/value pair, using ``=`` as a separator.
 
 The following pieces of contextual information are supported:
 
 * line numbers: it is possible to specify a single line number or a line range,
   separating two numbers with ``-``. Note that line numbers are purely
   indicative and are not meant to be stable, as in some degenerate cases
   (e.g., text files which mix different types of line terminators) it is
   impossible to resolve them unambiguously.
 
 * software origin: where a given object has been found or observed in the wild,
   as the URI that was used by Software Heritage to ingest the object into the
   archive.
 
 
 Resolution
 ==========
 
 
 Dedicated resolvers
 -------------------
 
 Persistent identifiers can be resolved using the Software Heritage Web
 application (see :py:mod:`swh.web`).  In particular, the **root endpoint**
 ``/`` can be given a persistent identifier and will lead to the browsing page
 of the corresponding object, like this:
 ``https://archive.softwareheritage.org/<identifier>``.
 
 A **dedicated** ``/resolve`` **endpoint** of the HTTP API is also available to
 explicitly request persistent identifier resolution; see:
 :http:get:`/api/1/resolve/(swh_id)/`.
 
 Examples:
 
 * `<https://archive.softwareheritage.org/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2>`_
 * `<https://archive.softwareheritage.org/swh:1:dir:d198bc9d7a6bcf6db04f476d29314f157507d505>`_
 * `<https://archive.softwareheritage.org/api/1/resolve/swh:1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d>`_
 * `<https://archive.softwareheritage.org/api/1/resolve/swh:1:rel:22ece559cc7cc2364edc5e5593d63ae8bd229f9f>`_
 * `<https://archive.softwareheritage.org/api/1/resolve/swh:1:snp:c7c108084bc0bf3d81436bf980b46e98bd338453>`_
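 
 The two resolution URLs can be built with plain string formatting, as in this
 small sketch. The function names are hypothetical; the URL layouts follow the
 root endpoint and the ``/api/1/resolve/(swh_id)/`` endpoint described above.
 
 .. code-block:: python
 
   ARCHIVE = "https://archive.softwareheritage.org"
 
   def browse_url(identifier: str) -> str:
       """URL of the browsing page for a persistent identifier."""
       return f"{ARCHIVE}/{identifier}"
 
   def resolve_api_url(identifier: str) -> str:
       """URL of the HTTP API resolve endpoint for a persistent identifier."""
       return f"{ARCHIVE}/api/1/resolve/{identifier}/"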
 
 
 External resolvers
 ------------------
 
 The following **independent resolvers** support resolution of Software
 Heritage persistent identifiers:
 
 * `Identifiers.org <https://identifiers.org>`_; see:
   `<http://identifiers.org/swh/>`_ (registry identifier `MIR:00000655
   <https://www.ebi.ac.uk/miriam/main/datatypes/MIR:00000655>`_).
 
 * `Name-to-Thing (N2T) <https://n2t.net/>`_
 
 Examples:
 
 * `<https://identifiers.org/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2>`_
 * `<https://identifiers.org/swh:1:dir:d198bc9d7a6bcf6db04f476d29314f157507d505>`_
 * `<https://identifiers.org/swh:1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d>`_
 * `<https://n2t.net/swh:1:rel:22ece559cc7cc2364edc5e5593d63ae8bd229f9f>`_
 * `<https://n2t.net/swh:1:snp:c7c108084bc0bf3d81436bf980b46e98bd338453>`_
 
 Note that resolution via Identifiers.org does not support contextual
 information, due to `syntactic incompatibilities
 <http://identifiers.org/documentation#custom_requests>`_.
 
 
 References
 ==========
 
 * Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli. `Identifiers for
   Digital Objects: the Case of Software Source Code Preservation
   <https://hal.archives-ouvertes.fr/hal-01865790v4>`_. In Proceedings of `iPRES
   2018 <https://ipres2018.org/>`_: 15th International Conference on Digital
   Preservation, Boston, MA, USA, September 2018, 9 pages.
-
-