diff --git a/docs/images/mirror-architecture.svg b/docs/images/mirror-architecture.svg
new file mode 100644
index 0000000..1cabe68
--- /dev/null
+++ b/docs/images/mirror-architecture.svg
@@ -0,0 +1,2312 @@
+
+
diff --git a/docs/index.rst b/docs/index.rst
index b78fea5..8e6b906 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,174 +1,178 @@
.. _swh-docs:
Software Heritage - Development Documentation
=============================================
Getting started
---------------
* :ref:`getting-started` ← start here to get your own Software Heritage
platform running in less than 5 minutes, or
* :ref:`developer-setup` ← here to hack on the Software Heritage software
stack
Architecture
------------
* :ref:`architecture` ← go there to have a glimpse on the Software Heritage software
architecture
+* :ref:`mirror` ← go there to learn what a Software Heritage mirror is and
+ how to set one up
+
Components
----------
Here is brief overview of the most relevant software components in the Software
Heritage stack. Each component name is linked to the development documentation
of the corresponding Python module.
:ref:`swh.core `
low-level utilities and helpers used by almost all other modules in the
stack
:ref:`swh.dataset `
public datasets and periodic data dumps of the archive released by Software
Heritage
:ref:`swh.deposit `
push-based deposit of software artifacts to the archive
swh.docs
developer documentation (used to generate this doc you are reading)
:ref:`swh.fuse `
Virtual file system to browse the Software Heritage archive, based on
`FUSE `_
:ref:`swh.graph `
Fast, compressed, in-memory representation of the archive, with tooling to
generate and query it.
:ref:`swh.indexer `
tools and workers used to crawl the content of the archive and extract
derived information from any artifact stored in it
:ref:`swh.journal `
persistent logger of changes to the archive, with publish-subscribe support
:ref:`swh.lister `
collection of listers for all sorts of source code hosting and distribution
places (forges, distributions, package managers, etc.)
:ref:`swh.loader-core `
low-level loading utilities and helpers used by all other loaders
:ref:`swh.loader-git `
loader for `Git `_ repositories
:ref:`swh.loader-mercurial `
loader for `Mercurial `_ repositories
:ref:`swh.loader-svn `
loader for `Subversion `_ repositories
:ref:`swh.model `
implementation of the :ref:`data-model` to archive source code artifacts
:ref:`swh.objstorage `
content-addressable object storage
:ref:`swh.objstorage.replayer `
Object storage replication tool
:ref:`swh.scanner `
source code scanner to analyze code bases and compare them with source code
artifacts archived by Software Heritage
:ref:`swh.scheduler `
task manager for asynchronous/delayed tasks, used for recurrent (e.g.,
listing a forge, loading new stuff from a Git repository) and one-off
activities (e.g., loading a specific version of a source package)
:ref:`swh.search `
search engine for the archive
:ref:`swh.storage `
abstraction layer over the archive, allowing to access all stored source
code artifacts as well as their metadata
:ref:`swh.vault `
implementation of the vault service, allowing to retrieve parts of the
archive as self-contained bundles (e.g., individual releases, entire
repository snapshots, etc.)
:ref:`swh.web `
Web application(s) to browse the archive, for both interactive (HTML UI)
and mechanized (REST API) use
:ref:`swh.web.client `
Python client for :ref:`swh.web `
Dependencies
------------
The dependency relationships among the various modules are depicted below.
.. _py-deps-swh:
.. figure:: images/py-deps-swh.svg
:width: 1024px
:align: center
Dependencies among top-level Python modules (click to zoom).
Archive
-------
* :ref:`Archive ChangeLog `: notable changes to the archive
over time
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* `URLs index `_
* :ref:`search`
* :ref:`glossary`
.. ensure sphinx does not complain about index files not being included
.. toctree::
:maxdepth: 2
:caption: Contents:
:titlesonly:
:hidden:
architecture
getting-started
developer-setup
journal
+ mirror
API documentation
swh.core
swh.dataset
swh.deposit
swh.fuse
swh.graph
swh.indexer
swh.journal
swh.lister
swh.loader
swh.model
swh.objstorage
swh.scanner
swh.scheduler
swh.search
swh.storage
swh.vault
swh.web
swh.web.client
diff --git a/docs/mirror.rst b/docs/mirror.rst
new file mode 100644
index 0000000..145ab6c
--- /dev/null
+++ b/docs/mirror.rst
@@ -0,0 +1,138 @@
+.. highlight:: bash
+
+.. _mirror:
+
+Software Heritage Mirror
+========================
+
+Description
+-----------
+
+A mirror is a full copy of the |swh| Archive. A minimal copy consists of two
+parts:
+
+- the graph storage (typically an instance of :ref:`swh.storage `),
+- the object storage (typically an instance of :ref:`swh.objstorage `).
+
+However, a usable mirror also needs to be accessible. As such, a proper mirror
+should also allow users to:
+
+- navigate the copy of the archive using a web browser (typically using
+ :ref:`the web application `),
+- retrieve data from the copy of the archive (typically using :ref:`the
+ vault service `).
+
+A mirror is filled by consuming data from the |swh| Kafka-based :ref:`journal
+` and retrieving the blob objects (file content) from the |swh|
+:ref:`object storage `.
+
+.. note:: A mirror of the |swh| Archive is not necessarily implemented using
+ the |swh| software stack. In this documentation, however, we describe the
+ case of a mirror using the |swh| software stack.
+
+
+.. thumbnail:: images/mirror-architecture.svg
+
+ General view of the |swh| mirroring architecture.
+
+In this documentation, we will focus only on replication mechanisms using the
+software stack provided by |swh|. Setting up web services or other storage
+methods will not be covered here.
+
+
+Replicating the Graph Storage
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The replication of the graph is based on a journal using Kafka as its event
+streaming platform.
+
+On the main Software Heritage side, every addition made to the graph consists
+of the insertion of a :ref:`data-model` object. This added object is also
+serialized as a msgpack_ bytestring, which is used as the value of a Kafka
+message in a topic dedicated to the object type.
+
+Topics for the main part of the |swh| :ref:`data-model` are:
+
+- `swh.journal.objects.content`
+- `swh.journal.objects.skipped_content`
+- `swh.journal.objects.directory`
+- `swh.journal.objects.revision`
+- `swh.journal.objects.release`
+- `swh.journal.objects.snapshot`
+- `swh.journal.objects.origin`
+- `swh.journal.objects.origin_visit`
+- `swh.journal.objects.origin_visit_status`
+
+In addition to these, there are a few topics for :ref:`extrinsic metadata
+`:
+
+- `swh.journal.objects.metadata_authority`
+- `swh.journal.objects.metadata_fetcher`
+- `swh.journal.objects.raw_extrinsic_metadata`
+
+
+In order to set up a mirror of the graph, one needs to deploy a stack capable of
+retrieving all these topics and storing their content reliably. For example, a
+Kafka cluster configured as a replica of the main Kafka broker hosted by |swh|
+would do the job (albeit not in a very useful manner by itself).
+
+A more usable mirror can be set up using the :ref:`Storage `
+component with the help of the special service named `replayer` provided by the
+:doc:`apidoc/swh.storage.replay` module.
+.. TODO: replace this previous link by a link to the 'swh storage replay'
+ command once available, and ideally once
+ https://github.com/sphinx-doc/sphinx/issues/880 is fixed, but humm...
+
+Replicating the Object Storage
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+File contents (blobs) are **not** embedded in messages of the
+`swh.journal.objects.content` Kafka topic. As these messages do not include the
+file content, another component must be in charge of replicating blob objects
+from the original Software Heritage Archive and inserting them in the local
+object storage instance.
+
+The idea for this component is to have another `swh-journal` client that
+subscribes to the `swh.journal.objects.content` topic to get the stream of blob
+object identifiers, then retrieves each blob object from Software Heritage's
+object storage and inserts it in the local object storage.
+
+The proposed implementation for this component is called the :ref:`content
+replayer `.
+
+
+Installation
+------------
+
+If using the |swh| software stack to deploy a mirror, a number of
+|swh| software components must be installed.
+
+As shown in the architecture diagram above, one needs to have:
+
+- a database to store the graph of the |swh| Archive,
+- the :ref:`swh-storage` component,
+- an object storage solution (can be cloud-based or on a local filesystem,
+ such as ZFS pools),
+- the :ref:`swh-objstorage` component,
+- the :ref:`swh.storage.replay` service (part of the :ref:`swh-storage`
+ package),
+- the :ref:`swh.objstorage.replayer.replay` service (from the
+ :ref:`swh-objstorage-replayer` package).
+
+As this can be quite complex to set up properly, we provide a `docker-swarm
+`_ based deployment as a working example of the mirror
+stack:
+
+ https://forge.softwareheritage.org/source/swh-docker
+
+It is strongly recommended to start from there before planning a
+production-like deployment.
+
+See the `README
+`_
+file of the `swh-docker
+`_ repository for more
+detailed explanations.
+
+
+.. _msgpack: https://msgpack.org