diff --git a/docs/architecture.rst b/docs/architecture.rst
new file mode 100644
--- /dev/null
+++ b/docs/architecture.rst
@@ -0,0 +1,74 @@
+.. _architecture:
+
+Software Architecture
+=====================
+
+From an end-user point of view, the |swh| platform consists of the
+:term:`archive`, which can be accessed using the web interface or its REST API.
+Behind the scenes (and the web app) are several components that expose
+different aspects of the |swh| :term:`archive` as internal REST APIs.
+
+Each of these internal APIs has a dedicated (PostgreSQL) database.
+
+A global view of this architecture looks like:
+
+.. figure:: images/general-architecture.*
+
+   General view of the |swh| architecture.
+
+The front API components are:
+
+- :ref:`Storage API <swh-storage>`
+- :ref:`Deposit API <swh-deposit>`
+- :ref:`Vault API <swh-vault>`
+- :ref:`Indexer API <swh-indexer>`
+- :ref:`Scheduler API <swh-scheduler>`
+
+Backstage, a celery_-based system of tasks and workers performs
+all the work required to fill, maintain and update the |swh|
+:term:`archive`.
+
+The main components involved in this choreography are:
+
+- :term:`Listers <lister>`: a lister is a type of task aiming at scraping a
+  web site, a forge, etc. to gather all the source code repositories it can
+  find. For each source code repository found, a :term:`loader` task is
+  created.
+
+- :term:`Loaders <loader>`: a loader is a type of task aiming at importing or
+  updating a source code repository. It is the one that inserts :term:`blob`
+  objects in the :term:`object storage`, and inserts nodes and edges in the
+  :ref:`graph <swh-merkle-dag>`.
+
+- :term:`Indexers <indexer>`: an indexer is a type of task aiming at crawling
+  the content of the :term:`archive` to extract derived information (mimetype,
+  etc.).
+
+
+Tasks
+-----
+
+The following sequence diagram shows the interactions between these components
+when a new forge needs to be archived. This example depicts the case of a
+gitlab_ forge, but any other supported source type would be very similar.
+
+.. mermaid:: tasks-lister.mmd
+
+As the diagram shows, the lister task does two things:
+
+- it adds one :term:`origin` object in the :term:`storage` database for each
+  source code repository, and
+
+- it inserts one :term:`loader` task for each source code repository; that
+  task will be in charge of importing the content of the repository.
+
+
+The sequence diagram below describes this second step of importing the content
+of a repository. Once again, we take the example of a git repository, but any
+other type of repository would be very similar.
+
+.. mermaid:: tasks-git-loader.mmd
+
+
+.. _celery: https://www.celeryproject.org
+.. _gitlab: https://gitlab.com
diff --git a/docs/getting-started.rst b/docs/getting-started.rst
--- a/docs/getting-started.rst
+++ b/docs/getting-started.rst
@@ -119,7 +119,7 @@
 
 Then you will need a local storage service that will archive and serve source
 code artifacts via a REST API. The Software Heritage storage layer comes in two
-parts: a content-addressable object storage on your file system (for file
+parts: a content-addressable :term:`object storage` on your file system (for file
 contents) and a Postgres database (for the graph structure of the archive). See
 the :ref:`data-model` for more information. The storage layer is configured via
 a YAML configuration file, located at
@@ -137,13 +137,13 @@
           root: /srv/softwareheritage/objects/
           slicing: 0:2/2:4
 
-Make sure that the object storage root exists on the filesystem and is writable
+Make sure that the :term:`object storage` root exists on the filesystem and is writable
 to your user, e.g.::
 
   sudo mkdir -p /srv/softwareheritage/objects
   sudo chown "${USER}:" /srv/softwareheritage/objects
 
-You are done with object storage setup! Let's setup the database::
+You are done with :term:`object storage` setup! Let's set up the database::
 
   swh-db-init storage -d softwareheritage-dev
 
diff --git a/docs/glossary.rst b/docs/glossary.rst
new file mode 100644
--- /dev/null
+++ b/docs/glossary.rst
@@ -0,0 +1,158 @@
+:orphan:
+
+.. _glossary:
+
+Glossary
+========
+
+.. glossary::
+
+   archive
+
+     An instance of the |swh| data store.
+
+   archiver
+
+     A component dedicated to replicating an :term:`archive`.
+
+   ark
+
+     `Archival Resource Key`_ (ARK) is a Uniform Resource Locator (URL) that is
+     a multi-purpose persistent identifier for information objects of any type.
+
+   artifact
+   software artifact
+
+     An artifact is one of many kinds of tangible by-products produced during
+     the development of software.
+
+   content
+   blob
+
+     A (specific version of a) file stored in the archive, identified by its
+     cryptographic hashes (SHA1, "git-like" SHA1, SHA256) and its size. Also
+     known as: :term:`blob`. Note: it is incorrect to refer to Contents as
+     "files", because files are usually considered to be named, whereas
+     Contents are nameless. It is only in the context of specific
+     :term:`directories <directory>` that :term:`contents <content>` acquire
+     (local) names.
+
+   directory
+
+     A set of named pointers to contents (file entries), directories (directory
+     entries) and revisions (revision entries). Each entry is associated with
+     its local name (i.e., a relative path without any path
+     separator) and permission metadata (e.g., ``chmod`` value or equivalent).
+
+   doi
+
+     A Digital Object Identifier or DOI_ is a persistent identifier or handle
+     used to uniquely identify objects, standardized by the International
+     Organization for Standardization (ISO).
+
+   journal
+
+     The journal is the persistent logger of the |swh| architecture in charge
+     of logging changes of the archive, with publish-subscribe_ support.
+
+   lister
+
+     A lister_ is a component of the |swh| architecture that is in charge of
+     enumerating the :term:`software origin` (e.g., VCS, packages, etc.)
+     available at a source code distribution place.
+
+   loader
+
+     A loader is a component of the |swh| architecture responsible for importing or updating a source code repository.
+
+   hash
+   cryptographic hash
+   checksum
+   digest
+
+     A fixed-size "summary" of a stream of bytes that is easy to compute, and
+     hard to reverse (see the Wikipedia article on cryptographic hash
+     functions). Also known as: :term:`checksum`, :term:`digest`.
+
+   indexer
+
+     A component of the |swh| architecture dedicated to producing metadata
+     linked to the known :term:`blobs <blob>` in the :term:`archive`.
+
+   objstore
+   objstorage
+   object store
+   object storage
+
+     Content-addressable object storage. It is the place where the actual
+     :term:`blobs <blob>` are stored.
+
+   origin
+   software origin
+   data source
+
+     A location from which a coherent set of sources has been obtained, like a
+     git repository, a directory containing tarballs, etc.
+
+   person
+
+     An entity referenced by a revision as either the author or the committer
+     of the corresponding change. A person is associated with a full name and/or
+     an email address.
+
+   release
+   tag
+   milestone
+
+     A revision that has been marked as noteworthy with a specific name (e.g.,
+     a version number), together with associated development metadata (e.g.,
+     author, timestamp, etc.).
+
+   revision
+   commit
+   changeset
+
+     A point-in-time snapshot of the content of a directory, together with
+     associated development metadata (e.g., author, timestamp, log message,
+     etc.).
+
+   scheduler
+
+     The component of the |swh| architecture dedicated to the management and
+     the prioritization of the many tasks.
+
+   snapshot
+
+     The state of all visible branches during a specific visit of an origin.
+
+   type of origin
+
+     Information about the kind of hosting, e.g., whether it is a forge, a
+     collection of repositories, a homepage publishing tarballs, or a one-shot
+     source code repository. For all kinds of repositories, this includes which
+     VCS is in use (Git, SVN, CVS, etc.).
+
+   vault
+   vault service
+
+     User-facing service that allows retrieving parts of the :term:`archive`
+     as self-contained bundles (e.g., individual releases, entire repository
+     snapshots, etc.).
+
+   visit
+
+     The passage of |swh| on a given :term:`origin`, to retrieve all source
+     code and metadata available there at the time. A visit object stores the
+     state of all visible branches (if any) available at the origin at visit
+     time; each of them points to a revision object in the archive. Future
+     visits of the same origin will create new visit objects, without removing
+     previous ones.
+
+
+
+.. _blob: https://en.wikipedia.org/wiki/Binary_large_object
+.. _DOI: https://www.doi.org
+.. _`persistent identifier`: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#persistent-identifiers
+.. _`Archival Resource Key`: http://n2t.net/e/ark_ids.html
+.. _lister: https://docs.softwareheritage.org/devel/swh-lister/index.html
+.. _publish-subscribe: https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern
diff --git a/docs/index.rst b/docs/index.rst
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -15,6 +15,13 @@
   stack
 
 
+Architecture
+------------
+
+* :ref:`architecture` ← go there for a glimpse of the Software Heritage software
+  architecture
+
+
 Components
 ----------
 
@@ -116,6 +123,7 @@
 * :ref:`modindex`
 * `URLs index <http-routingtable.html>`_
 * :ref:`search`
+* :ref:`glossary`
 
 
 .. ensure sphinx does not complain about index files not being included
@@ -124,5 +132,6 @@
    :hidden:
    :glob:
 
+   architecture
    getting-started
    swh-*/index
diff --git a/docs/swh_substitutions b/docs/swh_substitutions
new file mode 100644
--- /dev/null
+++ b/docs/swh_substitutions
@@ -0,0 +1 @@
+.. |swh| replace:: *Software Heritage*
diff --git a/docs/tasks-git-loader.mmd b/docs/tasks-git-loader.mmd
new file mode 100644
--- /dev/null
+++ b/docs/tasks-git-loader.mmd
@@ -0,0 +1,63 @@
+sequenceDiagram
+  participant SCH_DB as scheduler DB
+  participant SCH_RUN as scheduler runner
+  participant SCH_LS as scheduler listener
+  participant RMQ as Rabbit-MQ
+  participant OBJSTORE as object storage
+  participant STORAGE_DB as storage DB
+  participant STORAGE_API as storage API
+  participant WORK_GIT as worker@loader-git
+  participant GIT as git server
+
+  Note over SCH_DB,RMQ: Task T2 created beforehand by the lister-gitlab task
+  loop Polling
+      SCH_RUN->>SCH_DB: GET TASK set state=scheduled
+      SCH_DB-->>SCH_RUN: TASK id=T2
+      activate SCH_RUN
+      SCH_RUN->>RMQ: CREATE Celery Task CT2 loader-git
+      deactivate SCH_RUN
+      activate RMQ
+  end
+
+  RMQ->>+WORK_GIT: Start task CT2
+  deactivate RMQ
+
+  WORK_GIT->>+STORAGE_API: GET origin state
+  STORAGE_API-->>-WORK_GIT: 200
+
+  WORK_GIT->>+GIT: GET refs
+  GIT-->>-WORK_GIT: 200 / refs
+
+  WORK_GIT->>+GIT: GET new_objects
+  GIT-->>-WORK_GIT: 200 / objects
+
+  WORK_GIT->>+GIT: PACKFILE
+  GIT-->>-WORK_GIT: 200 / blobs
+
+  WORK_GIT->>+STORAGE_API: LOAD NEW CONTENT
+  loop For each blob
+      STORAGE_API->>OBJSTORE: ADD BLOB
+  end
+  STORAGE_API-->>-WORK_GIT: 200 / blobs
+
+  WORK_GIT->>+STORAGE_API: NEW DIR
+  STORAGE_API->>STORAGE_DB: INSERT DIR
+  STORAGE_API-->>-WORK_GIT: 201
+
+  WORK_GIT->>+STORAGE_API: NEW REV
+  STORAGE_API->>STORAGE_DB: INSERT REV
+  STORAGE_API-->>-WORK_GIT: 201
+
+  WORK_GIT->>+STORAGE_API: NEW REL
+  STORAGE_API->>STORAGE_DB: INSERT REL
+  STORAGE_API-->>-WORK_GIT: 201
+
+  WORK_GIT->>+STORAGE_API: NEW SNAPSHOT
+  STORAGE_API->>STORAGE_DB: INSERT SNAPSHOT
+  STORAGE_API-->>-WORK_GIT: 201
+
+  WORK_GIT-->>-RMQ: SET CT2 status=eventful
+  activate RMQ
+  RMQ->>+SCH_LS: NOTIFY end of task CT2
+  deactivate RMQ
+  SCH_LS->>-SCH_DB: UPDATE T2 set state=end
diff --git a/docs/tasks-lister.mmd b/docs/tasks-lister.mmd
new file mode 100644
--- /dev/null
+++ b/docs/tasks-lister.mmd
@@ -0,0 +1,43 @@
+sequenceDiagram
+  participant WEB as swh-web
+  participant SCH_API as scheduler API
+  participant SCH_DB as scheduler DB
+  participant SCH_RUN as scheduler runner
+  participant RMQ as Rabbit-MQ
+  participant SCH_LS as scheduler listener
+  participant WORK_GITLAB as worker@gitlab-lister
+  participant GITLAB as gitlab API
+  participant STORAGE_API as storage API
+  participant STORAGE_DB as storage DB
+
+  Note over WEB,SCH_API: Save gitlab forge 0xdeadbeef
+  WEB->>+SCH_API: CREATE TASK lister-gitlab
+  SCH_API->>+SCH_DB: INSERT TASK
+  SCH_API-->>-WEB: 201
+  loop Polling
+      SCH_RUN->>SCH_DB: GET TASK set state=scheduled
+      SCH_DB-->>-SCH_RUN: TASK id=T1
+      activate SCH_RUN
+      SCH_RUN->>RMQ: CREATE Celery Task CT1
+      deactivate SCH_RUN
+      activate RMQ
+  end
+
+  RMQ->>+WORK_GITLAB: Start task CT1
+  deactivate RMQ
+  WORK_GITLAB->>+GITLAB: Get git repos
+  GITLAB-->>-WORK_GITLAB: Known git repos
+  loop For Each Repo
+      WORK_GITLAB->>+STORAGE_API: CREATE ORIGIN
+      WORK_GITLAB->>+SCH_API: CREATE TASK loader-git
+      SCH_API->>SCH_DB: INSERT TASK
+      SCH_API-->>-WORK_GITLAB: 201
+      STORAGE_API->>STORAGE_DB: INSERT ORIGIN
+      STORAGE_API-->>-WORK_GITLAB: 201
+  end
+
+  WORK_GITLAB-->>-RMQ: SET CT1 status=eventful
+  activate RMQ
+  RMQ->>+SCH_LS: NOTIFY end of task CT1
+  deactivate RMQ
+  SCH_LS->>-SCH_DB: UPDATE T1 set state=end
diff --git a/requirements.txt b/requirements.txt
--- a/requirements.txt
+++ b/requirements.txt
@@ -4,4 +4,5 @@
 vcversioner
 sphinx >= 1.3
 sphinxcontrib-httpdomain
+sphinxcontrib-mermaid
 recommonmark
diff --git a/swh/docs/sphinx/conf.py b/swh/docs/sphinx/conf.py
--- a/swh/docs/sphinx/conf.py
+++ b/swh/docs/sphinx/conf.py
@@ -19,7 +19,9 @@
               'sphinx.ext.napoleon',
               # 'sphinx.ext.intersphinx',
               'sphinxcontrib.httpdomain',
-              'sphinx.ext.extlinks']
+              'sphinx.ext.extlinks',
+              'sphinxcontrib.mermaid',
+              ]
 
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
@@ -37,6 +39,12 @@
 # The master toctree document.
 master_doc = 'index'
 
+# A string of reStructuredText that will be included at the beginning of every
+# source file that is read.
+rst_prolog = '''
+.. include:: /swh_substitutions
+'''
+
 # The version info for the project you're documenting, acts as replacement for
 # |version| and |release|, also used in various other places throughout the
 # built documents.