diff --git a/docs/extrinsic-metadata-specification.rst b/docs/extrinsic-metadata-specification.rst
new file mode 100644
--- /dev/null
+++ b/docs/extrinsic-metadata-specification.rst
@@ -0,0 +1,116 @@
+.. _extrinsic-metadata-specification:
+
+Extrinsic metadata specification
+================================
+
+:term:`Extrinsic metadata` is information about software that is not part
+of the source code itself but still closely related to the software.
+It is usually available through the web interface or API of the repository's
+forge, or through an external registry.
+
+Since it is not part of the source code, a separate mechanism is needed
+to fetch and store it.
+
+This specification assumes the reader is familiar with Software Heritage's
+:ref:`architecture` and :ref:`data-model`.
+
+
+Metadata providers
+------------------
+
+Definition
+~~~~~~~~~~
+
+We define five types of metadata providers:
+
+* :term:`loaders <loader>`, which are the components dedicated to fetching
+  source code from origins (VCS repositories, distribution packages,
+  ...). They may either discover metadata as a side effect of loading
+  source code, or be dedicated to fetching metadata.
+
+* :term:`listers <lister>`, which are the components of SWH dedicated to
+  discovering origins on known websites/forges, and which may discover
+  metadata as a side effect.
+
+* :term:`deposit clients <deposit>`, which push metadata to SWH from a
+  third party, usually at the same time as a :term:`software artifact`.
+
+* gatherers, which fetch metadata from an authoritative source of the
+  repository (e.g. its website or forge) in a way that is none of the three
+  above (e.g. by querying a specific API of the origin's forge).
+
+* registries, which fetch data from non-authoritative databases, meaning
+  they are not directly referenced by the origin's website/forge/...
+  (e.g. Wikidata).
+
+A provider is uniquely defined by these two properties:
+
+* its name, representing the software/database from which metadata is
+  extracted (e.g. `gitlab`, `wikidata`, `hal`); each provider name
+  matches a component of SWH dedicated to getting data from it.
+
+* its URL, which unambiguously identifies an instance of the provider.
+
+Example providers:
+
+=============== =============== =================================
+type            name            url
+=============== =============== =================================
+deposit_client  hal             https://hal.archives-ouvertes.fr/
+deposit_client  swh             https://www.softwareheritage.org/
+lister          gitlab_lister   https://gitlab.com/
+loader          gitlab_loader   https://gitlab.com/
+registry        wikidata        https://www.wikidata.org/
+=============== =============== =================================
+
+Storage API
+~~~~~~~~~~~
+
+The :term:`storage` API offers two endpoints to manipulate metadata
+providers:
+
+* `metadata_provider_add(name, url, type, metadata)`
+  which adds a new metadata provider to the storage.
+
+* `metadata_provider_get_by(name, url)`
+  which looks up a known provider (there is at most one) and, if it
+  exists, returns a dictionary with keys `name`, `url`, `type`, and `metadata`.
+
+`metadata` is an arbitrary JSON-encodable dictionary with information
+about the provider, in a format specific to each provider name.
+This field is reserved for future use; currently it should always be empty.
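+
+As an illustration only, the two endpoints above could be exercised as in
+the following sketch. `InMemoryProviderStorage` is a hypothetical stand-in
+written for this example, not the actual SWH storage; the decision to treat
+a repeated `(name, url)` pair as a no-op is likewise an assumption.
+
+.. code-block:: python
+
+   from typing import Any, Dict, Optional, Tuple
+
+
+   class InMemoryProviderStorage:
+       """Hypothetical in-memory stand-in for the provider endpoints."""
+
+       def __init__(self) -> None:
+           # Providers are keyed by (name, url), which uniquely defines them.
+           self._providers: Dict[Tuple[str, str], Dict[str, Any]] = {}
+
+       def metadata_provider_add(
+           self, name: str, url: str, type: str,
+           metadata: Optional[Dict[str, Any]] = None,
+       ) -> None:
+           # Assumption: adding an already known (name, url) pair is a no-op.
+           self._providers.setdefault(
+               (name, url),
+               {"name": name, "url": url, "type": type,
+                "metadata": metadata or {}},
+           )
+
+       def metadata_provider_get_by(
+           self, name: str, url: str
+       ) -> Optional[Dict[str, Any]]:
+           # Returns the provider dictionary, or None if it is unknown.
+           return self._providers.get((name, url))
+
+
+   storage = InMemoryProviderStorage()
+   storage.metadata_provider_add(
+       "wikidata", "https://www.wikidata.org/", "registry", {}
+   )
+   provider = storage.metadata_provider_get_by(
+       "wikidata", "https://www.wikidata.org/"
+   )
+   unknown = storage.metadata_provider_get_by(
+       "hal", "https://hal.archives-ouvertes.fr/"
+   )
+
+Here `provider` holds the four keys listed above, with an empty `metadata`
+dictionary, while `unknown` is `None` because that provider was never added.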
+
+Origin metadata storage
+-----------------------
+
+Extrinsic metadata is stored in SWH's :term:`storage database`, alongside
+the :term:`Merkle DAG` containing all known software artifacts.
+The storage API offers three endpoints to manipulate origin metadata:
+
+* `origin_metadata_add(origin_id, discovery_date, provider_name, provider_url, metadata)`
+  which adds a new `metadata` dictionary obtained from a given provider
+  and associated with the origin.
+  The provider must be known to the storage before using this endpoint.
+
+* `origin_metadata_get(origin_id, provider_name, provider_url, after, limit)`
+  which returns a list of dictionaries of the form
+  `{'provider': {...}, 'discovery_date': ..., 'metadata': {...}}`,
+  one for each metadata item deposited, corresponding to the given origin
+  and obtained from the specified provider.
+
+* `origin_metadata_get_by_provider_type(origin_id, provider_type, after, limit)`
+  which works similarly to `origin_metadata_get`, but returns results for
+  all providers of a given type.
+
+The parameters `after` and `limit` are used for pagination based on the
+order defined by the `discovery_date`.
+
+All of the results of `origin_metadata_get` and
+`origin_metadata_get_by_provider_type` can be considered authoritative
+for the given origin at the given `discovery_date`, unless the provider type
+is `registry`.
+
+`metadata` is a JSON-encodable dictionary. Its format is specific to each
+provider, and it is treated as an opaque value by the storage.
+Unifying these various formats into a common language is outside the scope
+of this specification.
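+
+The following sketch illustrates how `origin_metadata_add` and
+`origin_metadata_get` might be combined to page through results.
+`InMemoryOriginMetadataStorage` is a hypothetical stand-in written for this
+example, not the actual SWH storage. It assumes that `after` is an
+exclusive lower bound on `discovery_date`, that `limit` caps the page
+size, and that an unknown provider raises `ValueError`; none of these
+details are fixed by this specification.
+
+.. code-block:: python
+
+   import datetime
+   from typing import Any, Dict, List, Optional, Tuple
+
+
+   class InMemoryOriginMetadataStorage:
+       """Hypothetical in-memory stand-in for the origin metadata endpoints."""
+
+       def __init__(self) -> None:
+           self._providers: Dict[Tuple[str, str], Dict[str, Any]] = {}
+           self._items: List[Dict[str, Any]] = []
+
+       def metadata_provider_add(self, name, url, type, metadata=None):
+           self._providers[(name, url)] = {
+               "name": name, "url": url, "type": type,
+               "metadata": metadata or {},
+           }
+
+       def origin_metadata_add(self, origin_id, discovery_date,
+                               provider_name, provider_url, metadata):
+           # The provider must be known to the storage beforehand.
+           provider = self._providers.get((provider_name, provider_url))
+           if provider is None:
+               raise ValueError("unknown metadata provider")
+           self._items.append({
+               "origin_id": origin_id, "provider": provider,
+               "discovery_date": discovery_date, "metadata": metadata,
+           })
+
+       def origin_metadata_get(self, origin_id, provider_name, provider_url,
+                               after: Optional[datetime.datetime] = None,
+                               limit: int = 100):
+           matches = [
+               item for item in self._items
+               if item["origin_id"] == origin_id
+               and item["provider"]["name"] == provider_name
+               and item["provider"]["url"] == provider_url
+               # Assumption: `after` excludes items at or before that date.
+               and (after is None or item["discovery_date"] > after)
+           ]
+           # Pagination follows the order defined by discovery_date.
+           matches.sort(key=lambda item: item["discovery_date"])
+           return [
+               {key: item[key]
+                for key in ("provider", "discovery_date", "metadata")}
+               for item in matches[:limit]
+           ]
+
+
+   storage = InMemoryOriginMetadataStorage()
+   url = "https://www.wikidata.org/"
+   storage.metadata_provider_add("wikidata", url, "registry")
+   for day in (1, 2, 3):
+       date = datetime.datetime(2020, 1, day, tzinfo=datetime.timezone.utc)
+       storage.origin_metadata_add(42, date, "wikidata", url, {"rev": day})
+
+   first_page = storage.origin_metadata_get(42, "wikidata", url, limit=2)
+   second_page = storage.origin_metadata_get(
+       42, "wikidata", url, after=first_page[-1]["discovery_date"], limit=2
+   )
+
+Passing the `discovery_date` of the last item of one page as `after` for
+the next call is one plausible way to iterate; with three stored items and
+`limit=2`, the second call returns only the remaining item.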
diff --git a/docs/index.rst b/docs/index.rst
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -30,6 +30,11 @@
 
 * :ref:`archive-copies`
 
+Specifications
+--------------
+
+* :ref:`extrinsic-metadata-specification`
+
 
 Reference Documentation
 -----------------------