diff --git a/docs/graph/dataset.rst b/docs/graph/dataset.rst
--- a/docs/graph/dataset.rst
+++ b/docs/graph/dataset.rst
@@ -1,40 +1,201 @@
Dataset
=======
-We provide the full graph dataset along with two "teaser" datasets that can be
-used for trying out smaller-scale experiments before using the full graph.
+We aim to provide regular exports of the Software Heritage graph in two
+different formats:
+
+- **Columnar data storage**: a set of relational tables stored in a columnar
+ format such as `Apache ORC `_, which is particularly
+ suited for scale-out analyses on data lakes and big data processing
+ ecosystems such as the Hadoop environment.
+
+- **Compressed graph**: a compact and highly efficient representation of the
+ graph dataset, suited for scale-up analysis on high-end machines with large
+ amounts of memory. The graph is compressed in the *Boldi-Vigna representation*,
+ designed to be loaded by the `WebGraph framework
+ `_, specifically using our `swh-graph
+ library `_.
+
+
+Summary of dataset versions
+---------------------------
+
+**Full graph**:
+
+.. list-table::
+ :header-rows: 1
+
+ * - Name
+ - # Nodes
+ - # Edges
+ - Columnar
+ - Compressed
+
+ * - `2021-03-23`_
+ - 20,667,308,808
+ - 232,748,148,441
+ - ✔
+ - ✔
+
+ * - `2020-12-15`_
+ - 19,330,739,526
+ - 213,848,749,638
+ - ✗
+ - ✔
+
+ * - `2020-05-20`_
+ - 17,075,708,289
+ - 203,351,589,619
+ - ✗
+ - ✔
+
+ * - `2019-01-28`_
+ - 11,683,687,950
+ - 159,578,271,511
+ - ✔
+ - ✔
+
+
+**Teaser datasets**:
+
+.. list-table::
+ :header-rows: 1
+
+ * - Name
+ - # Nodes
+ - # Edges
+ - Columnar
+ - Compressed
+
+ * - `2020-12-15-gitlab-all`_
+ - 1,083,011,764
+ - 27,919,670,049
+ - ✗
+ - ✔
+
+ * - `2020-12-15-gitlab-100k`_
+ - 304,037,235
+ - 9,516,984,175
+ - ✗
+ - ✔
+
+ * - `2019-01-28-popular-4k`_
+ - ?
+ - ?
+ - ✔
+ - ✗
+
+ * - `2019-01-28-popular-3k-python`_
+ - 27,363,226
+ - 346,413,337
+ - ✔
+ - ✔
+
+Full graph datasets
+-------------------
+
+
+2021-03-23
+~~~~~~~~~~
-All the main URLs are relative to our dataset prefix:
-`https://annex.softwareheritage.org/public/dataset/ `__.
+A full export of the graph dated from March 2021.
-The Software Heritage Graph Dataset contains a table representation of the full
-Software Heritage Graph. It is available in the following formats:
+- **Columnar tables (Apache ORC)**:
-- **PostgreSQL (compressed)**:
+ - **Total size**: 8.4 TiB
+ - **URL**: `/graph/2021-03-23/orc/
+ `_
+ - **S3**: ``s3://softwareheritage/graph/2021-03-23/orc``
- - **Total size**: 1.2 TiB
- - **URL**: `/graph/latest/sql/
- `_
+- **Compressed graph**:
+
+ - **URL**: `/graph/2021-03-23/compressed/
+ `_
+
+
+2020-12-15
+~~~~~~~~~~
+
+A full export of the graph dated from December 2020. Only available in
+compressed representation.
+
+- **Compressed graph**:
+
+ - **URL**: `/graph/2020-12-15/compressed/
+ `_
+
+
+2020-05-20
+~~~~~~~~~~
+
+
+A full export of the graph dated from May 2020. Only available in
+compressed representation.
+**(DEPRECATED: known issue with missing snapshot edges.)**
+
+- **Compressed graph**:
+
+ - **URL**: `/graph/2020-05-20/compressed/
+ `_
+
+
+2019-01-28
+~~~~~~~~~~
+
+A full export of the graph dated from January 2019. The export was done in two
+phases, named "2018-09-25" and "2019-01-28". Both names refer to the same
+dataset, but the available formats are not fully consistent with one
+another.
+**(DEPRECATED: early export pipeline, various inconsistencies.)**
-- **Apache Parquet**:
+- **Columnar tables (Apache Parquet)**:
- **Total size**: 1.2 TiB
- - **URL**: `/graph/latest/parquet/
- `_
- - **S3**: ``s3://softwareheritage/graph``
+ - **URL**: `/graph/2019-01-28/parquet/
+ `_
+ - **S3**: ``s3://softwareheritage/graph/2018-09-25/parquet``
+
+- **Compressed graph**:
+
+ - **URL**: `/graph/2019-01-28/compressed/
+ `_
+
Teaser datasets
---------------
-If the above dataset is too big, we also provide the following "teaser"
+If the above datasets are too big, we also provide "teaser"
datasets that can get you started and have a smaller size fingerprint.
-popular-4k
-~~~~~~~~~~
+2020-12-15-gitlab-all
+~~~~~~~~~~~~~~~~~~~~~
+
+A teaser dataset containing the entirety of GitLab, exported in December 2020.
+Available in compressed graph format.
+
+- **Compressed graph**:
+
+ - **URL**: `/graph/2020-12-15-gitlab-all/compressed/
+ `_
+
+2020-12-15-gitlab-100k
+~~~~~~~~~~~~~~~~~~~~~~
+
+A teaser dataset containing the 100k most popular GitLab repositories,
+exported in December 2020. Available in compressed graph format.
+
+- **Compressed graph**:
-The ``popular-4k`` teaser contains a subset of 4000 popular
-repositories from GitHub, Gitlab, PyPI and Debian. The selection criteria to
-pick the software origins was the following:
+ - **URL**: `/graph/2020-12-15-gitlab-100k/compressed/
+ `_
+
+
+2019-01-28-popular-4k
+~~~~~~~~~~~~~~~~~~~~~
+
+This teaser dataset contains a subset of 4000 popular repositories from GitHub,
+GitLab, PyPI and Debian. The software origins were selected according to the
+following criteria:
- The 1000 most popular GitHub projects (by number of stars)
- The 1000 most popular Gitlab projects (by number of stars)
@@ -43,23 +204,15 @@
- The 1000 most popular Debian packages (by "votes" according to the `Debian
Popularity Contest `_ database)
-This teaser is available in the following formats:
-
-- **PostgreSQL (compressed)**:
-
- - **Total size**: 23 GiB
- - **URL**: `/graph/latest/popular-4k/sql/
- `_
-
-- **Apache Parquet**:
+- **Columnar tables (Apache Parquet)**:
- **Total size**: 27 GiB
- - **URL**: `/graph/latest/popular-4k/parquet/
- `_
- - **S3**: ``s3://softwareheritage/teasers/popular-4k``
+ - **URL**: `/graph/2019-01-28-popular-4k/parquet/
+ `_
+ - **S3**: ``s3://softwareheritage/graph/2019-01-28-popular-4k/parquet/``
-popular-3k-python
-~~~~~~~~~~~~~~~~~
+2019-01-28-popular-3k-python
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ``popular-3k-python`` teaser contains a subset of 3052 popular
repositories **tagged as being written in the Python language**, from GitHub,
@@ -75,15 +228,9 @@
"votes" according to the `Debian Popularity Contest
`_ database).
-- **PostgreSQL (compressed)**:
-
- - **Total size**: 4.7 GiB
- - **URL**: `/graph/latest/popular-3k-python/sql/
- `_
-
-- **Apache Parquet**:
+- **Columnar tables (Apache Parquet)**:
- **Total size**: 5.3 GiB
- - **URL**: `/graph/latest/popular-3k-python/parquet/
- `_
- - **S3**: ``s3://softwareheritage/teasers/popular-4k``
+ - **URL**: `/graph/2019-01-28-popular-3k-python/parquet/
+ `_
+ - **S3**: ``s3://softwareheritage/graph/2019-01-28-popular-3k-python/parquet/``
diff --git a/docs/graph/index.rst b/docs/graph/index.rst
--- a/docs/graph/index.rst
+++ b/docs/graph/index.rst
@@ -17,9 +17,9 @@
artifacts have been observed in the wild.
The Software Heritage graph dataset is available in multiple formats,
-including downloadable CSV dumps and Apache Parquet files for local use,
-as well as a public instance on Amazon Athena interactive query service
-for ready-to-use powerful analytical processing.
+including relational tables stored as Apache ORC files for local use, as well
+as a public instance on the Amazon Athena interactive query service for
+ready-to-use analytical processing.
By accessing the dataset, you agree with the Software Heritage `Ethical
Charter for using the archive