diff --git a/docs/_images/athena_tables.png b/docs/_images/athena_tables.png
deleted file mode 100644
index 94f67de..0000000
Binary files a/docs/_images/athena_tables.png and /dev/null differ
diff --git a/docs/athena.rst b/docs/athena.rst
deleted file mode 100644
index cf80875..0000000
--- a/docs/athena.rst
+++ /dev/null
@@ -1,115 +0,0 @@
-Setup on Amazon Athena
-======================
-
-The Software Heritage Graph Dataset is available as a public dataset in `Amazon
-Athena `_. Athena uses `Presto
-`_, a distributed SQL query engine, to
-automatically scale queries on large datasets.
-
-The pricing of Athena depends on the amount of data scanned by each query,
-generally at a cost of $5 per TiB of data scanned. Full pricing details are
-available `here `_.
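Since billing is proportional to the data scanned, it can help to estimate a query's cost up front. The sketch below encodes the $5 per TiB figure quoted above; the function name is illustrative, and the 10 MB minimum per query is an assumption based on AWS's published pricing model, so check the pricing page for authoritative numbers:

```python
# Illustrative sketch: estimate the cost of an Athena query from the
# amount of data it scans, at $5 per TiB.  The 10 MB minimum billed per
# query mirrors AWS's published pricing model (an assumption; consult
# the current pricing page for authoritative figures).

PRICE_PER_TIB_USD = 5.0
MIN_BYTES_BILLED = 10 * 1024 ** 2  # assumed 10 MiB minimum per query

def athena_cost_usd(bytes_scanned: int) -> float:
    """Return the estimated cost in USD for a query scanning `bytes_scanned`."""
    billed = max(bytes_scanned, MIN_BYTES_BILLED)
    return billed / 1024 ** 4 * PRICE_PER_TIB_USD

if __name__ == "__main__":
    # A query scanning a full 1 TiB costs about $5.
    print(f"{athena_cost_usd(1024 ** 4):.2f}")
```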
-
-Note that because the Software Heritage Graph Dataset is available as a public
-dataset, you **do not have to pay for the storage, only for the queries**
-(except for the data you store on S3 yourself, like query results).
-
-
-Loading the tables
-------------------
-
-.. highlight:: bash
-
-AWS account
-~~~~~~~~~~~
-
-In order to use Amazon Athena, you will first need to `create an AWS account
-and set up billing
-`_.
-
-
-Setup
-~~~~~
-
-Athena needs to be made aware of the location and the schema of the Parquet
-files available as a public dataset. Unfortunately, since Athena does not
-support queries that contain multiple commands, it is not as simple as pasting
-an installation script in the console. Instead, we provide a Python script that
-can be run locally on your machine and that will communicate with Athena to
-create the tables automatically, with the appropriate schema.
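In outline, such a script builds one ``CREATE EXTERNAL TABLE`` statement per table and submits each statement separately through boto3. The sketch below is hypothetical: the table name, columns, and S3 location are placeholders, not the dataset's actual schema.

```python
# Hypothetical sketch of what the table-creation script does: build one
# CREATE EXTERNAL TABLE statement per table (Athena cannot run several
# statements in one query) and submit each through boto3.  Table names,
# columns and the S3 location are placeholders, not the real schema.

def create_table_ddl(database: str, table: str, columns: dict, location: str) -> str:
    cols = ",\n  ".join(f"{name} {typ}" for name, typ in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {cols}\n)\nSTORED AS PARQUET\nLOCATION '{location}';"
    )

ddl = create_table_ddl(
    "swh", "directory_entry_file",
    {"target": "binary", "name": "binary", "perms": "int"},
    "s3://example-bucket/graph/directory_entry_file/",
)

# Submitting it would look roughly like this (requires configured AWS
# credentials, so it is left commented out):
#
#   import boto3
#   athena = boto3.client("athena", region_name="us-east-1")
#   athena.start_query_execution(
#       QueryString=ddl,
#       ResultConfiguration={"OutputLocation": "s3://your-bucket/results/"},
#   )
```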
-
-To run this script, you will need to install a few dependencies on your
-machine:
-
-- For **Ubuntu** and **Debian**::
-
- sudo apt install python3 python3-boto3 awscli
-
-- For **Archlinux**::
-
- sudo pacman -S --needed python python-boto3 aws-cli
-
-Once the dependencies are installed, run::
-
- aws configure
-
-This will ask for an AWS Access Key ID and an AWS Secret Access Key in
-order to give Python access to your AWS account. These keys can be generated at
-`this address
-`_.
-
-It will also ask for the region in which you want to run the queries. We
-recommend using ``us-east-1``, since that is where the public dataset is
-located.
-
-Creating the tables
-~~~~~~~~~~~~~~~~~~~
-
-Download and run the Python script that will create the tables on your account:
-
-.. tabs::
-
- .. group-tab:: full
-
- ::
-
- wget https://annex.softwareheritage.org/public/dataset/graph/latest/athena/tables.py
- wget https://annex.softwareheritage.org/public/dataset/graph/latest/athena/gen_schema.py
- ./gen_schema.py
-
- .. group-tab:: popular-4k
-
- This dataset is not available on Athena yet.
-
- .. group-tab:: popular-3k-python
-
- This dataset is not available on Athena yet.
-
-To check that the tables have been successfully created in your account, you
-can open your `Amazon Athena console
-`_. You should be able to select
-the database corresponding to your dataset, and see the tables:
-
-.. image:: _images/athena_tables.png
-
-
-Running queries
----------------
-
-.. highlight:: sql
-
-From the console, once you have selected the database of your dataset, you can
-run SQL queries directly from the Query Editor.
-
-Try for instance this query that computes the most frequent file names in the
-archive::
-
- SELECT from_utf8(name, '?') AS name, COUNT(DISTINCT target) AS cnt
- FROM directory_entry_file
- GROUP BY name
- ORDER BY cnt DESC
- LIMIT 10;
-
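The same query can be run outside the console with boto3. The helper below only assumes the documented shape of Athena's ``GetQueryResults`` response (``ResultSet`` → ``Rows`` → ``Data`` → ``VarCharValue``); the response fragment is a fabricated example for illustration, not actual dataset output.

```python
# Sketch: flatten Athena's GetQueryResults response into plain tuples.
# Assumes the documented response shape; the first row holds the column
# headers and is skipped.

def result_rows(response: dict) -> list:
    rows = response["ResultSet"]["Rows"]
    return [
        tuple(cell.get("VarCharValue") for cell in row["Data"])
        for row in rows[1:]  # rows[0] holds the column headers
    ]

# Mock response fragment in the shape Athena returns (values invented):
fake_response = {
    "ResultSet": {
        "Rows": [
            {"Data": [{"VarCharValue": "name"}, {"VarCharValue": "cnt"}]},
            {"Data": [{"VarCharValue": "index.html"}, {"VarCharValue": "425450"}]},
        ]
    }
}
print(result_rows(fake_response))  # [('index.html', '425450')]
```

In a real run, the response would come from ``boto3`` calls such as ``start_query_execution`` followed by ``get_query_results``, with credentials configured as described above.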
-Other examples are available in the preprint of our article: `The Software
-Heritage Graph Dataset: Public software development under one roof.
-`_
diff --git a/docs/datasets.rst b/docs/datasets.rst
deleted file mode 100644
index dcd5cd1..0000000
--- a/docs/datasets.rst
+++ /dev/null
@@ -1,87 +0,0 @@
-Dataset
-=======
-
-We provide the full graph dataset along with two "teaser" datasets that can be
-used for trying out smaller-scale experiments before using the full graph.
-
-The main URLs of the datasets are relative to our dataset prefix:
-`https://annex.softwareheritage.org/public/dataset/ <https://annex.softwareheritage.org/public/dataset/>`__
-
-
-Main dataset
-------------
-
-The main dataset contains the full Software Heritage Graph. It is available
-in the following formats:
-
-- **PostgreSQL (compressed)**:
-
- - **URL**: `/graph/latest/sql/
- `_
- - **Total size**: 1.2 TiB
-
-- **Apache Parquet**:
-
- - **URL**: `/graph/latest/parquet/
- `_
- - **Total size**: 1.2 TiB
-
-Teaser datasets
----------------
-
-popular-4k
-~~~~~~~~~~
-
-The ``popular-4k`` teaser contains a subset of 4000 popular
-repositories from GitHub, GitLab, PyPI and Debian. The selection criteria used
-to pick the software origins were the following:
-
-- The 1000 most popular GitHub projects (by number of stars)
-- The 1000 most popular GitLab projects (by number of stars)
-- The 1000 most popular PyPI projects (by usage statistics, according to the
- `Top PyPI Packages `_ database),
-- The 1000 most popular Debian packages (by "votes" according to the `Debian
- Popularity Contest `_ database)
-
-This teaser is available in the following formats:
-
-- **PostgreSQL (compressed)**:
-
- - **URL**: `/graph/latest/popular-4k/sql/
- `_
- - **Total size**: TODO
-
-- **Apache Parquet**:
-
- - **URL**: `/graph/latest/popular-4k/parquet/
- `_
- - **Total size**: TODO
-
-popular-3k-python
-~~~~~~~~~~~~~~~~~
-
-The ``popular-3k-python`` teaser contains a subset of 3052 popular
-repositories **tagged as being written in the Python language**, from GitHub,
-GitLab, PyPI and Debian. The selection criteria used to pick the software
-origins were the following, similar to ``popular-4k``:
-
-- the 1000 most popular GitHub projects written in Python (by number of stars),
-- the 131 GitLab projects written in Python that have 2 stars or more,
-- the 1000 most popular PyPI projects (by usage statistics, according to the
- `Top PyPI Packages `_ database),
-- the 1000 most popular Debian packages with the
- `debtag `_ ``implemented-in::python`` (by
- "votes" according to the `Debian Popularity Contest
- `_ database).
-
-This teaser is available in the following formats:
-
-- **PostgreSQL (compressed)**:
-
- - **URL**: `/graph/latest/popular-3k-python/sql/
- `_
- - **Total size**: TODO
-
-- **Apache Parquet**:
-
-  - **URL**: `/graph/latest/popular-3k-python/parquet/
- `_
- - **Total size**: TODO
diff --git a/docs/index.rst b/docs/index.rst
index 0e99243..f251325 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,53 +1,11 @@
.. _swh-dataset:
-Software Heritage Graph Dataset
-===============================
+Software Heritage Datasets
+==========================
-This is the Software Heritage graph dataset: a fully-deduplicated Merkle
-DAG representation of the Software Heritage archive. The dataset links
-together file content identifiers, source code directories, Version
-Control System (VCS) commits tracking evolution over time, up to the
-full states of VCS repositories as observed by Software Heritage during
-periodic crawls. The dataset’s contents come from major development
-forges (including `GitHub `__ and
-`GitLab `__), FOSS distributions (e.g.,
-`Debian `__), and language-specific package managers (e.g.,
-`PyPI `__). Crawling information is also included,
-providing timestamps about when and where all archived source code
-artifacts have been observed in the wild.
+This page lists the different public datasets and periodic data dumps of the
+archive published by Software Heritage.
-The Software Heritage graph dataset is available in multiple formats,
-including downloadable CSV dumps and Apache Parquet files for local use,
-as well as a public instance on Amazon Athena interactive query service
-for ready-to-use powerful analytical processing.
-
-By accessing the dataset, you agree with the Software Heritage `Ethical
-Charter for using the archive
-data `__,
-and the `terms of use for bulk
-access `__.
-
-
-If you use this dataset for research purposes, please cite the following paper:
-
-*
- | Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli.
- | *The Software Heritage Graph Dataset: Public software development under one roof.*
- | In proceedings of `MSR 2019 `_: The 16th International Conference on Mining Software Repositories, May 2019, Montreal, Canada. Co-located with `ICSE 2019 `_.
- | `preprint `_, `bibtex `_
-
-.. toctree::
- :maxdepth: 2
- :caption: Contents:
-
- datasets
- postgresql
- athena
-
-
-Indices and tables
-==================
-
-* :ref:`genindex`
-* :ref:`modindex`
-* :ref:`search`
+:ref:`The Software Heritage Graph Dataset `
+ the entire graph of Software Heritage in a fully-deduplicated Merkle DAG
+ representation.
diff --git a/docs/postgresql.rst b/docs/postgresql.rst
deleted file mode 100644
index b3e7556..0000000
--- a/docs/postgresql.rst
+++ /dev/null
@@ -1,98 +0,0 @@
-Setup on a PostgreSQL instance
-==============================
-
-This tutorial will guide you through the steps required to set up the Software
-Heritage Graph Dataset in a PostgreSQL database.
-
-.. highlight:: bash
-
-PostgreSQL local setup
-----------------------
-
-You need to have access to a running PostgreSQL instance to load the dataset.
-This section contains information on how to set up PostgreSQL for the first
-time.
-
-*If you already have a PostgreSQL server running on your machine, you can skip
-to the next section.*
-
-- For **Ubuntu** and **Debian**::
-
- sudo apt install postgresql
-
-- For **Archlinux**::
-
- sudo pacman -S --needed postgresql
- sudo -u postgres initdb -D '/var/lib/postgres/data'
- sudo systemctl enable --now postgresql
-
-Once PostgreSQL is running, you also need a user that will be able to create
-databases and run queries. The easiest way to achieve that is to create an
-account that has the same name as your username and that can create
-databases::
-
- sudo -u postgres createuser --createdb $USER
-
-
-Retrieving the dataset
-----------------------
-
-You need to download the dataset in SQL format. Use the following command on
-your machine, after making sure that it has enough available space for the
-dataset you chose:
-
-.. tabs::
-
- .. group-tab:: full
-
- ::
-
- mkdir full && cd full
- wget -c -A gz,sql -nd -r -np -nH https://annex.softwareheritage.org/public/dataset/graph/2019-01-28/sql/
-
- .. group-tab:: popular-4k
-
- ::
-
- mkdir popular-4k && cd popular-4k
- wget -c -A gz,sql -nd -r -np -nH https://annex.softwareheritage.org/public/dataset/graph/2019-01-28/popular-4k/sql/
-
- .. group-tab:: popular-3k-python
-
- ::
-
- mkdir popular-3k-python && cd popular-3k-python
- wget -c -A gz,sql -nd -r -np -nH https://annex.softwareheritage.org/public/dataset/graph/2019-01-28/popular-3k-python/sql/
-
-Loading the dataset
--------------------
-
-Once you have retrieved the dataset of your choice, create a database that will
-contain it, and load the database:
-
-.. tabs::
-
- .. group-tab:: full
-
- ::
-
- createdb swhgd
- psql swhgd < swh_import.sql
-
- .. group-tab:: popular-4k
-
- ::
-
- createdb swhgd-popular-4k
- psql swhgd-popular-4k < swh_import.sql
-
- .. group-tab:: popular-3k-python
-
- ::
-
- createdb swhgd-popular-3k-python
- psql swhgd-popular-3k-python < swh_import.sql
-
-
-You can now run SQL queries on your database. Run ``psql`` followed by the name
-of your database to start an interactive PostgreSQL console.
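As a starting point, the "most frequent file names" query from the Athena section can be adapted to PostgreSQL, with ``convert_from()`` taking the place of Presto's ``from_utf8()``. This is a sketch that assumes the SQL dump uses the same table and column names as the Athena schema; since running it needs a loaded database (and a driver such as psycopg2), only the query string is built here.

```python
# Sketch: the top-filenames query adapted to PostgreSQL.  convert_from()
# replaces Presto's from_utf8(); table/column names are assumed to match
# the Athena schema described in the dataset documentation.

TOP_FILENAMES_QUERY = """
SELECT convert_from(name, 'UTF8') AS name, COUNT(DISTINCT target) AS cnt
FROM directory_entry_file
GROUP BY name
ORDER BY cnt DESC
LIMIT 10;
"""

# With a database created as above, this could be run roughly as:
#
#   import psycopg2
#   with psycopg2.connect(dbname="swhgd") as conn, conn.cursor() as cur:
#       cur.execute(TOP_FILENAMES_QUERY)
#       print(cur.fetchall())
```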