diff --git a/docs/graph/_images/athena_tables.png b/docs/graph/_images/athena_tables.png
new file mode 100644
index 0000000..94f67de
Binary files /dev/null and b/docs/graph/_images/athena_tables.png differ
diff --git a/docs/graph/athena.rst b/docs/graph/athena.rst
new file mode 100644
index 0000000..15e20f2
--- /dev/null
+++ b/docs/graph/athena.rst
@@ -0,0 +1,115 @@
+Setup on Amazon Athena
+======================
+
+The Software Heritage Graph Dataset is available as a public dataset in
+`Amazon Athena <https://aws.amazon.com/athena/>`_. Athena uses `Presto
+<https://prestodb.io/>`_, a distributed SQL query engine, to automatically
+scale queries on large datasets.
+
+The pricing of Athena depends on the amount of data scanned by each query,
+generally at a cost of $5 per TiB of data scanned. Full pricing details are
+available on the `Amazon Athena pricing page
+<https://aws.amazon.com/athena/pricing/>`_.
+
+Note that because the Software Heritage Graph Dataset is available as a
+public dataset, you **do not have to pay for storage, only for the queries
+you run** (except for the data you store on S3 yourself, like query results).
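+
+As a rough, back-of-the-envelope illustration of this pricing model (assuming
+the $5 per TiB figure above, which may change over time), the cost of a query
+can be estimated from the amount of data it scans:
+
+.. code-block:: python
+
+    # Rough cost estimate for an Athena query, assuming $5 per TiB scanned.
+    # Check the Athena pricing page for current rates.
+    def athena_query_cost_usd(bytes_scanned, price_per_tib=5.0):
+        return bytes_scanned / 2**40 * price_per_tib
+
+    # A query scanning 250 GiB would cost about $1.22.
+    print(athena_query_cost_usd(250 * 2**30))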
+
+
+Loading the tables
+------------------
+
+.. highlight:: bash
+
+AWS account
+~~~~~~~~~~~
+
+In order to use Amazon Athena, you will first need to `create an AWS account
+and set up billing
+<https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/>`_.
+
+
+Setup
+~~~~~
+
+Athena needs to be made aware of the location and the schema of the Parquet
+files available as a public dataset. Unfortunately, since Athena does not
+support queries containing multiple commands, this is not as simple as
+pasting an installation script in the console. Instead, we provide a Python
+script that you can run locally on your machine; it communicates with Athena
+to create the tables automatically, with the appropriate schema.
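+
+For the curious, the sketch below illustrates roughly what that script does:
+it submits ``CREATE EXTERNAL TABLE`` statements to Athena through ``boto3``.
+The database name, columns and S3 locations shown here are purely
+illustrative; the real definitions are generated by the script you will
+download below.
+
+.. code-block:: python
+
+    import boto3
+
+    # Illustrative DDL only: the real table names, columns and S3 locations
+    # are produced by the provided script (and the target database must
+    # already exist).
+    ddl = """
+    CREATE EXTERNAL TABLE IF NOT EXISTS swh.directory_entry_file (
+        target string,
+        name binary,
+        perms int
+    )
+    STORED AS PARQUET
+    LOCATION 's3://<dataset-bucket>/graph/directory_entry_file/'
+    """
+
+    athena = boto3.client("athena")
+    athena.start_query_execution(
+        QueryString=ddl,
+        # Athena stores query output and metadata in an S3 bucket you own.
+        ResultConfiguration={"OutputLocation": "s3://<your-bucket>/athena-output/"},
+    )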
+
+To run the provided script, you will need to install a few dependencies on
+your machine:
+
+- For **Ubuntu** and **Debian**::
+
+ sudo apt install python3 python3-boto3 awscli
+
+- For **Archlinux**::
+
+ sudo pacman -S --needed python python-boto3 aws-cli
+
+Once the dependencies are installed, run::
+
+ aws configure
+
+This will ask for an AWS Access Key ID and an AWS Secret Access Key, in
+order to give Python access to your AWS account. These keys can be generated
+from the `security credentials page
+<https://console.aws.amazon.com/iam/home#/security_credentials>`_ of the AWS
+console.
+
+It will also ask for the region in which you want to run the queries. We
+recommend using ``us-east-1``, since that is where the public dataset is
+located.
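+
+As an optional sanity check (not required for the rest of this guide), you can
+verify from Python that ``boto3`` picks up the credentials and region you just
+configured:
+
+.. code-block:: python
+
+    import boto3
+
+    # boto3 reads the credentials and region written by `aws configure`.
+    session = boto3.Session()
+    print("Region:", session.region_name)
+    print("Account:", session.client("sts").get_caller_identity()["Account"])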
+
+Creating the tables
+~~~~~~~~~~~~~~~~~~~
+
+Download and run the Python script that will create the tables on your account:
+
+.. tabs::
+
+ .. group-tab:: full
+
+ ::
+
+            wget https://annex.softwareheritage.org/public/dataset/graph/latest/athena/tables.py
+            wget https://annex.softwareheritage.org/public/dataset/graph/latest/athena/gen_schema.py
+            python3 gen_schema.py
+
+ .. group-tab:: teaser: popular-4k
+
+ This teaser is not available on Athena yet.
+
+ .. group-tab:: teaser: popular-3k-python
+
+ This teaser is not available on Athena yet.
+
+To check that the tables have been successfully created in your account, you
+can open your `Amazon Athena console
+<https://console.aws.amazon.com/athena/home>`_. You should be able to select
+the database corresponding to your dataset and see the tables:
+
+.. image:: _images/athena_tables.png
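+
+If you prefer checking from Python rather than the console, a short ``boto3``
+snippet can list the tables as well. The database name below is a placeholder;
+use the database created by the script in the previous step:
+
+.. code-block:: python
+
+    import boto3
+
+    athena = boto3.client("athena")
+    # "swh" is a placeholder; use the database created by the setup script.
+    tables = athena.list_table_metadata(
+        CatalogName="AwsDataCatalog", DatabaseName="swh"
+    )
+    for table in tables["TableMetadataList"]:
+        print(table["Name"])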
+
+
+Running queries
+---------------
+
+.. highlight:: sql
+
+From the console, once you have selected the database of your dataset, you can
+run SQL queries directly from the Query Editor.
+
+Try for instance this query that computes the most frequent file names in the
+archive::
+
+ SELECT from_utf8(name, '?') AS name, COUNT(DISTINCT target) AS cnt
+ FROM directory_entry_file
+ GROUP BY name
+ ORDER BY cnt DESC
+ LIMIT 10;
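+
+The same query can also be run programmatically, for instance with ``boto3``.
+The sketch below makes a few assumptions: the database name and the S3 output
+location are placeholders you must adapt, and a real script should handle
+failed queries more carefully:
+
+.. code-block:: python
+
+    import time
+
+    import boto3
+
+    QUERY = """
+    SELECT from_utf8(name, '?') AS name, COUNT(DISTINCT target) AS cnt
+    FROM directory_entry_file
+    GROUP BY name
+    ORDER BY cnt DESC
+    LIMIT 10
+    """
+
+    athena = boto3.client("athena")
+    execution = athena.start_query_execution(
+        QueryString=QUERY,
+        # Placeholders: use your own database name and S3 bucket.
+        QueryExecutionContext={"Database": "swh"},
+        ResultConfiguration={"OutputLocation": "s3://<your-bucket>/athena-output/"},
+    )
+    query_id = execution["QueryExecutionId"]
+
+    # Naive polling loop; FAILED and CANCELLED states should be reported too.
+    while True:
+        status = athena.get_query_execution(QueryExecutionId=query_id)
+        state = status["QueryExecution"]["Status"]["State"]
+        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
+            break
+        time.sleep(2)
+
+    if state == "SUCCEEDED":
+        results = athena.get_query_results(QueryExecutionId=query_id)
+        for row in results["ResultSet"]["Rows"][1:]:  # the first row is the header
+            print([field.get("VarCharValue") for field in row["Data"]])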
+
+Other examples are available in the preprint of our article: *The Software
+Heritage Graph Dataset: Public software development under one roof* (MSR
+2019).
diff --git a/docs/graph/datasets.rst b/docs/graph/datasets.rst
new file mode 100644
index 0000000..95cb56f
--- /dev/null
+++ b/docs/graph/datasets.rst
@@ -0,0 +1,83 @@
+Dataset
+=======
+
+We provide the full graph dataset along with two "teaser" datasets that can be
+used for trying out smaller-scale experiments before using the full graph.
+
+All the main URLs are relative to our dataset prefix:
+`https://annex.softwareheritage.org/public/dataset/
+<https://annex.softwareheritage.org/public/dataset/>`__.
+
+The Software Heritage Graph Dataset contains a table representation of the full
+Software Heritage Graph. It is available in the following formats:
+
+- **PostgreSQL (compressed)**:
+
+  - **URL**: `/graph/latest/sql/
+    <https://annex.softwareheritage.org/public/dataset/graph/latest/sql/>`_
+  - **Total size**: 1.2 TiB
+
+- **Apache Parquet**:
+
+  - **URL**: `/graph/latest/parquet/
+    <https://annex.softwareheritage.org/public/dataset/graph/latest/parquet/>`_
+  - **Total size**: 1.2 TiB
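+
+If you download the Parquet files for local use, they can be read with any
+standard Parquet tooling. For instance, here is a minimal ``pyarrow`` sketch;
+the local path and table name are illustrative and depend on what you
+downloaded:
+
+.. code-block:: python
+
+    import pyarrow.parquet as pq
+
+    # Hypothetical path: point it at one of the Parquet directories you
+    # downloaded (keep in mind that some tables are very large).
+    table = pq.read_table("parquet/release")
+    print(table.schema)
+    print(table.num_rows)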
+
+Teaser datasets
+---------------
+
+popular-4k
+~~~~~~~~~~
+
+The ``popular-4k`` teaser contains a subset of 4000 popular repositories from
+GitHub, GitLab, PyPI and Debian. The selection criteria used to pick the
+software origins were the following:
+
+- the 1000 most popular GitHub projects (by number of stars),
+- the 1000 most popular GitLab projects (by number of stars),
+- the 1000 most popular PyPI projects (by usage statistics, according to the
+  `Top PyPI Packages <https://hugovk.github.io/top-pypi-packages/>`_ database),
+- the 1000 most popular Debian packages (by "votes" according to the `Debian
+  Popularity Contest <https://popcon.debian.org/>`_ database).
+
+This teaser is available in the following formats:
+
+- **PostgreSQL (compressed)**:
+
+  - **URL**: `/graph/latest/popular-4k/sql/
+    <https://annex.softwareheritage.org/public/dataset/graph/latest/popular-4k/sql/>`_
+  - **Total size**: TODO
+
+- **Apache Parquet**:
+
+  - **URL**: `/graph/latest/popular-4k/parquet/
+    <https://annex.softwareheritage.org/public/dataset/graph/latest/popular-4k/parquet/>`_
+  - **Total size**: TODO
+
+popular-3k-python
+~~~~~~~~~~~~~~~~~
+
+The ``popular-3k-python`` teaser contains a subset of 3052 popular
+repositories **tagged as being written in the Python language**, from GitHub,
+GitLab, PyPI and Debian. The selection criteria used to pick the software
+origins were the following, similar to ``popular-4k``:
+
+- the 1000 most popular GitHub projects written in Python (by number of stars),
+- the 131 GitLab projects written in Python that have 2 stars or more,
+- the 1000 most popular PyPI projects (by usage statistics, according to the
+  `Top PyPI Packages <https://hugovk.github.io/top-pypi-packages/>`_ database),
+- the 1000 most popular Debian packages with the
+  `debtag <https://debtags.debian.org/>`_ ``implemented-in::python`` (by
+  "votes" according to the `Debian Popularity Contest
+  <https://popcon.debian.org/>`_ database).
+
+This teaser is available in the following formats:
+
+- **PostgreSQL (compressed)**:
+
+  - **URL**: `/graph/latest/popular-3k-python/sql/
+    <https://annex.softwareheritage.org/public/dataset/graph/latest/popular-3k-python/sql/>`_
+  - **Total size**: TODO
+
+- **Apache Parquet**:
+
+  - **URL**: `/graph/latest/popular-3k-python/parquet/
+    <https://annex.softwareheritage.org/public/dataset/graph/latest/popular-3k-python/parquet/>`_
+  - **Total size**: TODO
diff --git a/docs/graph/index.rst b/docs/graph/index.rst
new file mode 100644
index 0000000..c44e806
--- /dev/null
+++ b/docs/graph/index.rst
@@ -0,0 +1,53 @@
+.. _swh-graph-dataset:
+
+Software Heritage Graph Dataset
+===============================
+
+This is the Software Heritage graph dataset: a fully-deduplicated Merkle
+DAG representation of the Software Heritage archive. The dataset links
+together file content identifiers, source code directories, Version
+Control System (VCS) commits tracking evolution over time, up to the
+full states of VCS repositories as observed by Software Heritage during
+periodic crawls. The dataset’s contents come from major development
+forges (including `GitHub <https://github.com>`__ and
+`GitLab <https://gitlab.com>`__), FOSS distributions (e.g.,
+`Debian <https://www.debian.org>`__), and language-specific package managers
+(e.g., `PyPI <https://pypi.org>`__). Crawling information is also included,
+providing timestamps about when and where all archived source code
+artifacts have been observed in the wild.
+
+The Software Heritage graph dataset is available in multiple formats,
+including downloadable PostgreSQL dumps and Apache Parquet files for local
+use, as well as a public instance on the Amazon Athena interactive query
+service for ready-to-use, powerful analytical processing.
+
+By accessing the dataset, you agree with the Software Heritage `Ethical
+Charter for using the archive data
+<https://www.softwareheritage.org/legal/users-ethical-charter/>`__,
+and the `terms of use for bulk access
+<https://www.softwareheritage.org/legal/bulk-access-terms-of-use/>`__.
+
+
+If you use this dataset for research purposes, please cite the following paper:
+
+*
+ | Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli.
+ | *The Software Heritage Graph Dataset: Public software development under one roof.*
+  | In Proceedings of `MSR 2019 <https://2019.msrconf.org/>`_: The 16th International Conference on Mining Software Repositories, May 2019, Montreal, Canada. Co-located with `ICSE 2019 <https://2019.icse-conferences.org/>`_.
+ | `preprint `_, `bibtex `_
+
+.. toctree::
+ :maxdepth: 2
+ :caption: Contents:
+
+ datasets
+ postgresql
+ athena
+
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
diff --git a/docs/graph/postgresql.rst b/docs/graph/postgresql.rst
new file mode 100644
index 0000000..02ab33f
--- /dev/null
+++ b/docs/graph/postgresql.rst
@@ -0,0 +1,98 @@
+Setup on a PostgreSQL instance
+==============================
+
+This tutorial will guide you through the steps required to set up the Software
+Heritage Graph Dataset in a PostgreSQL database.
+
+.. highlight:: bash
+
+PostgreSQL local setup
+----------------------
+
+You need access to a running PostgreSQL instance to load the dataset. This
+section contains information on how to set up PostgreSQL for the first time.
+
+*If you already have a PostgreSQL server running on your machine, you can skip
+to the next section.*
+
+- For **Ubuntu** and **Debian**::
+
+ sudo apt install postgresql
+
+- For **Archlinux**::
+
+ sudo pacman -S --needed postgresql
+ sudo -u postgres initdb -D '/var/lib/postgres/data'
+ sudo systemctl enable --now postgresql
+
+Once PostgreSQL is running, you also need a user that is able to create
+databases and run queries. The easiest way to achieve that is to create an
+account with the same name as your username, with the right to create
+databases::
+
+ sudo -u postgres createuser --createdb $USER
+
+
+Retrieving the dataset
+----------------------
+
+You need to download the dataset in SQL format. Use the following commands on
+your machine, after making sure that it has enough available space for the
+dataset you chose:
+
+.. tabs::
+
+ .. group-tab:: full
+
+ ::
+
+ mkdir full && cd full
+ wget -c -A gz,sql -nd -r -np -nH https://annex.softwareheritage.org/public/dataset/graph/2019-01-28/sql/
+
+ .. group-tab:: teaser: popular-4k
+
+ ::
+
+            mkdir popular-4k && cd popular-4k
+            wget -c -A gz,sql -nd -r -np -nH https://annex.softwareheritage.org/public/dataset/graph/2019-01-28/popular-4k/sql/
+
+ .. group-tab:: teaser: popular-3k-python
+
+ ::
+
+            mkdir popular-3k-python && cd popular-3k-python
+            wget -c -A gz,sql -nd -r -np -nH https://annex.softwareheritage.org/public/dataset/graph/2019-01-28/popular-3k-python/sql/
+
+Loading the dataset
+-------------------
+
+Once you have retrieved the dataset of your choice, create a database that will
+contain it, and load the database:
+
+.. tabs::
+
+ .. group-tab:: full
+
+ ::
+
+ createdb swhgd
+ psql swhgd < swh_import.sql
+
+ .. group-tab:: teaser: popular-4k
+
+ ::
+
+ createdb swhgd-popular-4k
+ psql swhgd-popular-4k < swh_import.sql
+
+ .. group-tab:: teaser: popular-3k-python
+
+ ::
+
+ createdb swhgd-popular-3k-python
+ psql swhgd-popular-3k-python < swh_import.sql
+
+
+You can now run SQL queries on your database. Run ``psql <database name>`` to
+start an interactive PostgreSQL console.
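+
+You can also query the database from Python, for instance with ``psycopg2``
+(any PostgreSQL client library works). The sketch below reproduces the query
+for the most frequent file names shown in the Athena guide, assuming the table
+and column names of the SQL dump match that schema; adjust ``dbname`` to the
+database you created:
+
+.. code-block:: python
+
+    import psycopg2
+
+    # Adjust dbname to the database created above (e.g. "swhgd-popular-4k").
+    conn = psycopg2.connect(dbname="swhgd")
+    with conn, conn.cursor() as cur:
+        # encode(..., 'escape') is used because file names are raw bytes and
+        # are not guaranteed to be valid UTF-8.
+        cur.execute("""
+            SELECT encode(name, 'escape') AS name, COUNT(DISTINCT target) AS cnt
+            FROM directory_entry_file
+            GROUP BY name
+            ORDER BY cnt DESC
+            LIMIT 10
+        """)
+        for name, cnt in cur.fetchall():
+            print(cnt, name)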