diff --git a/docs/graph/index.rst b/docs/graph/index.rst
index bba72f9..033b99d 100644
--- a/docs/graph/index.rst
+++ b/docs/graph/index.rst
@@ -1,56 +1,55 @@
.. _swh-graph-dataset:
Software Heritage Graph Dataset
===============================
This is the Software Heritage graph dataset: a fully-deduplicated Merkle
DAG representation of the Software Heritage archive. The dataset links
together file content identifiers, source code directories, Version
Control System (VCS) commits tracking evolution over time, up to the
full states of VCS repositories as observed by Software Heritage during
periodic crawls. The dataset’s contents come from major development
forges (including `GitHub `__ and
`GitLab `__), FOSS distributions (e.g.,
`Debian `__), and language-specific package managers (e.g.,
`PyPI `__). Crawling information is also included,
providing timestamps about when and where all archived source code
artifacts have been observed in the wild.
The Software Heritage graph dataset is available in multiple formats,
including relational Apache ORC files for local use, as well as a public
instance on the Amazon Athena interactive query service for powerful,
ready-to-use analytical processing.
By accessing the dataset, you agree with the Software Heritage `Ethical
Charter for using the archive
data `__,
and the `terms of use for bulk
access `__.
If you use this dataset for research purposes, please cite the following paper:
*
| Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli.
| *The Software Heritage Graph Dataset: Public software development under one roof.*
| In proceedings of `MSR 2019 `_: The 16th International Conference on Mining Software Repositories, May 2019, Montreal, Canada. Co-located with `ICSE 2019 `_.
| `preprint `_, `bibtex `_
.. toctree::
:maxdepth: 2
:caption: Contents:
:titlesonly:
dataset
schema
- postgresql
athena
databricks
Indices and tables
------------------
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
diff --git a/docs/graph/postgresql.rst b/docs/graph/postgresql.rst
deleted file mode 100644
index 5d8c17e..0000000
--- a/docs/graph/postgresql.rst
+++ /dev/null
@@ -1,98 +0,0 @@
-Setup on a PostgreSQL instance
-==============================
-
-This tutorial will guide you through the steps required to set up the Software
-Heritage Graph Dataset in a PostgreSQL database.
-
-.. highlight:: bash
-
-PostgreSQL local setup
-----------------------
-
-You need to have access to a running PostgreSQL instance to load the dataset.
-This section contains information on how to set up PostgreSQL for the first
-time.
-
-*If you already have a PostgreSQL server running on your machine, you can skip
-to the next section.*
-
-- For **Ubuntu** and **Debian**::
-
-    sudo apt install postgresql
-
-- For **Archlinux**::
-
-    sudo pacman -S --needed postgresql
-    sudo -u postgres initdb -D '/var/lib/postgres/data'
-    sudo systemctl enable --now postgresql
-
-Once PostgreSQL is running, you also need a user that can create databases
-and run queries. The easiest way to achieve that is to create an account
-that has the same name as your username and that can create
-databases::
-
-    sudo -u postgres createuser --createdb $USER
-
-
-Retrieving the dataset
-----------------------
-
-You need to download the dataset in SQL format. Use the following command on
-your machine, after making sure that it has enough available space for the
-dataset you chose:
-
-.. tabs::
-
-    .. group-tab:: full
-
-        ::
-
-            mkdir swhgd && cd swhgd
-            wget -c -q --show-progress -A gz,sql -nd -r -np -nH https://annex.softwareheritage.org/public/dataset/graph/latest/sql/
-
-    .. group-tab:: teaser: popular-4k
-
-        ::
-
-            mkdir popular-4k && cd popular-4k
-            wget -c -q --show-progress -A gz,sql -nd -r -np -nH https://annex.softwareheritage.org/public/dataset/graph/latest/popular-4k/sql/
-
-    .. group-tab:: teaser: popular-3k-python
-
-        ::
-
-            mkdir popular-3k-python && cd popular-3k-python
-            wget -c -q --show-progress -A gz,sql -nd -r -np -nH https://annex.softwareheritage.org/public/dataset/graph/latest/popular-3k-python/sql/
-
-Loading the dataset
--------------------
-
-Once you have retrieved the dataset of your choice, create a database that will
-contain it, and load the database:
-
-.. tabs::
-
-    .. group-tab:: full
-
-        ::
-
-            createdb swhgd
-            psql swhgd < load.sql
-
-    .. group-tab:: teaser: popular-4k
-
-        ::
-
-            createdb swhgd-popular-4k
-            psql swhgd-popular-4k < load.sql
-
-    .. group-tab:: teaser: popular-3k-python
-
-        ::
-
-            createdb swhgd-popular-3k-python
-            psql swhgd-popular-3k-python < load.sql
-
-
-You can now run SQL queries on your database. Run ``psql `` to
-start an interactive PostgreSQL console.