diff --git a/docs/graph/athena.rst b/docs/graph/athena.rst
index 25ee257..188334e 100644
--- a/docs/graph/athena.rst
+++ b/docs/graph/athena.rst
@@ -1,125 +1,125 @@
 Setup on Amazon Athena
 ======================
 
 The Software Heritage Graph Dataset is available as a public dataset in `Amazon
-Athena <https://aws.amazon.com/athena/>`_. Athena uses `presto
+Athena <https://aws.amazon.com/athena/>`_. Athena uses `Presto
 <https://prestodb.github.io/>`_, a distributed SQL query engine, to
 automatically scale queries on large datasets.
 
 The pricing of Athena depends on the amount of data scanned by each query,
 generally at a cost of $5 per TiB of data scanned. Full pricing details are
 available `here <https://aws.amazon.com/athena/pricing/>`_.
 
 Note that because the Software Heritage Graph Dataset is available as a public
 dataset, you **do not have to pay for the storage, only for the queries**
 (except for the data you store on S3 yourself, like query results).
 
 
 Loading the tables
 ------------------
 
 .. highlight:: bash
 
 AWS account
 ~~~~~~~~~~~
 
 In order to use Amazon Athena, you will first need to `create an AWS account
-and setup billing
+and set up billing
 <https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/>`_.
 
 You will also need to create an **output S3 bucket**: this is the place where
 Athena will store your query results, so that you can retrieve them and analyze
-them afterwards.  To do that, go on the `S3 console
+them afterwards.  To do that, go to the `S3 console
 <https://s3.console.aws.amazon.com/s3/home>`_ and create a new bucket.
 
 
 Setup
 ~~~~~
 
 Athena needs to be made aware of the location and the schema of the Parquet
 files available as a public dataset. Unfortunately, since Athena does not
 support queries that contain multiple commands, it is not as simple as pasting
 an installation script in the console. Instead, we provide a Python script that
 can be run locally on your machine, that will communicate with Athena to create
 the tables automatically with the appropriate schema.
 
 To run this script, you will need to install a few dependencies on your
 machine:
 
 - For **Ubuntu** and **Debian**::
 
     sudo apt install python3 python3-boto3 awscli
 
 - For **Archlinux**::
 
     sudo pacman -S --needed python python-boto3 aws-cli
 
 Once the dependencies are installed, run::
 
   aws configure
 
 This will ask for an AWS Access Key ID and an AWS Secret Access Key in
 order to give Python access to your AWS account. These keys can be generated at
 `this address
 <https://console.aws.amazon.com/iam/home#/security_credentials>`_.
 
 It will also ask for the region in which you want to run the queries. We
-recommand to use ``us-east-1``, since that's where the public dataset is
+recommend using ``us-east-1``, since that's where the public dataset is
 located.
 
 Creating the tables
 ~~~~~~~~~~~~~~~~~~~
 
 Download and run the Python script that will create the tables on your account:
 
 .. tabs::
 
   .. group-tab:: full
 
     ::
 
       wget https://annex.softwareheritage.org/public/dataset/graph/latest/athena/athena.py
       python3 athena.py -o 's3://YOUR_OUTPUT_BUCKET/'
 
   .. group-tab:: teaser: popular-4k
 
     ::
 
       wget https://annex.softwareheritage.org/public/dataset/graph/latest/athena/athena.py
       python3 athena.py -o 's3://YOUR_OUTPUT_BUCKET/' -d popular4k -l 's3://softwareheritage/teasers/popular-4k'
 
   .. group-tab:: teaser: popular-3k-python
 
     ::
 
       wget https://annex.softwareheritage.org/public/dataset/graph/latest/athena/athena.py
       python3 athena.py -o 's3://YOUR_OUTPUT_BUCKET/' -d popular3kpython -l 's3://softwareheritage/teasers/popular-3k-python'
 
 To check that the tables have been successfully created in your account, you
 can open your `Amazon Athena console
 <https://console.aws.amazon.com/athena/home>`_. You should be able to select
 the database corresponding to your dataset, and see the tables:
 
 .. image:: _images/athena_tables.png
 
 
 Running queries
 ---------------
 
 .. highlight:: sql
 
 From the console, once you have selected the database of your dataset, you can
 run SQL queries directly from the Query Editor.
 
 Try for instance this query that computes the most frequent file names in the
 archive::
 
   SELECT from_utf8(name, '?') AS name, COUNT(DISTINCT target) AS cnt
   FROM directory_entry_file
   GROUP BY name
   ORDER BY cnt DESC
   LIMIT 10;
 
 Other examples are available in the preprint of our article: `The Software
 Heritage Graph Dataset: Public software development under one roof.
 <https://upsilon.cc/~zack/research/publications/msr-2019-swh.pdf>`_
diff --git a/docs/graph/index.rst b/docs/graph/index.rst
index 991414f..58b96ef 100644
--- a/docs/graph/index.rst
+++ b/docs/graph/index.rst
@@ -1,56 +1,56 @@
 .. _swh-graph-dataset:
 
 Software Heritage Graph Dataset
 ===============================
 
 This is the Software Heritage graph dataset: a fully-deduplicated Merkle
 DAG representation of the Software Heritage archive. The dataset links
 together file content identifiers, source code directories, Version
 Control System (VCS) commits tracking evolution over time, up to the
 full states of VCS repositories as observed by Software Heritage during
 periodic crawls. The dataset’s contents come from major development
 forges (including `GitHub <https://github.com/>`__ and
 `GitLab <https://gitlab.com>`__), FOSS distributions (e.g.,
-`Debian <debian.org>`__), and language-specific package managers (e.g.,
+`Debian <https://www.debian.org/>`__), and language-specific package managers (e.g.,
 `PyPI <https://pypi.org/>`__). Crawling information is also included,
 providing timestamps about when and where all archived source code
 artifacts have been observed in the wild.
 
 The Software Heritage graph dataset is available in multiple formats,
 including downloadable CSV dumps and Apache Parquet files for local use,
 as well as a public instance on Amazon Athena interactive query service
 for ready-to-use powerful analytical processing.
 
 By accessing the dataset, you agree with the Software Heritage `Ethical
 Charter for using the archive
 data <https://www.softwareheritage.org/legal/users-ethical-charter/>`__,
 and the `terms of use for bulk
 access <https://www.softwareheritage.org/legal/bulk-access-terms-of-use/>`__.
 
 
 If you use this dataset for research purposes, please cite the following paper:
 
-* 
+*
     | Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli.
     | *The Software Heritage Graph Dataset: Public software development under one roof.*
     | In proceedings of `MSR 2019 <http://2019.msrconf.org/>`_: The 16th International Conference on Mining Software Repositories, May 2019, Montreal, Canada. Co-located with `ICSE 2019 <https://2019.icse-conferences.org/>`_.
     | `preprint <https://upsilon.cc/~zack/research/publications/msr-2019-swh.pdf>`_, `bibtex <https://upsilon.cc/~zack/research/publications/msr-2019-swh.bib>`_
 
 .. toctree::
    :maxdepth: 2
    :caption: Contents:
    :titlesonly:
 
    dataset
    schema
    postgresql
    athena
    databricks
 
 
 Indices and tables
 ------------------
 
 * :ref:`genindex`
 * :ref:`modindex`
 * :ref:`search`
diff --git a/docs/graph/schema.rst b/docs/graph/schema.rst
index e2518f6..13409b7 100644
--- a/docs/graph/schema.rst
+++ b/docs/graph/schema.rst
@@ -1,142 +1,142 @@
 Relational schema
 =================
 
 The Merkle DAG of the Software Heritage archive is encoded in the dataset as a
 set of relational tables.
 A simplified view of the corresponding database schema is shown here:
 
 .. image:: _images/db-schema.svg
 
 This page documents the details of the schema.
 
--  **content**: contains information on the contents stored in
-   the archive.
+- **content**: contains information on the contents stored in
+  the archive.
 
   - ``sha1`` (bytes): the SHA-1 of the content
   - ``sha1_git`` (bytes): the Git SHA-1 of the content
   - ``length`` (integer): the length of the content
 
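--  **skipped_content**: contains information on the contents that were not archived for
-  various reasons.
+- **skipped_content**: contains information on the contents that were not
+  archived for various reasons.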
 
   - ``sha1`` (bytes): the SHA-1 of the missing content
   - ``sha1_git`` (bytes): the Git SHA-1 of the missing content
   - ``length`` (integer): the length of the missing content
 
 - **directory**: contains the directories stored in the archive.
 
   - ``id`` (bytes): the intrinsic identifier of the directory, recursively
     computed with the Git SHA-1 algorithm
   - ``dir_entries`` (array of integers): the list of directories contained in
     this directory, as references to an entry in the ``directory_entry_dir``
     table.
   - ``file_entries`` (array of integers): the list of files contained in
     this directory, as references to an entry in the ``directory_entry_file``
     table.
   - ``rev_entries`` (array of integers): the list of revisions contained in
     this directory, as references to an entry in the ``directory_entry_rev``
     table.
 
-- **directory_entry_file**: contains informations about file entries in
+- **directory_entry_file**: contains information about file entries in
   directories.
 
   - ``id`` (integer): unique identifier for the entry
   - ``target`` (bytes): the Git SHA-1 of the content this entry points to
   - ``name`` (bytes): the name of the file (basename of its path)
   - ``perms`` (integer): the permissions of the file
 
-- **directory_entry_dir**: contains informations about directory entries in
+- **directory_entry_dir**: contains information about directory entries in
   directories.
 
   - ``id`` (integer): unique identifier for the entry
   - ``target`` (bytes): the Git SHA-1 of the directory this entry points to
   - ``name`` (bytes): the name of the directory
   - ``perms`` (integer): the permissions of the directory
 
-- **directory_entry_rev**: contains informations about revision entries in
+- **directory_entry_rev**: contains information about revision entries in
   directories.
 
   - ``id`` (integer): unique identifier for the entry
   - ``target`` (bytes): the Git SHA-1 of the revision this entry points to
   - ``name`` (bytes): the name of the directory that contains this revision
   - ``perms`` (integer): the permissions of the revision
 
 - **person**: deduplicates commit authors by their names and e-mail addresses.
   For pseudonymization purposes and in order to prevent abuse, these columns
   were removed from the dataset, and this table only contains the ID of the
   author. Individual authors may be retrieved using this ID from the Software
-  Heritage api.
+  Heritage API.
 
   - ``id`` (integer): the identifier of the person
 
 - **revision**: contains the revisions stored in the archive.
 
   - ``id`` (bytes): the intrinsic identifier of the revision, recursively
     computed with the Git SHA-1 algorithm. For Git repositories, this
     corresponds to the revision hash.
   - ``date`` (timestamp): the date the revision was authored
   - ``committer_date`` (timestamp): the date the revision was committed
   - ``author`` (integer): the author of the revision
   - ``committer`` (integer): the committer of the revision
   - ``message`` (bytes): the revision message
   - ``directory`` (bytes): the Git SHA-1 of the directory the revision points
     to. Every revision points to the root directory of the project source
     tree to which it corresponds.
 
 - **revision_history**: contains the ordered set of parents of each revision.
   Each revision has an ordered set of parents (0 for the initial commit of a
   repository, 1 for a regular commit, 2 for a regular merge commit and 3 or
   more for octopus-style merge commits).
 
   - ``id`` (bytes): the Git SHA-1 identifier of the revision
   - ``parent_id`` (bytes): the Git SHA-1 identifier of the parent
   - ``parent_rank`` (integer): the rank of the parent which defines the total
     order of the parents of the revision
 
 - **release**: contains the releases stored in the archive.
 
   - ``id`` (bytes): the intrinsic identifier of the release, recursively
     computed with the Git SHA-1 algorithm.
   - ``target`` (bytes): the Git SHA-1 of the object the release points to.
   - ``date`` (timestamp): the date the release was created
-  - ``author`` (integer): the author of the revision
+  - ``author`` (integer): the author of the release
   - ``name`` (bytes): the release name
   - ``message`` (bytes): the release message
 
 - **snapshot**: contains the list of snapshots stored in the archive.
 
   - ``id`` (bytes): the intrinsic identifier of the snapshot, recursively
     computed with the Git SHA-1 algorithm.
   - ``object_id`` (integer): the primary key of the snapshot
 
 - **snapshot_branches**: contains the identifiers of branches associated with
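-  each snapshot. This is an intermediary table through which is represented the
-  many-to-many relationship between snapshots and branches.
+  each snapshot. This is an intermediary table that represents the
+  many-to-many relationship between snapshots and branches.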
 
   - ``snapshot_id`` (integer): the integer identifier of the snapshot
   - ``branch_id`` (integer): the identifier of the branch
 
 - **snapshot_branch**: contains the list of branches.
 
   - ``object_id`` (integer): the identifier of the branch
   - ``name`` (bytes): the name of the branch
   - ``target`` (bytes): the Git SHA-1 of the object the branch points to
   - ``target_type`` (string): the type of object the branch points to (either
     ``release``, ``revision``, ``directory`` or ``content``).
 
 - **origin**: the software origins from which the projects in the dataset were
   archived.
 
   - ``id`` (integer): the identifier of the origin
   - ``url`` (bytes): the URL of the origin
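-  - ``type`` (string): the type of origin (e.g ``git``, ``pypi``, ``hg``,
-    ``svn``, ``git``, ``ftp``, ``deb``, ...)
+  - ``type`` (string): the type of origin (e.g. ``git``, ``pypi``, ``hg``,
+    ``svn``, ``ftp``, ``deb``, ...)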
 
 - **origin_visit**: the different visits of each origin. Since Software
   Heritage archives software continuously, software origins are crawled more
   than once. Each of these "visits" is an entry in this table.
 
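-  - ``origin``: (integer) the identifier of the origin visited
-  - ``date``: (timestamp) the date at which the origin was visited
+  - ``origin`` (integer): the identifier of the origin visited
+  - ``date`` (timestamp): the date at which the origin was visited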
   - ``snapshot_id`` (integer): the integer identifier of the snapshot archived
     in this visit.