diff --git a/docs/compression.rst b/docs/compression.rst
new file mode 100644
--- /dev/null
+++ b/docs/compression.rst
@@ -0,0 +1,106 @@
+Graph compression
+=================
+
+The compression process is based on the `WebGraph framework
+`_ and ecosystem libraries.
+
+References used:
+
+- Paolo Boldi, Sebastiano Vigna, *The webgraph framework I: compression
+ techniques*, Proceedings of the 13th international conference on World Wide
+ Web. ACM, 2004. `pdf `_
+- Paolo Boldi, Marco Rosa, Massimo Santini, Sebastiano Vigna, *Layered label
+ propagation: A multiresolution coordinate-free ordering for compressing social
+ networks*. `arxiv `_
+- Alberto Apostolico, Guido Drovandi, *Graph compression by BFS*, Algorithms
+ 2009, 2(3), 1031-1044. `mdpi `_
+
+.. figure:: images/compression_steps.png
+ :alt: Compression steps
+
+ Compression steps
+
+1. MPH
+------
+
+A node in the `Software Heritage graph
+`_ is
+identified using its string `persistent identifier
+`_.
+However, WebGraph internally uses integers to refer to node ids.
+
+Mapping between the strings and longs ids is needed before compressing the
+graph. From the `Sux4J `_ utility tool, we use the
+`GOVMinimalPerfectHashFunction
+`_
+class, mapping with no collisions $n$ keys to $n$ consecutive integers.
+
+The step produces a ``.mph`` file (MPH stands for *Minimal Perfect
+Hash-function*) storing the hash function taking as input a string and returning
+a unique integer.
+
+2. BV compress
+--------------
+
+This is the first actual compression step, building a compressed version of the
+input graph using WebGraph techniques presented in the framework paper. We use
+the `ScatteredArcsASCIIGraph
+`_
+class, from WebGraph.
+
+The resulting BV graph is stored as a set of files:
+
+- ``.graph``: the compressed graph in the BV format
+- ``.offsets``: offsets values to read the bit stream graph file
+- ``.obl``: offsets cache to load the graph faster
+- ``.properties``: entries used to correctly decode graph and offset files
+
+3. BFS
+-------
+
+In the LLP paper, authors propose an empirical analysis linking node ordering
+and high compression ratio: it is important to use an ordering of nodes ids such
+that vertices from the same host are close to one another.
+
+Building on this insight, the previous compression results in the BV compress
+step are improved by re-ordering nodes ids using a BFS traversal order. We use
+the `BFSBig <>`_ class from the `LAW `_ library.
+
+The resulting ordering is stored in the ``.order`` file, listing nodes ids in
+order of traversal.
+
+4. Permute
+----------
+
+Once the order is computed (BFS or another ordering technique), the final
+compressed graph is created based on the initial BV compress result, and using
+the new node order mapping. The permutation uses the `Transform
+`_
+class from WebGraph framework.
+
+The final compressed graph is only stored in the resulting ``.graph``,
+``.offsets``, ``.obl``, and ``.properties`` files.
+
+Extra steps
+-----------
+
+5. Stats
+~~~~~~~~
+
+Compute various statistics on the final compressed graph:
+
+- ``.stats``: entries such as number of nodes, edges, avg/min/max degree,
+ average locality, etc.
+- ``.indegree``: graph indegree distribution
+- ``.outdegree``: graph outdegree distribution
+
+This step uses the `Stats
+`_
+class from WebGraph.
+
+6. Transpose
+~~~~~~~~~~~~
+
+Create a transposed graph to allow backward traversal, using the `Transform
+`_
+class from WebGraph.
diff --git a/docs/images/.gitignore b/docs/images/.gitignore
new file mode 100644
--- /dev/null
+++ b/docs/images/.gitignore
@@ -0,0 +1,2 @@
+compression_steps.png
+compression_steps.svg
diff --git a/docs/images/Makefile b/docs/images/Makefile
new file mode 100644
--- /dev/null
+++ b/docs/images/Makefile
@@ -0,0 +1,13 @@
+all: compression_steps.png compression_steps.svg
+
+%.png: %.dot
+ dot -Gdpi=300 -Tpng $< -o $@
+
+%.svg: %.dot
+ dot -Tsvg $< -o $@
+
+.PHONY: clean
+
+clean:
+ rm -f compression_steps.png
+ rm -f compression_steps.svg
diff --git a/docs/images/compression_steps.dot b/docs/images/compression_steps.dot
new file mode 100644
--- /dev/null
+++ b/docs/images/compression_steps.dot
@@ -0,0 +1,51 @@
+digraph "Compression steps" {
+ // Horizontal graph
+ rankdir=LR;
+
+ subgraph {
+ input_edges [label="swh.edges.csv.gz", fontsize=9, shape=none];
+ input_nodes [label="swh.nodes.csv.gz", fontsize=9, shape=none];
+ {rank=same; input_edges; input_nodes;}
+ }
+
+ mph [label="MPH", shape=box];
+ mph_out [label="swh.mph", fontsize=9, shape=none];
+
+ bv_compress [label="BV compress", shape=box];
+ bv_compress_out
+ [label="swh-bv.graph\lswh-bv.offsets\lswh-bv.obl\lswh-bv.properties",
+ fontsize=9, shape=none];
+
+ bfs [label="BFS", shape=box];
+ bfs_out [label="swh.order", fontsize=9, shape=none];
+
+ permute [label="Permute", shape=box];
+ permute_out
+ [label="swh.graph\lswh.offsets\lswh.obl\lswh.properties",
+ fontsize=9, shape=none];
+
+ stats [label="Stats", shape=box];
+ stats_out
+ [label="swh.stats\lswh.indegree\lswh.outdegree",
+ fontsize=9, shape=none];
+
+ transpose [label="Transpose", shape=box];
+ transpose_out
+ [label="swh-transposed.graph\lswh-transposed.offsets\lswh-transposed.obl\lswh-transposed.properties",
+ fontsize=9, shape=none];
+
+ input_nodes -> mph;
+ input_edges -> bv_compress;
+ mph -> mph_out;
+ mph_out -> bv_compress;
+ bv_compress -> bv_compress_out;
+ bv_compress_out-> bfs;
+ bv_compress_out-> permute;
+ bfs -> bfs_out;
+ bfs_out -> permute;
+ permute -> permute_out;
+ permute_out -> stats;
+ permute_out -> transpose;
+ stats -> stats_out;
+ transpose -> transpose_out;
+}