diff --git a/docs/compression.rst b/docs/compression.rst
new file mode 100644
--- /dev/null
+++ b/docs/compression.rst
@@ -0,0 +1,74 @@
+Graph compression
+=================
+
+The compression process is based on the `WebGraph framework
+`_ and ecosystem libraries.
+
+References used:
+
+- Paolo Boldi, Sebastiano Vigna, *The webgraph framework I: compression
+ techniques*, Proceedings of the 13th international conference on World Wide
+ Web. ACM, 2004. `pdf `_
+- Paolo Boldi, Marco Rosa, Massimo Santini, Sebastiano Vigna, *Layered label
+ propagation: A multiresolution coordinate-free ordering for compressing social
+ networks*. `arxiv `_
+
+1. MPH
+------
+
+A node in the `Software Heritage graph
+`_ is
+identified using its string `persistent identifier
+`_.
+However, WebGraph internally uses integers to refer to node ids.
+
+Mapping between the strings and longs ids is needed before compressing the
+graph. From the `Sux4J `_ utility tool, we can use the
+`GOVMinimalPerfectHashFunction
+`_
+class, mapping with no collisions $n$ keys to $n$ consecutive integers.
+
+The step produces a ``.mph`` file storing the hash function taking as input a
+string and returning a unique integer.
+
+2. BVGraph
+----------
+
+This is the first actual compression step, building a compressed version of the
+input graph using WebGraph techniques presented in the framework paper.
+
+The graph is stored using multiple files: ``.graph``, ``.offsets``, ``.obl``,
+and ``.properties``.
+
+3. BFS
+-------
+
+In the LLP paper, authors propose an empirical analysis linking node ordering
+and high compression ratio: it is important to use an ordering of nodes ids such
+that vertices from the same host are close to one another.
+
+Building on this insight, the previous compression results in the BVGraph step
+are improved by re-ordering nodes ids using a BFS traversal order.
+
+The ordering is stored in the ``.order`` file, listing nodes ids in order of
+traversal.
+
+4. Permutation
+--------------
+
+Once the order is computed (BFS or another ordering technique), the final
+compressed graph is created based on the initial BVGraph result, and using the
+new node order mapping.
+
+Extra steps
+-----------
+
+5. Stats
+~~~~~~~~
+
+Various statistics on the final compressed graph are computed in ``.stats``.
+
+6. Transpose
+~~~~~~~~~~~~
+
+A transposed graph is created to allow backward traversal.