diff --git a/common/images/webgraphs_compression.pdf b/common/images/webgraphs_compression.pdf
new file mode 100644
index 0000000..90f21ac
Binary files /dev/null and b/common/images/webgraphs_compression.pdf differ
diff --git a/common/modules/graph-compression.org b/common/modules/graph-compression.org
new file mode 100644
index 0000000..e551fd9
--- /dev/null
+++ b/common/modules/graph-compression.org
@@ -0,0 +1,193 @@
+#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
+#+INCLUDE: "prelude.org" :minlevel 1
+
+# Depends: \usepackage{pdfpages}
+
+* Graph Compression
+  :PROPERTIES:
+  :CUSTOM_ID: main
+  :END:
+** Graph compression on the Software Heritage archive
+***
+    #+BEGIN_EXPORT latex
+    \vspace{-3mm}
+    \begin{thebibliography}{Foo Bar, 1969}
+    \bibitem{Boldi2020} Paolo Boldi, Antoine Pietri, Sebastiano Vigna, Stefano Zacchiroli
+    \newblock Ultra-Large-Scale Repository Analysis via Graph Compression
+    \newblock SANER 2020, 27th Intl. Conf. on Software Analysis, Evolution and Reengineering. IEEE
+    \end{thebibliography}
+    #+END_EXPORT
+*** Research question
+    Is it possible to efficiently perform software development history analyses
+    at ultra-large scale (i.e., the scale of the Software Heritage archive or
+    larger) on a single, relatively cheap machine?
+*** Idea
+    Apply state-of-the-art graph compression techniques from the field of Web
+    graph / social network analysis.
+
+** Background --- Web graph compression
+   Borrowing (great!) slides from:
+   #+BEGIN_EXPORT latex
+   \begin{thebibliography}{}
+   \bibitem{Pibiri2018} Giulio Ermanno Pibiri
+   \newblock Effective Web Graph Representations, 2018
+   \newblock \url{http://pages.di.unipi.it/pibiri/slides/webgraphs\_compression.pdf}
+   \end{thebibliography}
+   #+END_EXPORT
+
+** Background --- Web graph compression (imported slides)
+   :PROPERTIES:
+   :BEAMER_env: ignoreheading
+   :END:
+   #+BEGIN_EXPORT latex
+   {
+     \setbeamercolor{background canvas}{bg=}
+     \setbeamertemplate{background}{}
+     \includepdf[pages={4,11,12,13}]{webgraphs_compression.pdf}
+     \addtocounter{framenumber}{4}
+   }
+   #+END_EXPORT
+
+** Corpus
+*** Nodes
+    :PROPERTIES:
+    :BEAMER_env: block
+    :BEAMER_col: 0.5
+    :END:
+    | *Node type* | *N. of nodes* |
+    |-------------+---------------|
+    | origins     | 88 M          |
+    | snapshots   | 57 M          |
+    | releases    | 9.9 M         |
+    | revisions   | 1.1 B         |
+    | directories | 4.9 B         |
+    | contents    | 5.5 B         |
+    |-------------+---------------|
+    | Total nodes | 12 B          |
+*** Edges
+    :PROPERTIES:
+    :BEAMER_env: block
+    :BEAMER_col: 0.5
+    :END:
+    | *Edge type*           | *N. of edges* |
+    |-----------------------+---------------|
+    | origin → snapshot     | 195 M         |
+    | snapshot → revision   | 616 M         |
+    | snapshot → release    | 215 M         |
+    | release → revision    | 9.9 M         |
+    | revision → revision   | 1.2 B         |
+    | revision → directory  | 1.1 B         |
+    | directory → directory | 48 B          |
+    | directory → revision  | 482 M         |
+    | directory → content   | 112 B         |
+    |-----------------------+---------------|
+    | Total edges           | 165 B         |
+***
+    Archive snapshot 2018-09-25, from the Software Heritage graph dataset.\\
+    Growth rate: exponential, doubling every 22-30 months.
+
+** Graph compression pipeline
+   #+BEAMER: \hspace*{-0.1\linewidth} \includegraphics[width=1.2\linewidth]{compression/compression-steps}
+   #+BEAMER: \vspace{-1cm}
+***
+    - *MPH*: minimal perfect hash, mapping Merkle IDs to 0..N-1 integers
+    - *BV compress*: Boldi-Vigna compression (based on the MPH order)
+    - *BFS*: breadth-first visit to renumber nodes
+    - *Permute*: update the BV compression according to the BFS order
+*** (Re)establishing locality
+    - the key to good compression is a node ordering that ensures locality
+      and similarity
+    - which is very much /not/ the case with Merkle IDs...
+    - ...but is the case /again/ after the BFS renumbering
+
+** Compression experiment
+   | *Step*      | *Wall time* (hours) |
+   |-------------+---------------------|
+   | MPH         | 2                   |
+   | BV compress | 84                  |
+   | BFS         | 19                  |
+   | Permute     | 18                  |
+   | Transpose   | 15                  |
+   |-------------+---------------------|
+   | Total       | 138 (~6 days)       |
+
+   - server equipped with 24 CPUs and 750 GB of RAM
+   - RAM mostly used as I/O cache for the BFS step
+   - /minimum/ memory requirements are close to the RAM needed to load the
+     final compressed graph in memory
+
+** Compression efficiency (space)
+***
+    :PROPERTIES:
+    :BEAMER_env: block
+    :BEAMER_col: 0.4
+    :END:
+    | *Forward graph*   |        |
+    | total size        | 91 GiB |
+    | bits per edge     | 4.91   |
+    | compression ratio | 15.8%  |
+***
+    :PROPERTIES:
+    :BEAMER_env: block
+    :BEAMER_col: 0.4
+    :END:
+    | *Backward graph*  |        |
+    | total size        | 83 GiB |
+    | bits per edge     | 4.49   |
+    | compression ratio | 14.4%  |
+*** Operation cost
+    The structure of the full bidirectional archive graph fits in less than
+    200 GiB of RAM, for a hardware cost of ~300 USD.
+
+** Compression efficiency (time)
+*** Benchmark --- Full BFS visit
+    #+BEAMER: \begin{columns}\begin{column}{0.45\textwidth}
+    | *Forward graph* |                |
+    |-----------------+----------------|
+    | wall time       | 1h48m          |
+    | throughput      | 1.81 M nodes/s |
+    |                 | (553 ns/node)  |
+    #+BEAMER: \end{column}\begin{column}{0.45\textwidth}
+    | *Backward graph* |                |
+    |------------------+----------------|
+    | wall time        | 3h17m          |
+    | throughput       | 988 k nodes/s  |
+    |                  | (1.01 µs/node) |
+    #+BEAMER: \end{column}\end{columns}
+*** Benchmark --- Edge lookup
+    random sample: 1 B nodes (8.3% of the entire graph)
+    #+BEAMER: \begin{columns}\begin{column}{0.45\textwidth}
+    | *Forward graph* |                |
+    |-----------------+----------------|
+    | visited edges   | 13.6 B         |
+    | throughput      | 12.0 M edges/s |
+    |                 | (83 ns/edge)   |
+    #+BEAMER: \end{column}\begin{column}{0.45\textwidth}
+    | *Backward graph* |                |
+    |------------------+----------------|
+    | visited edges    | 13.6 B         |
+    | throughput       | 9.45 M edges/s |
+    |                  | (106 ns/edge)  |
+    #+BEAMER: \end{column}\end{columns}
+***
+    :PROPERTIES:
+    :BEAMER_env: ignoreheading
+    :END:
+    Note how the edge lookup time is close to DRAM random access time (50-60 ns).
+
+** Discussion
+*** Incrementality
+    compression is *not incremental*, due to the use of contiguous integer
+    ranges
+    - but the graph is append-only, so...
+    - ...based on the expected graph growth rate it should be possible to
+      pre-allocate enough free space in the integer ranges to support
+      *amortized incrementality* (future work)
+    #+BEAMER: \pause
+*** In-memory v. on-disk
+    the compressed in-memory graph structure has *no attributes*
+    - the usual design is to exploit the 0..N-1 integer range to *memory-map
+      node attributes* stored on disk for efficient access
+    - works well for queries that do graph traversal first and "join" node
+      attributes last; ping-ponging between the two is expensive
+    - edge attributes are more problematic
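
To make the locality argument from the compression pipeline slides concrete, here is a minimal Python sketch. It is an illustration only, not the WebGraph/swh-graph code: it gap-encodes sorted successor lists and counts the bits an Elias gamma code would spend, showing why Merkle-like (effectively random) node IDs compress poorly while BFS-renumbered IDs compress well. The node IDs and toy graphs below are made up.

#+BEGIN_SRC python
# Illustration only (not the WebGraph / swh-graph implementation).
# Successor lists are sorted and stored as gaps (deltas); small gaps cost
# few bits under a variable-length code (Elias gamma is used here).

def gamma_bits(n):
    """Bits needed to write the positive integer n in Elias gamma coding."""
    return 2 * n.bit_length() - 1

def encoded_bits(adjacency):
    """Total bits to gap-encode every successor list of the toy graph."""
    total = 0
    for node, successors in adjacency.items():
        prev = node
        for succ in sorted(successors):
            gap = abs(succ - prev) + 1  # +1 keeps the gap strictly positive
            total += gamma_bits(gap)
            prev = succ
    return total

# The same tiny graph under two numberings (all IDs are made up).
# Merkle-like IDs scatter neighbours far away from their source node...
merkle_like = {0: [7_104_233, 9_550_012], 1: [3_377_190, 8_420_551]}
# ...while after a BFS renumbering neighbours tend to be nearby.
bfs_order = {0: [1, 2], 3: [4, 5]}

print(encoded_bits(merkle_like))  # 176 bits for 4 edges
print(encoded_bits(bfs_order))    # 12 bits for 4 edges
#+END_SRC

Actual BV compression layers reference copying and interval encoding on top of plain gap encoding, but the effect of the node ordering is the same.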
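
The "memory-map node attributes" point from the discussion slide can be sketched in the same spirit. The snippet below is a hypothetical NumPy illustration, not the actual swh-graph attribute layout: the file name, attribute, and sizes are invented; the only assumption taken from the slides is that nodes are numbered 0..N-1, so a fixed-size per-node attribute fits in a flat file indexed by node ID.

#+BEGIN_SRC python
# Hypothetical sketch: per-node attributes memory-mapped by node ID.
import numpy as np

N = 1_000_000  # number of nodes (toy value)

# One-time preparation: write one attribute value per node ID, in ID order.
timestamps = np.memmap("commit_timestamps.bin", dtype=np.int64,
                       mode="w+", shape=(N,))
timestamps[:] = 0          # would be filled from archive metadata
timestamps.flush()

# Query time: traverse the compressed graph first (collecting node IDs),
# then "join" the attribute values last, one mmap read per visited node.
attr = np.memmap("commit_timestamps.bin", dtype=np.int64,
                 mode="r", shape=(N,))
visited_nodes = [42, 1_337, 999_999]   # stand-in for a BFS/DFS result
latest = max(int(attr[v]) for v in visited_nodes)
#+END_SRC

This follows the "traverse first, join attributes last" pattern: attribute reads happen in one final pass over the memory-mapped file, instead of interleaving traversal and attribute lookups for every node, which is the expensive ping-pong mentioned on the slide.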