diff --git a/reports/node_mapping/Makefile b/reports/node_mapping/Makefile
new file mode 100644
index 0000000..c9dfccd
--- /dev/null
+++ b/reports/node_mapping/Makefile
@@ -0,0 +1,10 @@
+all: node_mapping.pdf
+
+node_mapping.pdf: node_mapping.tex
+	latexmk -xelatex -shell-escape $<
+
+.PHONY: clean
+
+clean:
+	rm -rf _minted*
+	latexmk -C
diff --git a/reports/node_mapping/node_mapping.tex b/reports/node_mapping/node_mapping.tex
new file mode 100644
index 0000000..ab831e8
--- /dev/null
+++ b/reports/node_mapping/node_mapping.tex
@@ -0,0 +1,155 @@
+\documentclass[11pt,a4paper]{article}
+
+\usepackage[a4paper,left=2cm,right=2cm,top=2.5cm,bottom=2.5cm]{geometry}
+\usepackage[english]{babel}
+\usepackage{booktabs}
+\usepackage{float}
+\usepackage{graphicx}
+\usepackage{hyperref}
+\usepackage{minted}
+\usepackage{parskip}
+\usepackage{siunitx}
+\usepackage{tikz}
+
+\title{Google Summer of Code 2019}
+\author{Thibault Allançon}
+\date{June 6, 2019}
+
+\newcommand{\mmap}{\mintinline{text}{mmap}}
+
+\begin{document}
+
+\maketitle
+
+Software Heritage refers to its graph nodes using persistent string
+identifiers\footnote{\url{https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html}}.
+During compression, WebGraph maps those labels to long identifiers. We need
+an efficient bidirectional map between the labels and the internal node ids.
+
+\section{Additional information}
+
+An example of a SWH string identifier:
+
+\begin{minted}{text}
+    swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2
+\end{minted}
+
+The first step of the compression process is to map the input strings to long
+identifiers using the \mintinline{text}{GOVMinimalPerfectHashFunction} from the
+Sux4J\footnote{\url{http://sux4j.di.unimi.it/}} framework. The generated
+\mintinline{text}{.mph} function is a \textbf{minimal perfect hash function}:
+it maps $n$ keys to $n$ consecutive integers with no collisions.
+
+Later during the compression process, the nodes are re-ordered (specifically,
+in the order of a BFS traversal), so the MPH mapping gets shuffled.
+
+\begin{figure}[H]
+    \centering
+    \begin{tikzpicture}[auto, node distance=6cm]
+        \node[label=below:{(string)}] (input) {\mintinline{text}{.nodes.csv}};
+        \node[right of=input, label=below:{(long)}] (mph) {\mintinline{text}{.mph}};
+        \node[right of=mph, label=below:{(long)}] (bfs) {\mintinline{text}{.order}};
+
+        \draw[->] (input) edge node{MPH function} (mph);
+        \draw[->] (mph) edge node{BFS ordering} (bfs);
+    \end{tikzpicture}
+    \caption{End-to-end node mapping}
+\end{figure}
+
+We want a bidirectional map to efficiently retrieve the corresponding strings
+or longs. Naive in-memory hash maps are not an option due to memory usage (we
+have around 20B nodes), hence we must use a disk-based solution.
+
+\section{Solution 1: \mmap{}}
+
+Idea: dump the two mappings into huge files and load them using the \mmap{}
+system call, to avoid storing them in memory.
+
+The long to string map only needs to store the SWH ids: because the MPH is a
+minimal perfect mapping, we can store the SWH persistent identifier of node
+$i$ on line $i$, and simply read the $i$-th line to retrieve it. Since all
+identifiers have the same length, the byte offset of line $i$ is simply $i$
+times the record size.
+
+The string to long map needs to store both the SWH id and the node id on the
+same line. To get the corresponding node id efficiently, we can keep the
+lines sorted by SWH id and binary search over them.
+
+The main problem with this solution is technical: Java caps the size of
+\mmap{}-ed files at about 2GB. To overcome this limitation and \mmap{} very
+large files, the dsiutils\footnote{\url{http://dsiutils.di.unimi.it/}}
+package provides the \mintinline{text}{ByteBufferInputStream} class. But it
+only supports reading from files, not writing to them. I had to create a
+\mintinline{text}{ByteBufferOutputStream} to allow writing to huge
+\mmap{}-ed files.
+
+Note: I first started with the \mmap{} solution and got early results that
+allowed me to compare it with the DB-based solutions, before going all in on
+\mmap{} and coding the \mintinline{text}{ByteBufferOutputStream}.
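+
+To make the layout described above concrete, here is a minimal sketch of the
+two lookups. It assumes fixed-width records and hypothetical file names, and
+uses plain seeks instead of \mmap{} for brevity; the actual code differs.
+
+\begin{minted}{java}
+import java.io.IOException;
+import java.io.RandomAccessFile;
+import java.nio.charset.StandardCharsets;
+
+// Illustrative sketch, not the actual implementation.
+public class NodeIdMapSketch {
+    // A SWH id is 50 characters: "swh:1:" + 3-letter type + ":" + 40 hex digits.
+    private static final int SWH_ID_LENGTH = 50;
+    // Node ids stored as 20-digit zero-padded decimal longs.
+    private static final int NODE_ID_LENGTH = 20;
+    // Record sizes account for the separator and the trailing newline.
+    private static final int LONG_TO_STR_RECORD = SWH_ID_LENGTH + 1;
+    private static final int STR_TO_LONG_RECORD =
+            SWH_ID_LENGTH + 1 + NODE_ID_LENGTH + 1;
+
+    private final RandomAccessFile longToStr; // line i holds the id of node i
+    private final RandomAccessFile strToLong; // lines sorted by SWH id
+    private final long numNodes;
+
+    public NodeIdMapSketch(String longToStrPath, String strToLongPath)
+            throws IOException {
+        longToStr = new RandomAccessFile(longToStrPath, "r");
+        strToLong = new RandomAccessFile(strToLongPath, "r");
+        numNodes = longToStr.length() / LONG_TO_STR_RECORD;
+    }
+
+    // long -> string: the offset is computed directly, one seek + one read.
+    public String getSwhId(long node) throws IOException {
+        byte[] buf = new byte[SWH_ID_LENGTH];
+        longToStr.seek(node * LONG_TO_STR_RECORD);
+        longToStr.readFully(buf);
+        return new String(buf, StandardCharsets.US_ASCII);
+    }
+
+    // string -> long: binary search over the sorted fixed-width records.
+    public long getNodeId(String swhId) throws IOException {
+        byte[] buf = new byte[SWH_ID_LENGTH];
+        long lo = 0;
+        long hi = numNodes - 1;
+        while (lo <= hi) {
+            long mid = (lo + hi) >>> 1;
+            strToLong.seek(mid * STR_TO_LONG_RECORD);
+            strToLong.readFully(buf);
+            int cmp = new String(buf, StandardCharsets.US_ASCII)
+                    .compareTo(swhId);
+            if (cmp == 0) {
+                strToLong.skipBytes(1); // skip the separator
+                byte[] idBuf = new byte[NODE_ID_LENGTH];
+                strToLong.readFully(idBuf);
+                return Long.parseLong(
+                        new String(idBuf, StandardCharsets.US_ASCII));
+            } else if (cmp < 0) {
+                lo = mid + 1;
+            } else {
+                hi = mid - 1;
+            }
+        }
+        return -1; // unknown SWH id
+    }
+}
+\end{minted}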
+
+\section{Solution 2: Databases}
+
+Another approach is to simply use a disk-based database; two were considered
+in my tests: RocksDB and HaloDB. In both cases, I tried tuning the database
+options to fit our needs and get the best timings.
+
+On the technical side, the code needed is simpler and more generic than the
+custom \mmap{} solution. However, this can only be a feasible solution if the
+map building and retrieval time overheads are small enough to scale to the
+entire graph.
+
+\subsection{RocksDB}
+
+RocksDB is a popular and mature persistent key-value store developed by
+Facebook. It is written in C++ but provides a Java API.
+
+The code used to dump and load the mappings can be found in the
+\mintinline{text}{NodeIdMapRocksDB.java} file.
+
+RocksDB has \textbf{many} parameters to tune (see their tuning
+guide\footnote{\url{https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide}}).
+I tried the most obvious ones, but being completely new to this software, the
+benchmarks may not be the best possible.
+
+\subsection{HaloDB}
+
+HaloDB is a relatively new embedded key-value store developed by Yahoo. It
+caught my attention because it is written in Java and leaves out features of
+RocksDB we don't need (like range scans) in order to achieve better
+performance.
+
+The code used to dump and load the mappings can be found in the
+\mintinline{text}{NodeIdMapHaloDB.java} file.
+
+\newpage
+
+\section{Results}
+
+Experiments were done on a VM with 700GB of RAM and 72 vCPUs, using the
+\mintinline{text}{rev_to_rev} dataset (1B nodes and 1B edges).
+
+\begin{center}
+    \begin{tabular}{@{} l *3r @{}}
+        \toprule
+        \multicolumn{1}{c}{} &
+        \textbf{\mmap{}} & \textbf{RocksDB} &
+        \textbf{HaloDB} \\
+        \midrule
+        \texttt{dump}
+            & 1h24min & estimated 4h & estimated 8h \\
+        \texttt{load}
+            & 0.003s & & \\
+        \texttt{get node id}
+            & $\sim$900µs & & \\
+        \texttt{get swh id}
+            & $\sim$50µs & & \\
+        \texttt{RAM usage (dump)}
+            & 130GB & & \\
+        \texttt{RAM usage (load)}
+            & 5GB & & \\
+        \texttt{Disk space}
+            & 108GB & & \\
+        \bottomrule
+    \end{tabular}
+\end{center}
+
+Note: \texttt{RAM usage} includes the in-memory compressed graph.
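+
+For reference, per-lookup timings like the ones above can be measured with a
+simple harness along these lines (a hypothetical sketch, not the code used
+for these measurements), reusing the \mintinline{text}{NodeIdMapSketch} from
+Solution 1:
+
+\begin{minted}{java}
+import java.util.Random;
+
+// Hypothetical timing harness: averages the cost of random lookups.
+public class LookupBenchmark {
+    public static void main(String[] args) throws Exception {
+        NodeIdMapSketch map = new NodeIdMapSketch(args[0], args[1]);
+        long numNodes = Long.parseLong(args[2]);
+        int queries = 1_000_000;
+        Random rng = new Random(42);
+
+        // Warm up the page cache and the JIT before measuring.
+        for (int i = 0; i < 100_000; i++) {
+            map.getSwhId(Math.floorMod(rng.nextLong(), numNodes));
+        }
+
+        long start = System.nanoTime();
+        for (int i = 0; i < queries; i++) {
+            map.getSwhId(Math.floorMod(rng.nextLong(), numNodes));
+        }
+        long elapsed = System.nanoTime() - start;
+        System.out.printf("get swh id: %.3f µs/query%n",
+                elapsed / 1e3 / queries);
+    }
+}
+\end{minted}
+
+\end{document}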