diff --git a/docs/use-cases.rst b/docs/use-cases.rst
new file mode 100644
--- /dev/null
+++ b/docs/use-cases.rst
@@ -0,0 +1,162 @@
+This document lists use cases and benchmark scenarios for the Software Heritage
+graph service.
+
+
+Conventions
+===========
+
+- **Node identification**: in the following, nodes are always identified by
+  their Software Heritage :ref:`persistent-identifiers` (SWH PIDs).
+
+
+Use cases
+=========
+
+
+Browsing
+--------
+
+The following use cases require traversing the *forward graph*.
+
+- **ls**: given a directory node, list (non recursively) all linked nodes of
+  type directory and content
+
+  Implementation::
+
+    /graph/neighbors/:NODE_ID?edges=dir:cnt,dir:dir
+
+- **ls -R**: given a directory node, recursively list all linked nodes of type
+  directory and content
+
+  Implementation::
+
+    /graph/visit/paths/:NODE_ID?edges=dir:cnt,dir:dir
+
+- **git log**: given a revision node, recursively list all linked nodes of type
+  revision
+
+  Implementation::
+
+    /graph/visit/nodes/:NODE_ID?edges=rev:rev
+
+
+Vault
+-----
+
+The following use cases require traversing the *forward graph*.
+
+- **tarball** (same as *ls -R* above)
+
+- **git bundle**: given a node, recursively list all linked nodes of any kind
+
+  Implementation::
+
+     /graph/visit/nodes/:NODE_ID?edges=*
+
+
+Provenance
+----------
+
+The following use cases require traversing the *backward (transposed)
+graph*.
+
+- **commit provenance**: given a content or directory node, return *a* commit
+  whose directory (recursively) contains it
+
+  Implementation::
+
+    /graph/walk/:NODE_ID/rev?direction=backward
+
+- **complete commit provenance**: given a content or directory node, return
+  *all* commits whose directory (recursively) contains it
+
+  Implementation::
+
+    /graph/leaves/:NODE_ID?direction=backward&edges=dir:dir,cnt:dir,dir:rev
+
+- **origin provenance**: given a content, directory, or commit node, return
+  *an* origin that has at least one snapshot that (recursively) contains it
+
+  Implementation::
+
+    /graph/walk/:NODE_ID/ori?direction=backward&edges=*
+
+- **complete origin provenance**: given a content, directory, or commit node,
+  return *all* origins that have at least one snapshot that (recursively)
+  contains it
+
+  Implementation::
+
+    /graph/leaves/:NODE_ID?direction=backward&edges=*
+
+- *SLOC tracking*: left as future work
+
+
+Provenance statistics
+~~~~~~~~~~~~~~~~~~~~~
+
+The following use cases require traversing the *backward (transposed)
+graph*.
+
+- **content popularity across commits**: for each content, count the number of
+  commits (or *commit popularity*) that link to a directory that (recursively)
+  includes it. Plot the distribution of content popularity across commits.
+
+  Implementation: apply *complete commit provenance* to each content node,
+  count the returned commits, aggregate.
+
+- **commit popularity across origins**: for each commit, count the number of
+  origins (or *origin popularity*) that have a snapshot that (recursively)
+  includes it. Plot the distribution of commit popularity across origins.
+
+  Implementation: apply *complete origin provenance* to each commit node, count
+  the returned origins, aggregate.
+
+- *SLOC popularity across contents*: left as future work
+
+The following use cases require traversing the *forward graph*.
+
+- **revision size** (as n. of contents) distribution: for each revision, count
+  the number of contents that are (recursively) reachable from it. Plot the
+  distribution of revision sizes.
+
+- **origin size** (as n. of revisions) distribution: for each origin, count the
+  number of revisions that are (recursively) reachable from it. Plot the
+  distribution of origin sizes.
+
+
+Benchmarks
+==========
+
+Notes on how to benchmark graph access:
+
+- separate pure-graph timings from timings related to use additional mappings
+  (e.g., node types), no matter if the mappings are in-memory or on-disk
+
+- separate in-memory timings from on-disk timings; in particular, separate the
+  timing of translating node identifiers between internal integers and SWH PIDs
+
+- for each use case that requires a node as input, we will randomize the choice
+  of the input node and repeat the experiment a suitable number of times; where
+  possible we will aggregate results computing basic statistics (average,
+  standard deviation), as well as normalize results w.r.t. the “size” of the
+  chosen node (e.g., number of nodes/path length in the resulting visit)
+
+
+Basic benchmarks
+----------------
+
+- **Edge traversal**: given a node, retrieve the first node in its adjacency
+  list.
+
+  For reference: Apostolico, Drovandi in *Graph Compression by BFS* report
+  times to retrieve the adjacency list of a node (and/or test if an edge exists
+  between two nodes) in the 2-3 us range, for the largest graph in their
+  experiments (22 M nodes, 600 M edges).
+
+
+Each use case is a benchmark
+----------------------------
+
+In addition to abstract benchmark, we will use each use case above as a
+scenario-based benchmark.