Design considerations
=====================

* **Caching**: our storage contains two main parts: a file storage, and a git
  object storage. Both parts are accessible as key-value storage. Whenever
  possible we want to avoid checking in content that is provably already in
  there.

* **Concurrency**: our storage will be accessed concurrently for both read and
  write purposes. In particular for writing, it is possible that multiple
  workers will be in the process, at the same time, of loading into our storage
  different git repositories that have a lot of overlap, if not completely
  identical. Whenever possible they should be able to benefit from each other
  work, collaborating *de facto*.

* **Robustness**: workers that load content into our storage might crash before
  completion. Wherever possible, the work done before completion should be
  preserved by the storage. Eventually another worker (possibly the same as
  before) will pickup the same git repository, try it again, and drive it to
  completion.

* **Ctime and Atimes**: for every piece of content we are interested in both
  creation time ("ctime", i.e., the first time we have seen a given content and
  added it to our storage) and access times ("atime", i.e., every time we see
  the same content *elsewhere* we want to be able to store the fact we have
  seen it *again*, and again, and again...).

* **Transactionality**: every content addition should be transactional: only
  after having stored the content in full we will tell the world (and other
  workers) that the content is available. (Without locking, which is desirable)
  This might result in temporary races where multiple workers trying to add the
  same content without knowing of each other---this situation should be handled
  gracefully.

  Transactionality should apply across different storage media: in particular
  the filesystem used to store file content and the DB used to store the
  corresponding metadata should cooperate. It is OK for the filesystem to have
  content that is not indexed in the DB; but for all purposes that should be
  equivalent to not having stored the content *at all*.


Git traversal
=============

To load the whole content of a git repo in our storage we need to traverse the
git object graph, and inject every single version of every single file we find.
We discuss below how we should traverse the git graph to that end.

For the sake of conciseness we do not distinguish git object types. The actual
code does need to treat differently different kind of git objects though (and
in particular commits -> trees -> and blobs); see the implementation for
details about this.


Top-down
--------

* Top-down, topological traversal of the git object graph starting from the
  current refs is optimal from the point of view of caching. Once a given
  object is found in the cache we know that we have already loaded it in the
  storage and we do not need to treat its descendants any further.

* Top-down however is not good for robustness. If we store the current node
  before its descendants and the loading fails to complete, in the future we
  will believe to have stored all its descendants whereas we have note. FAIL.

Conclusion: pure top-down traversal is bad for us.


Bottom-up
---------

* Bottom-up, topological traversal is good for robustness. Once we reach the
  top we know we have stored all its descendants, so in the future we can
  benefit from caching.

* However, bottom-up is bad for caching. If we always treat descendants before
  ancestors we will benefit from caching only at the level of individual
  objects, and never at the level of whole subgraphs.

Conclusion: pure bottom-up traversal is OK, but does not allow to benefit from
subgraph caching.


Mixed top-down/bottom-up
------------------------

To get the best of both worlds we need a mixed approach, something like
(pseudocode):

    let rec load_git_graph node =
        if not (node in storage) then
	        for node' in children(node)
		        load_git_graph(node')
            add_to_storage(node, storage)

Note: non tail-recursive.

Conclusion: the above offers both robustness w.r.t. loading crashes and
subgraph caching.


Atime maintenance
-----------------

Bad news: with the mixed approach it's easy to maintain ctimes, but atimes
cannot be maintained (because we do not visit at all subgraphs). More
generally: subgraph caching or atime maintenance <- choose one.

If we do want to maintain atimes (at this level---as opposed to, say, do that
separately from git repo loading) we need to give up on subgraph caching. If we
do that, top-down vs bottom-up doesn't really matter.


Cross file system + DB transactions
===================================

To ensure file system DB transaction, to add a single file to our storage we
proceed as follows, where KEY is the key of the file to be added, and WORKER_ID
the unique identifier of the worker that is updating the storage:

1. BEGIN TRANSACTION
2. create file KEY.WORKER_ID, overwriting destination if needed
3. write file content to KEY.WORKER_ID
4. rename(KEY.WORKER_ID, KEY), overwriting destination if needed
5. INSERT KEY INTO ...
6. COMMIT

any error in the above would cause a transaction ABORT.

Failure scenarios (that should all be handled properly by the above protocol):

* worker crash, at any moment during the above
* parallel execution, resulting in one worker failing due to key duplication
  upon step (5) or (6)