#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
# R&D challenges
#+INCLUDE: "prelude.org" :minlevel 1

* R&D challenges
:PROPERTIES:
:CUSTOM_ID: main
:END:

** Data model
*** The real world sucks
- corrupted repositories
- takedown notices
- partial, irrecoverable data losses
#+BEAMER: \pause
*** /Incomplete/ Merkle DAGs
- nodes can go missing at archival time or disappear later on
- top-level hash(es) no longer capture the full state of the archive
#+BEAMER: \pause
*** Open questions
- how do you capture the full state of the archive then?
- how do you efficiently check whether something needs to be re-archived? (sketch in backup)
- ultimately, what is your notion of having "fully archived" something?

** Storage
*** Archive stats
- as a graph: ~10 B nodes, ~100 B edges
- node breakdown: ~40% contents, ~40% directories, ~10% commits
- content size: ~400 TB (raw), ~200 TB compressed (content by content)
- median compressed size: 3 KB
- i.e., *a lot of very small files*
#+BEAMER: \pause
*** Current storage solution (unsatisfactory)
- contents: ad hoc object storage with multiple backends
  - file system, Azure, AWS, etc.
- rest of the graph: Postgres (~6 TB)
  - rationale: recursive queries to traverse the graph
  - (no, it doesn't work at this scale)

** Storage (cont.)
*** Requirements
- long-term storage
- suitable for distribution/replication
- suitable for scale-out processing
#+BEAMER: \pause
*** Graph
- early experiences with Ceph (RADOS)
  - not a good fit out of the box
  - 7x size increase over the target retention policy, due to the large minimum chunk size (64 KB)
  - ad hoc object packing (?)
- .oO( do we really have to re-invent a file system? )

** Storage (cont.)
*** Contents --- size considerations
- a few hundred TB is not /that/ big, but it cuts off volunteer mirrors
#+BEAMER: \pause
*** Content compression
- low compression ratio (2x) when compressing contents one by one
- typical Git/VCS packing heuristics do not work here, because contents occur in many different contexts
- early experiences with Rabin-style compression & co. were unsatisfactory (chunking sketch in backup)
#+BEAMER: \pause
*** Distributed archival
- massively distributed archival (e.g., P2P) would be nice
- but most P2P technologies behave more like CDNs than archives and do not offer retention policy guarantees (e.g., self-healing)

** Efficient graph processing
*** Use cases
- Vault: recursive visits to collect archived objects
- Provenance: single-destination shortest paths
- (traversal sketches in backup)
#+BEAMER: \pause
*** Technology
- beyond the capabilities of off-the-shelf graph DBs
- graph topology: scale-free, but not small-world
- /probably/ a bad fit for Pregel/Chaos/etc.
- are web-graph-style compression techniques suitable for storing and processing the Merkle DAG in memory? (unclear)

** Provenance tracking                                             :noexport:
- TODO
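
** Backup: spotting holes in an incomplete Merkle DAG
A minimal, self-contained sketch (toy identifiers and an in-memory store, not the actual Software Heritage data model or API): node identifiers hash over a payload plus child identifiers, and a traversal from the archive roots reports every referenced-but-missing node, i.e., exactly what would need to be re-archived. The point of the open questions above is that a single top-level hash no longer certifies completeness once such holes can exist.

#+BEGIN_SRC python
"""Toy Merkle DAG with hole detection (illustrative only)."""

import hashlib


def node_id(payload: bytes, child_ids: list[str]) -> str:
    """Identifier = hash of the node payload and its (sorted) child ids."""
    h = hashlib.sha1()
    h.update(payload)
    for cid in sorted(child_ids):
        h.update(cid.encode())
    return h.hexdigest()


def find_holes(roots: list[str], store: dict[str, list[str]]) -> set[str]:
    """Walk the DAG from the given roots; return ids that are referenced
    somewhere but missing from `store` (a map id -> list of child ids)."""
    missing, seen, todo = set(), set(), list(roots)
    while todo:
        nid = todo.pop()
        if nid in seen:
            continue
        seen.add(nid)
        if nid not in store:
            missing.add(nid)  # hole: candidate for (re-)archival
            continue
        todo.extend(store[nid])
    return missing
#+END_SRC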
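
** Backup: content-defined chunking
A generic sketch of the "Rabin-style" family of techniques mentioned in the compression slide, not the exact variant that was evaluated; window size, boundary mask, and minimum chunk size are illustrative parameters. Chunk boundaries depend only on local content, so files sharing large regions produce mostly identical chunks that a store can deduplicate before compressing.

#+BEGIN_SRC python
"""Content-defined chunking with a polynomial rolling hash (illustrative)."""

WINDOW = 48            # rolling-hash context, in bytes
MASK = (1 << 13) - 1   # boundary when the low 13 bits are all 1 (~8 KB average)
MIN_CHUNK = 2048       # avoid pathologically small chunks
PRIME = 257
MOD = (1 << 61) - 1


def chunk(data: bytes):
    """Yield chunks whose boundaries are chosen from content, not offsets."""
    h, start = 0, 0
    pow_out = pow(PRIME, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * pow_out) % MOD
        h = (h * PRIME + b) % MOD
        if i >= WINDOW and (h & MASK) == MASK and i + 1 - start >= MIN_CHUNK:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]
#+END_SRC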
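
** Backup: the two traversal patterns
The Vault and Provenance use cases boil down to a reachability visit and an unweighted single-destination shortest path. The sketch below assumes hypothetical in-memory adjacency maps, which is precisely what does not fit at ~10 B nodes / ~100 B edges; it only pins down the access patterns any storage/processing solution must serve.

#+BEGIN_SRC python
"""Reachability visit and shortest path over toy adjacency maps."""

from collections import deque


def vault_visit(root, succ):
    """Vault: collect every object reachable from `root`
    (e.g., all directories and contents under a revision)."""
    seen, todo = {root}, [root]
    while todo:
        node = todo.pop()
        for dst in succ.get(node, ()):
            if dst not in seen:
                seen.add(dst)
                todo.append(dst)
    return seen


def provenance_path(src, dst, succ):
    """Provenance: a shortest path from `src` to the single destination
    `dst` (plain BFS, since edges are unweighted). Returns None if absent."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in succ.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None
#+END_SRC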