#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
# R&D challenges
#+INCLUDE: "prelude.org" :minlevel 1

* R&D challenges
:PROPERTIES:
:CUSTOM_ID: main
:END:

** Data model
*** The real world sucks
- corrupted repositories
- takedown notices
- partial, irrecoverable data losses
#+BEAMER: \pause
*** /Incomplete/ Merkle DAGs
- nodes can go missing at archival time or disappear later on
- top-level hash(es) no longer capture the full state of the archive
#+BEAMER: \pause
*** Open questions
- how do you capture the full state of the archive then?
- how do you efficiently check whether something needs to be re-archived? (sketch in backup)
- ultimately, what is your notion of having "fully archived" something?

** Storage
*** Archive stats
- as a graph: ~10 B nodes, ~100 B edges
- node breakdown: ~40% contents, ~40% directories, ~10% commits
- content size: ~400 TB (raw), ~200 TB compressed (content by content)
- median compressed size: 3 KB
- i.e., *a lot of very small files*
#+BEAMER: \pause
*** Current storage solution (unsatisfactory)
- contents: ad hoc object storage with multiple backends
  - file system, Azure, AWS, etc.
- rest of the graph: Postgres (~6 TB)
  - rationale: recursive queries to traverse the graph
  - (no, it doesn't work at this scale)

** Storage (cont.)
*** Requirements
- long-term storage
- suitable for distribution/replication
- suitable for scale-out processing
#+BEAMER: \pause
*** Graph
- early experiences with Ceph (RADOS)
  - not a good fit out of the box
  - 7x size increase over the target retention policy, due to the large minimum chunk size (64 KB)
  - ad hoc object packing (?)
- .oO( do we really have to re-invent a file system? )

** Storage (cont.)
*** Contents --- size considerations
- a few hundred TB is not /that/ big, but it cuts off volunteer mirrors
#+BEAMER: \pause
*** Content compression
- low compression ratio (2x) when compressing contents one by one
- typical Git/VCS packing heuristics do not work here, because contents occur in many different contexts
- early experiences with Rabin-style compression & co. were unsatisfactory (chunking sketch in backup)
#+BEAMER: \pause
*** Distributed archival
- massively distributed archival (e.g., P2P) would be nice
- but most P2P technologies behave more like CDNs than archives and do not offer retention policy guarantees (e.g., self-healing)

** Efficient graph processing
*** Use cases
- Vault: recursive visits to collect archived objects
- Provenance: single-destination shortest paths
- (traversal sketches in backup)
#+BEAMER: \pause
*** Technology
- beyond the capabilities of off-the-shelf graph DBs
- graph topology: scale-free, but not small-world
- /probably/ a bad fit for Pregel/Chaos/etc.
- are web-graph-style compression techniques suitable for storing and processing the Merkle DAG in memory? (unclear)

** Provenance tracking                                             :noexport:
- TODO
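
** Backup: spotting holes in an incomplete Merkle DAG
A minimal, self-contained sketch (toy identifiers and an in-memory store, not the actual Software Heritage data model or API): node identifiers hash over a payload plus child identifiers, and a traversal from the archive roots reports every referenced-but-missing node, i.e., exactly what would need to be re-archived. The point of the open questions above is that a single top-level hash no longer certifies completeness once such holes can exist.

#+BEGIN_SRC python
"""Toy Merkle DAG with hole detection (illustrative only)."""

import hashlib


def node_id(payload: bytes, child_ids: list[str]) -> str:
    """Identifier = hash of the node payload and its (sorted) child ids."""
    h = hashlib.sha1()
    h.update(payload)
    for cid in sorted(child_ids):
        h.update(cid.encode())
    return h.hexdigest()


def find_holes(roots: list[str], store: dict[str, list[str]]) -> set[str]:
    """Walk the DAG from the given roots; return ids that are referenced
    somewhere but missing from `store` (a map id -> list of child ids)."""
    missing, seen, todo = set(), set(), list(roots)
    while todo:
        nid = todo.pop()
        if nid in seen:
            continue
        seen.add(nid)
        if nid not in store:
            missing.add(nid)  # hole: candidate for (re-)archival
            continue
        todo.extend(store[nid])
    return missing
#+END_SRC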
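
** Backup: content-defined chunking
A generic sketch of the "Rabin-style" family of techniques mentioned in the compression slide, not the exact variant that was evaluated; window size, boundary mask, and minimum chunk size are illustrative parameters. Chunk boundaries depend only on local content, so files sharing large regions produce mostly identical chunks that a store can deduplicate before compressing.

#+BEGIN_SRC python
"""Content-defined chunking with a polynomial rolling hash (illustrative)."""

WINDOW = 48            # rolling-hash context, in bytes
MASK = (1 << 13) - 1   # boundary when the low 13 bits are all 1 (~8 KB average)
MIN_CHUNK = 2048       # avoid pathologically small chunks
PRIME = 257
MOD = (1 << 61) - 1


def chunk(data: bytes):
    """Yield chunks whose boundaries are chosen from content, not offsets."""
    h, start = 0, 0
    pow_out = pow(PRIME, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * pow_out) % MOD
        h = (h * PRIME + b) % MOD
        if i >= WINDOW and (h & MASK) == MASK and i + 1 - start >= MIN_CHUNK:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]
#+END_SRC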
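
** Backup: the two traversal patterns
The Vault and Provenance use cases boil down to a reachability visit and an unweighted single-destination shortest path. The sketch below assumes hypothetical in-memory adjacency maps, which is precisely what does not fit at ~10 B nodes / ~100 B edges; it only pins down the access patterns any storage/processing solution must serve.

#+BEGIN_SRC python
"""Reachability visit and shortest path over toy adjacency maps."""

from collections import deque


def vault_visit(root, succ):
    """Vault: collect every object reachable from `root`
    (e.g., all directories and contents under a revision)."""
    seen, todo = {root}, [root]
    while todo:
        node = todo.pop()
        for dst in succ.get(node, ()):
            if dst not in seen:
                seen.add(dst)
                todo.append(dst)
    return seen


def provenance_path(src, dst, succ):
    """Provenance: a shortest path from `src` to the single destination
    `dst` (plain BFS, since edges are unweighted). Returns None if absent."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in succ.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None
#+END_SRC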