diff --git a/talks-public/2018-01-25-rocq-sesi/2018-01-25-rocq-sesi.org b/talks-public/2018-01-25-rocq-sesi/2018-01-25-rocq-sesi.org
new file mode 100644
index 0000000..947dbb9
--- /dev/null
+++ b/talks-public/2018-01-25-rocq-sesi/2018-01-25-rocq-sesi.org
@@ -0,0 +1,140 @@
+#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
+#+TITLE: Software Heritage
+#+SUBTITLE: Technical challenges when archiving the entire Software Commons
+#+BEAMER_HEADER: \date[Inria Rocquencourt]{25 January 2018\\Inria Rocquencourt}
+#+AUTHOR: Stefano Zacchiroli
+#+DATE: 25 January 2018
+
+#+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1
+#+INCLUDE: "../../common/modules/169.org"
+#+BEAMER_HEADER: \institute{Inria, Software Heritage}
+
+#+LATEX_HEADER: \usepackage{bbding}
+#+LATEX_HEADER: \DeclareUnicodeCharacter{66D}{\FiveStar}
+
+* The Software Commons
+  #+INCLUDE: "../../common/modules/source-code-different-short.org::#softwareisdifferent" :minlevel 2
+** Our Software Commons
+   #+INCLUDE: "../../common/modules/foss-commons.org::#commonsdef" :only-contents t
+   #+BEAMER: \pause
+*** Source code is /a precious part/ of our commons
+    \hfill are we taking care of it?
+  # #+INCLUDE: "../../common/modules/swh-motivations-foss.org::#main" :only-contents t :minlevel 2
+  #+INCLUDE: "../../common/modules/swh-motivations-foss.org::#fragile" :minlevel 2
+  #+INCLUDE: "../../common/modules/swh-motivations-foss.org::#research" :minlevel 2
+* Software Heritage
+  #+INCLUDE: "../../common/modules/swh-overview-sourcecode.org::#mission" :minlevel 2
+  #+INCLUDE: "../../common/modules/principles-compact.org::#principles" :minlevel 2
+* Architecture
+  #+INCLUDE: "../../common/modules/status-extended.org::#archivinggoals" :minlevel 2
+  #+INCLUDE: "../../common/modules/status-extended.org::#architecture" :only-contents t
+  #+INCLUDE: "../../common/modules/status-extended.org::#merkletree" :minlevel 2
+  #+INCLUDE: "../../common/modules/status-extended.org::#merklerevision" :only-contents t
+  #+INCLUDE: "../../common/modules/status-extended.org::#giantdag" :only-contents t
+  #+INCLUDE: "../../common/modules/status-extended.org::#archive" :minlevel 2
+  #+INCLUDE: "../../common/modules/status-extended.org::#apiintro" :minlevel 2
+  #+INCLUDE: "../../common/modules/status-extended.org::#features" :minlevel 2
+* Technical challenges
+** Technology: how do you store the SWH DAG?
+*** Problem statement
+    - How would you store and query a graph with 10 billion nodes and 60
+      billion edges?
+    - How would you store the contents of more than 3 billion files, 300TB of
+      raw data?
+    - ... on a limited budget (100 000 € of hardware overall)
+    #+BEAMER: \pause
+*** Our hardware stack
+    - two hypervisors with 512GB RAM, 20TB SSD each, sharing access to a
+      storage array (60 x 6TB spinning rust)
+    - one backup server with 48GB RAM and another storage array
+*** Our software stack
+    - an RDBMS (PostgreSQL, what else?) for storing the graph nodes and
+      edges
+    - filesystems for storing the actual file contents
+** Technology: archive storage components
+*** Metadata storage
+    - Python module *swh.storage*
+    - thin Python API over a pile of PostgreSQL functions
+    - motivation: keep relational integrity at the lowest layer
+*** Content ("object") storage
+    - Python module *swh.objstorage*
+    - very thin object storage abstraction layer (PUT, APPEND and GET) over
+      regular storage technologies
+    - separate layer for asynchronous replication and integrity management
+      (*swh.archiver*)
+    - motivation: stay as technology-neutral as possible for future mirrors
+** Technology: object storage
+*** Primary deployment
+    - storage on 16 sharded XFS filesystems; key = /sha1/ (content), value =
+      /gzip/ (content)
+    - if sha1 = *abcdef01234...*, file path = / srv / storage / *a* / *ab* /
+      *cd* / *ef* / *abcdef01234...*
+    - 3 directory levels deep, each level 256-wide = 16 777 216 directories
+      (1 048 576 per partition)
+*** Secondary deployment
+    - storage on Azure Blob Storage
+    - 16 storage containers, objects stored there in a flat structure
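+** Technology: object storage (illustrative sketch)
+   A minimal sketch of the scheme above, /not/ the actual swh.objstorage
+   API: the =put=/=get= helpers and the =/srv/storage= root are assumptions
+   used only to illustrate the PUT/GET abstraction and the sharded layout.
+   #+BEGIN_SRC python
+     import gzip
+     import hashlib
+     import os
+
+     STORAGE_ROOT = "/srv/storage"  # assumed mount point of the shards
+
+     def object_path(hexsha1):
+         """Sharded path: <root>/<a>/<ab>/<cd>/<ef>/<full sha1>."""
+         return os.path.join(STORAGE_ROOT, hexsha1[0], hexsha1[:2],
+                             hexsha1[2:4], hexsha1[4:6], hexsha1)
+
+     def put(content):
+         key = hashlib.sha1(content).hexdigest()  # key = sha1 of content
+         path = object_path(key)
+         os.makedirs(os.path.dirname(path), exist_ok=True)
+         with gzip.open(path, "wb") as f:         # value = gzipped content
+             f.write(content)
+         return key
+
+     def get(key):
+         with gzip.open(object_path(key), "rb") as f:
+             return f.read()
+   #+END_SRC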
+** Technology: object storage review
+*** Generic model is fine
+    The abstraction layer is fairly simple and generic, and the implementation
+    of the upper layers (replication, integrity checking) was a breeze.
+*** Filesystem implementation is bad
+    Slow spinning storage + little RAM (48GB) + 16 million dentries = (very)
+    bad performance
+** Technology: metadata storage
+*** Current deployment
+    - PostgreSQL deployed in primary/replica mode, using pg\under{}logical for
+      replication: different indexes on the primary (tuned for writes) and on
+      the replicas (tuned for reads)
+    - most logic done in SQL
+    - thin Pythonic API over the SQL functions
+*** End goals
+    - proper handling of relations between objects at the lowest level
+    - fast recursive queries on the graph (e.g., find the provenance info for
+      a content, walking up the whole graph, with a single query)
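+** Technology: metadata storage (illustrative sketch)
+   A minimal sketch of the "thin Python API over SQL functions" pattern;
+   the =Storage= class and the =swh_content_find()= function are
+   hypothetical stand-ins, not the real swh.storage schema.
+   #+BEGIN_SRC python
+     import psycopg2
+
+     class Storage:
+         def __init__(self, dsn):
+             self.db = psycopg2.connect(dsn)
+
+         def content_find(self, sha1):
+             # all the logic lives in a PostgreSQL function; Python only
+             # marshals arguments and results
+             with self.db.cursor() as cur:
+                 cur.execute("SELECT * FROM swh_content_find(%s)", (sha1,))
+                 return cur.fetchone()
+   #+END_SRC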
+** Technology: metadata storage review
+*** Limited resources
+    PostgreSQL works really well
+    #+BEAMER: \pause
+    ... until your indexes don't fit in RAM
+    #+BEAMER: \pause
+***
+    Our recursive queries jump between different object types, and between
+    evenly distributed hashes. Data locality doesn't exist. Caches break down.
+    #+BEAMER: \pause
+***
+    Massive deduplication = efficient storage
+    #+BEAMER: \pause
+    *but* Massive deduplication = exponential width for recursive queries
+    #+BEAMER: \pause
+*** Reality check
+    Referential integrity?
+    #+BEAMER: \pause
+    Real repositories downloaded from the internet are all kinds of broken.
+** Technology: outlook
+*** Object storage
+    Our Azure prototype shows that using a scale-out "cloudy" technology for
+    our object storage works really well. Plain filesystems on spinning rust,
+    not so much.
+    #+BEAMER: \pause
+    We are now experimenting with scale-out object storages (and in particular
+    Ceph) for the main copy of the archive.
+    #+BEAMER: \pause
+*** Metadata storage
+    Our initial assumption that we wanted referential integrity and built-in
+    recursive queries was wrong.
+    #+BEAMER: \pause
+    We could probably migrate to "dumb" object storages for each type of
+    object, with another layer to check metadata integrity regularly.
+* Community
+** You can help!
+*** Coding
+    - \url{www.softwareheritage.org/community/developers/}
+    - \url{forge.softwareheritage.org} --- *our own code*
+*** Current development priorities
+    | ٭٭٭ | listers for unsupported forges, distros, pkg. managers |
+    | ٭٭٭ | loaders for unsupported VCS, source package formats    |
+    | ٭٭  | Web UI: eye candy wrapper around the Web API           |
+    | ٭   | content indexing and search                            |
+    … /all/ contributions equally welcome!
+  #+INCLUDE: "../../common/modules/swh-backmatter.org::#conclusion" :minlevel 2
diff --git a/talks-public/2018-01-25-rocq-sesi/Makefile b/talks-public/2018-01-25-rocq-sesi/Makefile
new file mode 100644
index 0000000..68fbee7
--- /dev/null
+++ b/talks-public/2018-01-25-rocq-sesi/Makefile
@@ -0,0 +1 @@
+include ../Makefile.slides