#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
#+INCLUDE: "prelude.org" :minlevel 1
* Making it happen
:PROPERTIES:
:CUSTOM_ID: main
:END:
** Status: scope                                                  :noexport:
Software Heritage is an /infrastructure/ that needs to ensure:
*** Availability of the source code, in the long term
*** Traceability of all the contents
*** Uniformity of access to all the sources, irrespective of their origin
** Status: technical
:PROPERTIES:
:CUSTOM_ID: numbers
:END:
*** Our sources
- Debian -- all of it, daily snapshots since 2005
- GitHub -- public repositories, as of August 2015
- gnu.org -- all of it, as of August 2015
*** Some numbers
- 14 million repositories
- 400 million commits (real or synthetic)
- 1.9 billion unique source files
** What's special
:PROPERTIES:
:CUSTOM_ID: datamodel
:END:
*** /Uniform/ data model
- superset of /git/: ambition to /cover all VCS/
- contents, directories, revisions, releases, origins, ...
*** Massive /deduplication/                                          :ATTACH:
:PROPERTIES:
:Attachments: gpl2-names
:ID:       e0f8d015-9679-46de-a24d-92514a340975
:END:
- the biggest git-like graph in the world right now
  + did you know? the original GPLv2 licence
    * appears with more than 500 different file names
    * including /aa.css.txt/ and /FullSync.txt/ :-)
*** /Provenance/ tracking
- know /where/ we found /what/, /when/
- essential for /traceability/
** Status: institutional
#+LATEX: \includegraphics[width=\extblockscale{.3\linewidth}]{inria-logo-new.png} \vspace{3em}
*** Inria as initiator
French national research institute for computer science.
- Contributed to the birth of the W3C
- 4500 people, including many prestigious scientists
- In the news: FREAK and Logjam TLS vulnerabilities
Inria is /fully supporting/ the bootstrap phase of Software Heritage.
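** Sketch: deduplication by content hash
A minimal illustration of why /massive deduplication/ follows from the data model: if each content is addressed by a Git-compatible hash of its bytes, identical files collapse into one object whatever their names. This is a sketch under stated assumptions, not the Software Heritage implementation. The =ContentStore= class, its fields and the shortened licence text are hypothetical; only the Git blob hashing scheme (SHA-1 over a =blob <size>\0= header followed by the bytes) is standard.
#+BEGIN_SRC python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Git-style blob identifier: SHA-1 of 'blob <size>\\0' followed by the bytes."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class ContentStore:
    """Hypothetical content-addressed store: each unique file body is kept once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}     # identifier -> file contents
        self.names: dict[str, set[str]] = {}  # identifier -> file names seen for it

    def add(self, name: str, data: bytes) -> str:
        blob_id = git_blob_id(data)
        # Identical bytes give the same identifier, so the body is stored only once;
        # only the new file name is recorded.
        self.blobs.setdefault(blob_id, data)
        self.names.setdefault(blob_id, set()).add(name)
        return blob_id


store = ContentStore()
licence = b"GNU GENERAL PUBLIC LICENSE\nVersion 2, June 1991\n"  # hypothetical excerpt
for name in ["COPYING", "LICENSE", "aa.css.txt", "FullSync.txt"]:
    store.add(name, licence)

print(len(store.blobs))  # 1
print(sorted(store.names[git_blob_id(licence)]))
# ['COPYING', 'FullSync.txt', 'LICENSE', 'aa.css.txt']
#+END_SRC
With four file names for the same bytes, the store keeps a single content and records all four names, a tiny-scale version of the GPLv2 example in the data model slide.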
** Bits from the roadmap
*** Our ambition: going global
\hfill an /open, sustainable, multistakeholder, nonprofit/ organisation
# - organising the technological development
# - coordinating a network of peers
# - structuring the collaboration
*** The road ahead: policy
- public announcement, with the first cofounders and partners
- first working groups, open process for collaboration
- connections with research, industry and educational users
*** The road ahead: technical
- distributed infrastructure
- advanced search, tracking and exploration APIs that scale to tens of billions of files / Tera SLOCs