diff --git a/talks-public/2017-10-26-force11/2017-10-26-force11.org b/talks-public/2017-10-26-force11/2017-10-26-force11.org index b6b5b32..69a4d86 100644 --- a/talks-public/2017-10-26-force11/2017-10-26-force11.org +++ b/talks-public/2017-10-26-force11/2017-10-26-force11.org @@ -1,120 +1,123 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) #+TITLE: Identifying 4 billion source code files #+SUBTITLE: Intrinsic identifiers, the Software Heritage experience # does not allow short title, so we override it for beamer as follows : #+BEAMER_HEADER: \author[Stefano Zacchiroli]{Stefano Zacchiroli} #+AUTHOR: Stefano Zacchiroli (Software Heritage, Inria) #+DATE: October 26, 2017 #+EMAIL: zack@upsilon.cc # # Prelude contains all the information needed to export the main beamer latex source # #+INCLUDE: "../../common/modules/prelude.org" :minlevel 1 #+INCLUDE: "../../common/modules/169.org" #+BEAMER_HEADER: \institute[Software Heritage]{University Paris Diderot \& Inria\\\href{mailto:zack@upsilon.cc}{\tt zack@upsilon.cc}} * Software Heritage #+INCLUDE: "../../common/modules/swh-overview-sourcecode.org::#mission" :minlevel 2 ** Our principles #+latex: \begin{center} #+ATTR_LATEX: :width .7\linewidth file:SWH-as-foundation-slim.png #+latex: \end{center} #+BEAMER: \pause *** Open approach :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.3 :BEAMER_env: block :END: open source, transparency *** Unix philosophy :B_block:BMCOL:noexport:noexport: :PROPERTIES: :BEAMER_opt: :BEAMER_env: block :BEAMER_col: 0.3 :END: - do /one/ thing - do it /well/ *** In for the long haul :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.3 :BEAMER_env: block :END: non profit, replication #+INCLUDE: "../../common/modules/status-extended.org::#archive" :minlevel 2 * The quest for a PID ** Our challenge in the PID arena *** Long term Identifiers must be there for the long term *** No middle man Identifiers must be 
meaningful even if resolvers go away *** Integrity, not just naming Identifier must ensure that the retrieved object is the intended one *** Uniqueness by design only one name for each object, each object has only one name ** Exploring the PID landscape *** A lot of options out there... URL, URI, PURL, URN, ARK, DOI, ... *** ... some are widely used - articles - data - even software artefacts! #+BEAMER: \pause *** We can get no satisfaction \hfill of all the key criteria #+BEAMER: \pause *** \hfill we adopted something radically different \hfill ** Intrinsic identifiers in Software Heritage # R. C. Merkle, A digital signature based on a conventional encryption # function, Crypto '87 #+BEAMER: \vspace{-3mm} ***** Merkle tree (R. C. Merkle, Crypto 1979) :B_picblock: :PROPERTIES: :BEAMER_opt: pic=merkle, leftpic=true, width=.5\linewidth :BEAMER_env: picblock :BEAMER_act: :END: Combination of - tree - hash function ***** Classical cryptographic construction fast, parallel signature of large data structures, built-in deduplication #+BEAMER: \pause - satisfies all three criteria - widely used in industry (e.g., Git, nix, blockchains, IPFS, ...) ** Back to basics: DIOs vs. IDOs *** DIO (digital identifier of an object) - digital identifiers for traditional (non digital) objects - epistemic complications (manifestations, versions, locations, etc.) - significant governance issues, ... 
#+BEAMER: \pause *** IDO (identifier of a digital object) - (digital) identifier for digital objects - much simpler to build/handle - can (and must) be intrinsic #+BEAMER: \pause *** Separation of concerns - yes, we \alert{need both} DIOs and IDOs - no, we \alert{must not mistake} DIOs for IDOs (and viceversa) ** Working together *** Example: links to /software source code/ in an article Leveraging the Software Heritage universal archive: - - set of files :: \small\url{swh:1:rev:06741c8c37c5a384083082b99f4c5ad94cd0cd1f}\\ - id of tree object listing all the files in a project (at a given time) - - revision :: \url{swh:1:rev:7598fb94d59178d65bd8d2892c19356290f5d4e3}\\ - id of commit object which a tree and (a pointer to) the history - #+BEAMER: \vspace{-2mm}\pause - - metadata :: this /will/ involve some form of DIO - - and we get all the complications back - #+BEAMER: \vspace{-2mm}\pause + - set of files :: \small\url{swh:1:dir:06741c8c37c5a384083082b99f4c5ad94cd0cd1f}\\ + \normalsize + id of tree object listing all the files in a project (at a given time) + #+BEAMER: \pause + - revision :: \small\url{swh:1:rev:7598fb94d59178d65bd8d2892c19356290f5d4e3}\\ + \normalsize + id of commit object which has a tree and (a pointer to) the history + #+BEAMER: \pause + - metadata :: this /will/ involve some form of DIO (and we get all the + complications back) + #+BEAMER: \pause *** Come in, we're open http://www.softwareheritage.org #+INCLUDE: "../../common/modules/biblio.org::#ipres2017" :only-contents t * A look at the internals :B_appendix: :PROPERTIES: :BEAMER_env: appendix :END: #+INCLUDE: "../../common/modules/status-extended.org::#merkledemo" :minlevel 2