diff --git a/talks-public/2017-10-26-force11/2017-10-26-force11.org b/talks-public/2017-10-26-force11/2017-10-26-force11.org
new file mode 100644
index 0000000..b6b5b32
--- /dev/null
+++ b/talks-public/2017-10-26-force11/2017-10-26-force11.org
@@ -0,0 +1,120 @@
+#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
+#+TITLE: Identifying 4 billion source code files
+#+SUBTITLE: Intrinsic identifiers, the Software Heritage experience
+# does not allow short title, so we override it for beamer as follows :
+#+BEAMER_HEADER: \author[Stefano Zacchiroli]{Stefano Zacchiroli}
+#+AUTHOR: Stefano Zacchiroli (Software Heritage, Inria)
+#+DATE: October 26, 2017
+#+EMAIL: zack@upsilon.cc
+
+#
+# Prelude contains all the information needed to export the main beamer latex source
+#
+
+#+INCLUDE: "../../common/modules/prelude.org" :minlevel 1
+#+INCLUDE: "../../common/modules/169.org"
+#+BEAMER_HEADER: \institute[Software Heritage]{University Paris Diderot \& Inria\\\href{mailto:zack@upsilon.cc}{\tt zack@upsilon.cc}}
+
+* Software Heritage
+ #+INCLUDE: "../../common/modules/swh-overview-sourcecode.org::#mission" :minlevel 2
+** Our principles
+#+latex: \begin{center}
+#+ATTR_LATEX: :width .7\linewidth
+file:SWH-as-foundation-slim.png
+#+latex: \end{center}
+#+BEAMER: \pause
+*** Open approach :B_block:BMCOL:
+ :PROPERTIES:
+ :BEAMER_col: 0.3
+ :BEAMER_env: block
+ :END:
+ open source, transparency
+*** Unix philosophy :B_block:BMCOL:noexport:noexport:
+ :PROPERTIES:
+ :BEAMER_opt:
+ :BEAMER_env: block
+ :BEAMER_col: 0.3
+ :END:
+ - do /one/ thing
+ - do it /well/
+*** In for the long haul :B_block:BMCOL:
+ :PROPERTIES:
+ :BEAMER_col: 0.3
+ :BEAMER_env: block
+ :END:
+ non profit, replication
+ #+INCLUDE: "../../common/modules/status-extended.org::#archive" :minlevel 2
+* The quest for a PID
+** Our challenge in the PID arena
+*** Long term
+ Identifiers must be there for the long term
+*** No middle man
+ Identifiers must be meaningful even if resolvers go away
+*** Integrity, not just naming
+ Identifiers must ensure that the retrieved object is the intended one
+*** Uniqueness by design
+ only one name for each object, each object has only one name
+** Exploring the PID landscape
+*** A lot of options out there...
+ URL, URI, PURL, URN, ARK, DOI, ...
+*** ... some are widely used
+ - articles
+ - data
+ - even software artefacts!
+#+BEAMER: \pause
+*** We can get no satisfaction
+ \hfill of all the key criteria
+#+BEAMER: \pause
+***
+ \hfill we adopted something radically different \hfill
+** Intrinsic identifiers in Software Heritage
+ # R. C. Merkle, A digital signature based on a conventional encryption
+ # function, Crypto '87
+ #+BEAMER: \vspace{-3mm}
+***** Merkle tree (R. C. Merkle, Crypto 1979) :B_picblock:
+ :PROPERTIES:
+ :BEAMER_opt: pic=merkle, leftpic=true, width=.5\linewidth
+ :BEAMER_env: picblock
+ :BEAMER_act:
+ :END:
+ Combination of
+ - tree
+ - hash function
+***** Classical cryptographic construction
+ fast, parallel signature of large data structures, built-in deduplication
+#+BEAMER: \pause
+ - satisfies all three criteria
+ - widely used in industry (e.g., Git, nix, blockchains, IPFS, ...)
+** Back to basics: DIOs vs. IDOs
+*** DIO (digital identifier of an object)
+ - digital identifiers for traditional (non digital) objects
+ - epistemic complications (manifestations, versions, locations, etc.)
+ - significant governance issues, ...
+#+BEAMER: \pause
+*** IDO (identifier of a digital object)
+ - (digital) identifier for digital objects
+ - much simpler to build/handle
+ - can (and must) be intrinsic
+#+BEAMER: \pause
+*** Separation of concerns
+ - yes, we \alert{need both} DIOs and IDOs
+ - no, we \alert{must not mistake} DIOs for IDOs (and vice versa)
+** Working together
+*** Example: links to /software source code/ in an article
+ Leveraging the Software Heritage universal archive:
+ - set of files :: \small\url{swh:1:rev:06741c8c37c5a384083082b99f4c5ad94cd0cd1f}\\
+ id of tree object listing all the files in a project (at a given time)
+ - revision :: \url{swh:1:rev:7598fb94d59178d65bd8d2892c19356290f5d4e3}\\
+ id of commit object which references a tree and (a pointer to) the history
+ #+BEAMER: \vspace{-2mm}\pause
+ - metadata :: this /will/ involve some form of DIO
+ - and we get all the complications back
+ #+BEAMER: \vspace{-2mm}\pause
+*** Come in, we're open
+ http://www.softwareheritage.org
+ #+INCLUDE: "../../common/modules/biblio.org::#ipres2017" :only-contents t
+* A look at the internals :B_appendix:
+ :PROPERTIES:
+ :BEAMER_env: appendix
+ :END:
+ #+INCLUDE: "../../common/modules/status-extended.org::#merkledemo" :minlevel 2
diff --git a/talks-public/2017-10-26-force11/Makefile b/talks-public/2017-10-26-force11/Makefile
new file mode 100644
index 0000000..68fbee7
--- /dev/null
+++ b/talks-public/2017-10-26-force11/Makefile
@@ -0,0 +1 @@
+include ../Makefile.slides