diff --git a/talks-public/2017-10-26-force11/2017-10-26-force11.org b/talks-public/2017-10-26-force11/2017-10-26-force11.org
index 69a4d86..18af352 100644
--- a/talks-public/2017-10-26-force11/2017-10-26-force11.org
+++ b/talks-public/2017-10-26-force11/2017-10-26-force11.org
@@ -1,123 +1,88 @@
 #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
-#+TITLE: Identifying 4 billion source code files
-#+SUBTITLE: Intrinsic identifiers, the Software Heritage experience
+#+TITLE: Citing 65 Million Software Projects
+#+SUBTITLE: Leveraging the Software Heritage archive for software citation
 # does not allow short title, so we override it for beamer as follows :
 #+BEAMER_HEADER: \author[Stefano Zacchiroli]{Stefano Zacchiroli}
 #+AUTHOR: Stefano Zacchiroli (Software Heritage, Inria)
 #+DATE: October 26, 2017
 #+EMAIL: zack@upsilon.cc
 #
 # Prelude contains all the information needed to export the main beamer latex source
 #
 #+INCLUDE: "../../common/modules/prelude.org" :minlevel 1
 #+INCLUDE: "../../common/modules/169.org"
 #+BEAMER_HEADER: \institute[Software Heritage]{University Paris Diderot \& Inria\\\href{mailto:zack@upsilon.cc}{\tt zack@upsilon.cc}}
 * Software Heritage
-  #+INCLUDE: "../../common/modules/swh-overview-sourcecode.org::#mission" :minlevel 2
-** Our principles
-#+latex: \begin{center}
-#+ATTR_LATEX: :width .7\linewidth
-file:SWH-as-foundation-slim.png
-#+latex: \end{center}
-#+BEAMER: \pause
-*** Open approach :B_block:BMCOL:
-    :PROPERTIES:
-    :BEAMER_col: 0.3
-    :BEAMER_env: block
-    :END:
-    open source, transparency
-*** Unix philosophy :B_block:BMCOL:noexport:noexport:
-    :PROPERTIES:
-    :BEAMER_opt:
-    :BEAMER_env: block
-    :BEAMER_col: 0.3
-    :END:
-    - do /one/ thing
-    - do it /well/
-*** In for the long haul :B_block:BMCOL:
-    :PROPERTIES:
-    :BEAMER_col: 0.3
-    :BEAMER_env: block
-    :END:
-    non profit, replication
+  #+INCLUDE: "../../common/modules/swh-overview-sourcecode.org::#mission" :minlevel 2
+  #+INCLUDE: "../../common/modules/principles-short.org::#principles" :minlevel 2
+  #+INCLUDE: "../../common/modules/status-extended.org::#dataflow" :minlevel 2
   #+INCLUDE: "../../common/modules/status-extended.org::#archive" :minlevel 2
-* The quest for a PID
-** Our challenge in the PID arena
+* Citing archived software
+** Our requirements in the PID arena
 *** Long term
     Identifiers must be there for the long term
 *** No middle man
     Identifiers must be meaningful even if resolvers go away
 *** Integrity, not just naming
     Identifier must ensure that the retrieved object is the intended one
 *** Uniqueness by design
     only one name for each object, each object has only one name
-** Exploring the PID landscape
+** Exploring the PID landscape :noexport:
 *** A lot of options out there...
     URL, URI, PURL, URN, ARK, DOI, ...
 *** ... some are widely used
     - articles
     - data
     - even software artefacts!
 #+BEAMER: \pause
 *** We can get no satisfaction
     \hfill of all the key criteria
 #+BEAMER: \pause
 *** \hfill we adopted something radically different \hfill
 ** Intrinsic identifiers in Software Heritage
-  # R. C. Merkle, A digital signature based on a conventional encryption
-  # function, Crypto '87
-  #+BEAMER: \vspace{-3mm}
-***** Merkle tree (R. C. Merkle, Crypto 1979) :B_picblock:
-      :PROPERTIES:
-      :BEAMER_opt: pic=merkle, leftpic=true, width=.5\linewidth
-      :BEAMER_env: picblock
-      :BEAMER_act:
-      :END:
-      Combination of
-      tree
-      hash function
-***** Classical cryptographic construction
-      fast, parallel signature of large data structures, built-in deduplication
-#+BEAMER: \pause
-      satisfies all three criteria
-      widely used in industry (e.g., Git, nix, blockchains, IPFS, ...)
+  #+INCLUDE: "../../common/modules/status-extended.org::#merkletree" :only-contents t
 ** Back to basics: DIOs vs. IDOs
 *** DIO (digital identifier of an object)
     - digital identifiers for traditional (non digital) objects
     - epistemic complications (manifestations, versions, locations, etc.)
     - significant governance issues, ...
 #+BEAMER: \pause
 *** IDO (identifier of a digital object)
     - (digital) identifier for digital objects
     - much simpler to build/handle
     - can (and must) be intrinsic
 #+BEAMER: \pause
 *** Separation of concerns
-    - yes, we \alert{need both} DIOs and IDOs
-    - no, we \alert{must not mistake} DIOs for IDOs (and viceversa)
-** Working together
-*** Example: links to /software source code/ in an article
-    Leveraging the Software Heritage universal archive:
-    - set of files :: \small\url{swh:1:dir:06741c8c37c5a384083082b99f4c5ad94cd0cd1f}\\
-      \normalsize
-      id of tree object listing all the files in a project (at a given time)
+    - we need /both/ DIOs and IDOs
+    - we should not mistake DIOs for IDOs (and viceversa)
+** Referencing archived software
+*** Use case #1 --- reference archived source code /artifacts/
+    #+BEAMER: \pause
+    - directory :: \small\url{swh:1:dir:06741c8c37c5a384083082b99f4c5ad94cd0cd1f}\\
+      \normalsize
+      id of tree object listing all the files in a project (at a given time)
     #+BEAMER: \pause
     - revision :: \small\url{swh:1:rev:7598fb94d59178d65bd8d2892c19356290f5d4e3}\\
       \normalsize
       id of commit object which a tree and (a pointer to) the history
     #+BEAMER: \pause
+*** Use case #2 --- reference an abstract software /product/ / /"project"/
     - metadata :: this /will/ involve some form of DIO (and we get all the complications back)
-    #+BEAMER: \pause
+** Working together
+*** A first prototype?
+    Daniel S. Katz, /Software Heritage and repository metadata: a software
+    citation solution/,
+    #+latex: {\footnotesize \url{https://danielskatzblog.wordpress.com/2017/09/25/software-heritage-and-repository-metadata-a-software-citation-solution/}}
 *** Come in, we're open
-    http://www.softwareheritage.org
+    #+beamer: {\hfill \Large \url{www.softwareheritage.org} \hfill}
 #+INCLUDE: "../../common/modules/biblio.org::#ipres2017" :only-contents t
 * A look at the internals :B_appendix:
   :PROPERTIES:
   :BEAMER_env: appendix
   :END:
 #+INCLUDE: "../../common/modules/status-extended.org::#merkledemo" :minlevel 2