diff --git a/talks-public/2019-02-05-MaxPlankDL/2019-02-05-MaxPlankDL.org b/talks-public/2019-02-05-MaxPlankDL/2019-02-05-MaxPlankDL.org
new file mode 100644
index 0000000..076081a
--- /dev/null
+++ b/talks-public/2019-02-05-MaxPlankDL/2019-02-05-MaxPlankDL.org
@@ -0,0 +1,533 @@
+#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
+#+TITLE: Software Heritage
+#+SUBTITLE: A revolutionary infrastructure for Open Science
+#+BEAMER_HEADER: \title[Software Heritage for Open Science]{Software Heritage}
+#+AUTHOR: Roberto Di Cosmo
+#+EMAIL: roberto@dicosmo.org
+#+DATE: February 5th, 2019
+#+BEAMER_HEADER: \date[February 5th, 2019]{February 5th, 2019\\ Open Science Days @ Max Planck Digital Library}
+#+KEYWORDS: software heritage legacy preservation knowledge mankind technology
+#
+# prelude.org contains all the information needed to export the main beamer latex source
+# use prelude-toc.org to get the table of contents
+#
+#+LATEX_HEADER: \usepackage{tcolorbox}
+
+#+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1
+
+#+INCLUDE: "../../common/modules/169.org"
+
+* Introductions
+# #+INCLUDE: "../../common/modules/rdc-bio.org::#main" :only-contents t :minlevel 2
+** Short Bio: Roberto Di Cosmo
+ # +BEAMER: \raisebox{-.5\height}{\includegraphics[width=.28\linewidth]{rdc}}
+ Computer Science professor in Paris, now working at INRIA\\
+ - /30 years/ of research (Theor. CS, Programming, Software Engineering, Erdos #: 3)\\
+ - /20 years/ of Free and Open Source Software\\
+ - /10 years/ building and directing structures for the common good\\
+
+ \mbox{}\\
+ \begin{minipage}[c]{0.18\linewidth}
+ \includegraphics[width=1.0\linewidth]{rdc}
+ \end{minipage}
+ \begin{minipage}[c]{0.8\linewidth}
+ \begin{description}
+% \item[1998] \emph{Cybersnare} -- voice of French FOSS
+ \item[1999] \emph{DemoLinux} -- first live GNU/Linux distro
+% \item[2004] \emph{EDOS} -- check package dependencies
+ \item[2007] \emph{Free Software Thematic Group}\\
+ %\tiny{\url{http://www.systematic-paris-region.org/fr/logiciel-libre}}\\
+ ~150 members ~40 projects ~200Me
+% \item[2008] \emph{Mancoosi project} \url{www.mancoosi.org}
+% \item[2010] \emph{IRILL} \url{www.irill.org}
+ \item[2015] \emph{Software Heritage} at INRIA
+ \item[2018] \emph{National Committee for Open Science}, France
+ \end{description}
+ \end{minipage}
+
+* Software source code: a pillar of Open Science
+** The knowledge is in the source code!
+#+INCLUDE: "../../common/modules/source-code-different-short.org::#thesourcecode" :only-contents t :minlevel 3
+** Source code is /special/
+#+INCLUDE: "../../common/modules/source-code-different-short.org::#softwareisdifferent" :only-contents t :minlevel 3
+** ~ 50 years, a lightning fast growth
+#+INCLUDE: "../../common/modules/50years-source-code.org::#apollolinux" :only-contents t :minlevel 3
+** The scientific method...
+ #+INCLUDE: "../../common/modules/scientific-method.org::#short" :only-contents t :minlevel 3
+** ... evolves in the digital age!
+#+INCLUDE: "../../common/modules/reprod-digital-age.org::#reprod" :only-contents t :minlevel 3
+** Software Source code is an important pillar
+*** The Magic Triangle of Scientific Knowledge
+#+latex: \begin{center}
+#+ATTR_LATEX: :width \extblockscale{.7\linewidth}
+file:PreservationTriangle.png
+#+latex: \end{center}
+#+BEAMER: \pause
+*** Nota bene
+ \hfill The links in the picture are *essential*
+* An inconvenient truth
+** A /forgotten/ pillar of Open Science
+*** No reference catalog
+ :PROPERTIES:
+ :BEAMER_env: block
+ :BEAMER_COL: .3
+ :END:
+#+BEGIN_EXPORT latex
+\begin{center}
+\includegraphics[width=.6\linewidth]{myriadsources}
+\end{center}
+#+END_EXPORT
+ to find and reference *all* the source code
+#+BEAMER: \pause
+*** No universal archive
+ :PROPERTIES:
+ :BEAMER_env: block
+ :BEAMER_COL: .3
+ :END:
+#+BEGIN_EXPORT latex
+\begin{center}
+\includegraphics[width=.6\linewidth]{fragilecloud}
+\end{center}
+#+END_EXPORT
+ to preserve *all* the source code
+#+BEAMER: \pause
+*** No research infrastructure :B_block:
+ :PROPERTIES:
+ :BEAMER_COL: .3
+ :BEAMER_env: block
+ :END:
+#+BEGIN_EXPORT latex
+\begin{center}
+\includegraphics[width=.7\linewidth]{atacama-telescope}
+\end{center}
+#+END_EXPORT
+ to enable analysis of *all* the source code
+*** :B_ignoreheading:
+ :PROPERTIES:
+ :BEAMER_env: ignoreheading
+ :END:
+#+BEAMER: \pause
+*** Lack of recognition
+ :PROPERTIES:
+ :BEAMER_env: block
+ :BEAMER_COL: .5
+ :END:
+ not (yet) a first class citizen
+ - in the EOSC plan
+ - in the EU copyright reform
+ - in the scholarly works
+#+BEAMER: \pause
+*** Lack of established guidance
+ :PROPERTIES:
+ :BEAMER_env: block
+ :BEAMER_COL: .5
+ :END:
+ - choose a license
+ - cite a software project
+ - make source code available
+ - relate to industry best practices
+# #+BEAMER: \pause
+# *** :B_ignoreheading:
+# :PROPERTIES:
+# :BEAMER_env: ignoreheading
+# :END:
+# *** Lack of basic prerequisites to reproducibility
+# See a discussion in \url{annex.softwareheritage.org/talks/2018/2018-09-17-STScI_public.pdf}
+** No catalog, no archive, no references: we are at a turning point
+#+INCLUDE: "../../common/modules/turningpoint.org::#turningpoint" :only-contents t :minlevel 5
+* Software Heritage: a revolutionary infrastructure
+** Software Heritage, in a nutshell
+#+BEAMER: \transdissolve
+# #+INCLUDE: "../../common/modules/swh-goals-oneslide-vertical.org::#goals" :only-contents t :minlevel 3
+#+latex: \begin{center}
+#+ATTR_LATEX: :width \extblockscale{.8\linewidth}
+file:SWH-logo+motto.pdf
+#+latex: \end{center}
+*** /Collect, preserve and share/ the /source code/ of /all the software/
+ \hfill Preserving our heritage, enabling better software and better science for all
+#+BEAMER: \pause
+#
+*** :B_ignoreheading:
+ :PROPERTIES:
+ :BEAMER_env: ignoreheading
+ :END:
+ #+latex: \begin{center}
+ #+ATTR_LATEX: :width 0.8\linewidth
+ file:SWH-as-foundation-slim.png
+ #+latex: \end{center}
+ #+BEAMER: \pause
+*** Technology
+ :PROPERTIES:
+ :BEAMER_col: 0.34
+ :BEAMER_env: block
+ :END:
+ - transparency and *FOSS*
+ - *replicas* all the way down
+*** Content
+ :PROPERTIES:
+ :BEAMER_col: 0.32
+ :BEAMER_env: block
+ :END:
+ - *intrinsic identifiers*
+ - facts and *provenance*
+*** Organization
+ :PROPERTIES:
+ :BEAMER_col: 0.33
+ :BEAMER_env: block
+ :END:
+ - *non-profit*
+ - multi-stakeholder
+# * Status
+** Coverage
+#+INCLUDE: "../../common/modules/status-extended.org::#archive" :only-contents t :minlevel 3
+* Under the hood: architecture and data structure
+** Automation, and storage
+ #+BEAMER: \begin{center}
+ #+BEAMER: \mode{\only<1>{\includegraphics[width=\extblockscale{1.1\textwidth}]{swh-dataflow-merkle-listers.pdf}}}
+ #+BEAMER: \only<2-3>{\includegraphics[width=\extblockscale{1.1\textwidth}]{swh-dataflow-merkle.pdf}}
+ #+BEAMER: \end{center}
+ #+BEAMER: \pause
+ #+BEAMER: \pause
+Full development history *permanently archived* in a *uniform data model*.
+** Much more than an archive!
+ # R. C. Merkle, A digital signature based on a conventional encryption
+ # function, Crypto '87
+ #+BEAMER: \vspace{-3mm}
+***** Merkle tree (R. C. Merkle, Crypto 1979) :B_picblock:
+ :PROPERTIES:
+ :BEAMER_opt: pic=merkle, leftpic=true, width=.7\linewidth
+ :BEAMER_env: picblock
+ :BEAMER_act:
+ :END:
+ Combination of
+ - tree
+ - hash function
+***** Classical cryptographic construction
+ - fast, parallel signature of large data structures
+ - widely used (e.g., Git, blockchains, IPFS, ...)
+ - *built-in deduplication*
+*****
+** The archive in pictures
+ #+LATEX: \centering\forcebeamerstart
+ #+LATEX: \only<1>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_1.pdf}}}
+ #+LATEX: \only<2>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/contents.pdf}}}
+ #+LATEX: \only<3>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_2_contents.pdf}}}
+ #+LATEX: \only<4>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/directories.pdf}}}
+ #+LATEX: \only<5>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_3_directories.pdf}}}
+ #+LATEX: \only<6>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/revisions.pdf}}}
+ #+LATEX: \only<7>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_4_revisions.pdf}}}
+ #+LATEX: \only<8>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/releases.pdf}}}
+ #+LATEX: \only<9>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_5_releases.pdf}}}
+ #+LATEX: \only<10>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/snapshots.pdf}}}
+ #+LATEX: \forcebeamerend
+* Under the hood: identifying billions of objects
+** Our challenges in the PID landscape
+ :PROPERTIES:
+ :CUSTOM_ID: challenges
+ :END:
+*** Typical properties of systems of identifiers
+ \hfill uniqueness, non-ambiguity, persistence, abstraction (opacity)
+#+BEAMER: \pause
+*** Key needed properties from our use cases
+ - gratis :: identifiers are free (billions of objects)
+ - integrity :: the associated object cannot be changed (sw dev, /reproducibility/)
+ - no middle man :: no central authority is needed (sw dev, /reproducibility/)
+#+BEAMER: \pause
+***
+ \hfill we could not find systems with both *integrity* and *no middle man*!
+** An important distinction: DIOs vs. IDOs
+ :PROPERTIES:
+ :CUSTOM_ID: diovsido
+ :END:
+#+BEGIN_EXPORT latex
+ \begin{quote}
+ The term “Digital Object Identifier” is construed as “digital identifier of an object,” rather than “identifier of a digital object” \hfill Norman Paskin, 2010
+ \end{quote}
+#+END_EXPORT
+#+BEAMER: \pause
+*** DIO (Digital Identifier of an Object) \hfill identifiers for (potentially) non-digital objects
+ - epistemic complexity (manifestations, versions, locations, etc.)
+ - need an authority to ensure persistence and uniqueness
+#+BEAMER: \pause
+*** IDO (Identifier of a Digital Object) \hfill identifiers (only) for digital objects
+ - can provide both *integrity* and *no middle man*
+ - broadly used in modern software development (git, etc.)
+#+BEAMER: \pause
+*** IDOs and DIOs address different needs
+ - for the core of Software Heritage, *IDOs are enough*
+ - we *must not* use DIOs for reproducibility
+** The Software Heritage IDO schema \hfill (see *\url{http://bit.ly/swhpids}*)
+#+BEGIN_EXPORT latex
+\small
+\begin{tcolorbox}
+\href{https://archive.softwareheritage.org/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2}
+{swh:1:{\bf cnt}:94a9ed024d3859793618152ea559a168bbcbb5e2} \hfill full text of the GPL3 license
+\end{tcolorbox}
+\pause
+\begin{tcolorbox}
+\href{https://archive.softwareheritage.org/swh:1:dir:d198bc9d7a6bcf6db04f476d29314f157507d505}
+{swh:1:{\bf dir}:d198bc9d7a6bcf6db04f476d29314f157507d505} \hfill Darktable source code
+\end{tcolorbox}
+\pause
+\begin{tcolorbox}
+\href{https://archive.softwareheritage.org/swh:1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d}
+{swh:1:{\bf rev}:309cf2674ee7a0749978cf8265ab91a60aea0f7d}
+\end{tcolorbox}
+\hfill a {\bf revision} in the development history of Darktable\\\pause
+\begin{tcolorbox}
+\href{https://archive.softwareheritage.org/swh:1:rel:22ece559cc7cc2364edc5e5593d63ae8bd229f9f}
+{swh:1:{\bf rel}:22ece559cc7cc2364edc5e5593d63ae8bd229f9f}
+\end{tcolorbox}
+\hfill {\bf release} 2.3.0 of Darktable, dated 24 December 2016\\\pause
+\begin{tcolorbox}
+\href{https://archive.softwareheritage.org/swh:1:snp:c7c108084bc0bf3d81436bf980b46e98bd338453}
+{swh:1:{\bf snp}:c7c108084bc0bf3d81436bf980b46e98bd338453}
+\end{tcolorbox}
+\hfill a {\bf snapshot} of the entire Darktable repository (4 May 2017, GitHub)
+#+END_EXPORT
+#+LATEX: \pause
+***
+ *Current resolvers:* \url{archive.softwareheritage.org} and \url{n2t.org}
+* Demo time
+** Demo time: let's highlight some features...
+*** A "wayback machine" for software source code
+ - *\url{http://archive.softwareheritage.org/browse}*
+*** Identification and sharing of billions of software artifacts
+ - *\url{http://bit.ly/swhpids}* for persistent identifiers
+*** Depositing research software
+# https://www.softwareheritage.org/2018/09/28/depositing-scientific-software-into-software-heritage/
+ - *\url{http://bit.ly/swdepositblog}*
+* A revolutionary infrastructure for Open Science
+** Software Heritage for Open Science
+*** Universal archive of software source code
+ - *all* software:
+ + research software, and...
+ + the bricks (re)used in research software
+*** Uniform, intrinsic identifiers
+ - identifying *all* software source code artifacts
+ - no *middle man* needed, *integrity* included!
+*** Wayback machine for software development
+ - tracks *all origins* that are archived
+ - an /Internet Archive for software source code/
+***
+ \hfill Now part of the /French National Plan for Open Science/ \hfill\mbox{}
+** Leveraging Software Heritage
+*** Deposit research software \hfill /open since 9/2018/ :B_picblock:
+ :PROPERTIES:
+ :BEAMER_env: picblock
+ :BEAMER_OPT: pic=deposit-communication.png,width=.61\linewidth,leftpic=true
+ :END:
+#+LATEX: \pause
+ *Generic mechanism (SWORD based):*\\
+ - *review process*, versioning
+# - /industry chimes in/ (details on demand)
+#+BEAMER: \pause
+ - *(today)*: deposit .zip or .tar.gz file ([[http://bit.ly/swhdeposithalen][/guide/]])
+ - *(tomorrow)*: provide /SWH id/ and (extract) metadata
+ \hfill [[https://www.softwareheritage.org/2018/09/28/depositing-scientific-software-into-software-heritage/][*click here to learn more...*]]
+#+BEAMER: \pause
+*** Reference archive: origins :B_block:
+ :PROPERTIES:
+ :BEAMER_env: block
+ :BEAMER_COL: .5
+ :END:
+ *swMATH.org* links into Software Heritage
+ - e.g. [[http://swmath.org/software/7116][/the SemiPar entry in swMATH.org/]]
+*** Reference archive: releases :B_block:
+ :PROPERTIES:
+ :BEAMER_env: block
+ :BEAMER_COL: .45
+ :END:
+ *Wikidata* [[https://www.wikidata.org/wiki/Property:P6138][/SWH Release ID Property/]]
+ - e.g. [[https://www.wikidata.org/wiki/Q5533567][/the release 3.1.0 of Gensim/]]
+*** :B_ignoreheading:
+ :PROPERTIES:
+ :BEAMER_env: ignoreheading
+ :END:
+* Building for the long term
+#+BEAMER: \transdissolve
+** Raising Awareness
+*** April 3rd 2017, Unesco Inria agreement :B_block:
+ :PROPERTIES:
+ :BEAMER_env: block
+ :BEAMER_COL: .5
+ :END:
+ #+BEGIN_EXPORT latex
+ \includegraphics[width=\extblockscale{.85\linewidth}]{inria-logo-new}\hfill
+ \includegraphics[width=\extblockscale{.5\linewidth}]{unesco}\\[2.8em]
+ \includegraphics[width=\extblockscale{1.4\linewidth}]{unesco-accord}\\
+ #+END_EXPORT
+*** November 2018, Unesco Inria expert call :B_block:
+ :PROPERTIES:
+ :BEAMER_env: block
+ :BEAMER_COL: .5
+ :END:
+ #+BEGIN_EXPORT latex
+ \hfill \includegraphics[width=\extblockscale{1.4\linewidth}]{UnescoParisCall} \hfill
+ #+END_EXPORT
+** Growing Support
+ :PROPERTIES:
+ :CUSTOM_ID: support
+ :END:
+*** Sharing the vision :B_block:
+ :PROPERTIES:
+ :CUSTOM_ID: endorsement
+ :BEAMER_COL: .5
+ :BEAMER_env: block
+ :END:
+ #+LATEX: \begin{center}\vskip 1em \includegraphics[width=\extblockscale{1.4\linewidth}]{support.pdf}\end{center}
+*** See more :noexport:
+ \hfill\tiny\url{http://www.softwareheritage.org/support/testimonials}
+*** Donors, members, sponsors :B_block:
+ :PROPERTIES:
+ :CUSTOM_ID: sponsors
+ :BEAMER_COL: .5
+ :BEAMER_env: block
+ :END:
+ #+LATEX: \begin{center}\includegraphics[width=\extblockscale{.4\linewidth}]{inria-logo-new}\end{center}
+ #+LATEX: \begin{center}
+ #+LATEX: \includegraphics[width=\extblockscale{.2\linewidth}]{sponsors-levels.pdf}
+ #+LATEX: \includegraphics[width=\extblockscale{1.1\linewidth}]{sponsors.pdf}
+ #+LATEX: \end{center}
+# - sponsoring / partnership :: \hfill \url{sponsorship.softwareheritage.org}
+*** :B_ignoreheading:
+ :PROPERTIES:
+ :BEAMER_env: ignoreheading
+ :END:
+*** Research collaboration :B_picblock:
+ :PROPERTIES:
+ :BEAMER_COL: .5
+ :BEAMER_env: picblock
+ :BEAMER_OPT: pic=Qwant_Logo, leftpic=true
+ :END:
+ source code search engine
+*** See more :noexport:
+ \hfill\tiny\url{http://www.softwareheritage.org/support/testimonials}
+*** Global network :B_picblock:
+ :PROPERTIES:
+ :BEAMER_COL: .5
+ :BEAMER_env: picblock
+ :BEAMER_OPT: pic=fossid, leftpic=true, width=.3\linewidth
+ :END:
+ - first *independent mirror*
+ - increased reliability
+** You can help!
+ #+BEAMER: \vspace{-1mm}
+*** Reproducible Open Science
+ \emph{archive} research software in SWH\\
+ \hfill \emph{reference} it using \emph{intrinsic identifiers}\hfill\mbox{}\\
+ \hfill \emph{build} on top of SWH, \emph{do not rebuild SWH}!
+ #+BEAMER: \pause
+***
+ \hfill\huge *Reduce risk, avoid fragmentation*\hfill\mbox{}
+ #+BEAMER: \pause
+*** Thomas Jefferson, February 18, 1791 :B_block:
+ :PROPERTIES:
+ :BEAMER_ACT:
+ :BEAMER_env: block
+ :BEAMER_COL: .51
+ :END:
+#+latex: {\em
+ ...let us save what remains: not by vaults and locks which fence them
+ from the public eye and use in consigning them to the waste of time,
+ but by such a multiplication of copies, as shall place them beyond
+ the reach of accident.
+#+latex: }
+ #+BEAMER: \pause
+*** A /common/ infrastructure :B_block:
+ :PROPERTIES:
+ :BEAMER_env: block
+ :BEAMER_COL: .45
+ :END:
+ - *mutualisation* for sustainability
+ - open source, *not-for-profit*
+ - mirror network *open to all*
+ - *may* prevent a useless diaspora
+* Conclusion
+** Join the revolution!
+#+BEGIN_EXPORT latex
+% \begin{center}
+% \includegraphics[width=.6\linewidth]{SWH-logo.pdf}
+% \end{center}
+ \begin{center}
+ {\large \url{www.softwareheritage.org} \hspace{4em} \url{@swheritage}}
+ \end{center}
+#+END_EXPORT
+*** Library of Alexandria of code :B_picblock:
+ :PROPERTIES:
+ :BEAMER_env: picblock
+ :BEAMER_COL: 0.42
+ :BEAMER_OPT: pic=clock-spring-forward.png,width=.45\linewidth,leftpic=true
+ :END:
+ - recover the past
+ - structure the future
+*** A CERN for Software :B_picblock:
+ :PROPERTIES:
+ :BEAMER_env: picblock
+ :BEAMER_COL: 0.5
+ :BEAMER_OPT: pic=atacama-telescope.jpg,width=.5\linewidth,leftpic=true
+ :END:
+ - build better software
+ + for industry
+ + for society as a whole
+
+*** :B_ignoreheading:
+ :PROPERTIES:
+ :BEAMER_env: ignoreheading
+ :END:
+***
+# \hfill All together, step by step, we can make a positive change! \hfill
+# #+INCLUDE: "../../common/modules/biblio.org::#main" :only-contents t
+ #+BEGIN_EXPORT latex
+ \begin{thebibliography}{Foo Bar, 1969}
+ \footnotesize
+
+ \bibitem{Abramatic2018} Jean-François Abramatic, Roberto Di Cosmo, Stefano Zacchiroli\newblock
+ \href{https://cacm.acm.org/magazines/2018/10/231366-building-the-universal-archive-of-source-code/fulltext}{Building the Universal Archive of Source Code}\newblock
+ Communications of the ACM, October 2018
+
+ \bibitem{DiCosmo2018} Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli\newblock
+ \href{https://hal.archives-ouvertes.fr/hal-01865790}{Identifiers for Digital Objects: the Case of Software Source Code Preservation}\newblock
+ iPRES 2018: Intl. Conf. on Digital Preservation
+
+ \end{thebibliography}
+ #+END_EXPORT
+
+* Appendix :B_appendix:
+ :PROPERTIES:
+ :BEAMER_env: appendix
+ :END:
+* Strategy
+** All the source code
+ #+BEAMER: \begin{center}\includegraphics[width=\extblockscale{\linewidth}]{swh-collect-axes}\end{center}
+** All the source code: strategy
+ #+BEAMER: \begin{center}\includegraphics[width=\extblockscale{\linewidth}]{swh-collect-strategies}\end{center}
+* Data structure
+** A bird's eye view
+ #+BEAMER: \begin{center}
+ #+BEAMER: \includegraphics[width=\extblockscale{1.3\textwidth}]{swh-merkle-dag-wide.pdf}
+ #+BEAMER: \end{center}
+* Identifiers
+** Limitations of DIOs
+ #+INCLUDE: "../../common/modules/doi-analysis.org::#doiexplained" :minlevel 3 :only-contents t
+* Changing the status quo
+** Revisiting the "Software Citation Principles"
+*** Recommendation for identifiers
+ \hfill Use DIOs (/specifically, DOIs/) instead of IDOs (/specifically, git commit hashes/)
+#+BEAMER: \pause
+*** Original reasons to "avoid commit hashes"
+ 1. Version numbers/commit references /are not guaranteed to be permanent./
+ 2. A repository address and version number /does not guarantee that the software is available/ [...]
+ 3. A particular version number/commit reference /may not represent a “preferred” point at which to cite the software/ [...]
+#+BEAMER: \pause
+*** Software Heritage changes all this
+ 1. SWH-IDs are permanent (and /do not depend on any particular VCS/)
+ 2. Software Heritage is /an archive, and guarantees availability/
+ 3. At Software Heritage, we separate citation from reference
+* Bad SOTA
+** An example from my research field, Computer Science
+#+INCLUDE: "../../common/modules/reprod-bad-sota.org::#collbergmethod" :only-contents t :minlevel 3
+** ... cont'd
+#+INCLUDE: "../../common/modules/reprod-bad-sota.org::#collbergfindings" :only-contents t :minlevel 3
+*** The main reasons
+ \hfill source code (/or the right version of it/) cannot be found
diff --git a/talks-public/2019-02-05-MaxPlankDL/Makefile b/talks-public/2019-02-05-MaxPlankDL/Makefile
new file mode 100644
index 0000000..68fbee7
--- /dev/null
+++ b/talks-public/2019-02-05-MaxPlankDL/Makefile
@@ -0,0 +1 @@
+include ../Makefile.slides