diff --git a/talks-public/2019-02-05-MaxPlankDL/2019-02-05-MaxPlankDL.org b/talks-public/2019-02-05-MaxPlankDL/2019-02-05-MaxPlankDL.org index 076081a..b16efb2 100644 --- a/talks-public/2019-02-05-MaxPlankDL/2019-02-05-MaxPlankDL.org +++ b/talks-public/2019-02-05-MaxPlankDL/2019-02-05-MaxPlankDL.org @@ -1,533 +1,564 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) #+TITLE: Software Heritage #+SUBTITLE: A revolutionary infrastructure for Open Science #+BEAMER_HEADER: \title[Software Heritage for Open Science]{Software Heritage} #+AUTHOR: Roberto Di Cosmo #+EMAIL: roberto@dicosmo.org #+DATE: February 5th, 2019 #+BEAMER_HEADER: \date[February 5th, 2019]{February 5th, 2019\\ Open Science Days @ Max Planck Digital Library} #+KEYWORDS: software heritage legacy preservation knowledge mankind technology # # prelude.org contains all the information needed to export the main beamer latex source # use prelude-toc.org to get the table of contents # #+LATEX_HEADER: \usepackage{tcolorbox} #+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 #+INCLUDE: "../../common/modules/169.org" * Introductions # #+INCLUDE: "../../common/modules/rdc-bio.org::#main" :only-contents t :minlevel 2 ** Short Bio: Roberto Di Cosmo # +BEAMER: \raisebox{-.5\height}{\includegraphics[width=.28\linewidth]{rdc}} Computer Science professor in Paris, now working at INRIA\\ - /30 years/ of research (Theor. 
CS, Programming, Software Engineering, Erdos #: 3)\\ - /20 years/ of Free and Open Source Software\\ - /10 years/ building and directing structures for the common good\\ \mbox{}\\ \begin{minipage}[c]{0.18\linewidth} \includegraphics[width=1.0\linewidth]{rdc} \end{minipage} \begin{minipage}[c]{0.8\linewidth} \begin{description} % \item[1998] \emph{Cybersnare} -- voice of French FOSS \item[1999] \emph{DemoLinux} -- first live GNU/Linux distro % \item[2004] \emph{EDOS} -- check package dependencies \item[2007] \emph{Free Software Thematic Group}\\ %\tiny{\url{http://www.systematic-paris-region.org/fr/logiciel-libre}}\\ ~150 members ~40 projects ~200Me % \item[2008] \emph{Mancoosi project} \url{www.mancoosi.org} % \item[2010] \emph{IRILL} \url{www.irill.org} \item[2015] \emph{Software Heritage} at INRIA \item[2018] \emph{National Committee for Open Science}, France \end{description} \end{minipage} * Software source code: a pillar of Open Science ** The knowledge is in the source code! #+INCLUDE: "../../common/modules/source-code-different-short.org::#thesourcecode" :only-contents t :minlevel 3 ** Source code is /special/ #+INCLUDE: "../../common/modules/source-code-different-short.org::#softwareisdifferent" :only-contents t :minlevel 3 ** ~ 50 years, a lightning fast growth #+INCLUDE: "../../common/modules/50years-source-code.org::#apollolinux" :only-contents t :minlevel 3 ** The scientific method... #+INCLUDE: "../../common/modules/scientific-method.org::#short" :only-contents t :minlevel 3 ** ... evolves in the digital age! 
#+INCLUDE: "../../common/modules/reprod-digital-age.org::#reprod" :only-contents t :minlevel 3 ** Software Source code is an important pillar *** The Magic Triangle of Scientific Knowledge #+latex: \begin{center} #+ATTR_LATEX: :width \extblockscale{.7\linewidth} file:PreservationTriangle.png #+latex: \end{center} #+BEAMER: \pause *** Nota bene \hfill The links in the picture are *essential* * An inconvenient truth ** A /forgotten/ pillar of Open Science *** No reference catalog :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .3 :END: #+BEGIN_EXPORT latex \begin{center} \includegraphics[width=.6\linewidth]{myriadsources} \end{center} #+END_EXPORT to find and reference *all* the source code #+BEAMER: \pause *** No universal archive :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .3 :END: #+BEGIN_EXPORT latex \begin{center} \includegraphics[width=.6\linewidth]{fragilecloud} \end{center} #+END_EXPORT to preserve *all* the source code #+BEAMER: \pause *** No research infrastructure :B_block: :PROPERTIES: :BEAMER_COL: .3 :BEAMER_env: block :END: #+BEGIN_EXPORT latex \begin{center} \includegraphics[width=.7\linewidth]{atacama-telescope} \end{center} #+END_EXPORT to enable analysis of *all* the source code *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: #+BEAMER: \pause *** Lack of recognition :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .5 :END: not (yet) a first class citizen - in the EOSC plan - in the EU copyright reform - in the scholarly works #+BEAMER: \pause *** Lack of established guidance :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .5 :END: - choose a license - cite a software project - make source code available - relate to industry best practices # #+BEAMER: \pause # *** :B_ignoreheading: # :PROPERTIES: # :BEAMER_env: ignoreheading # :END: # *** Lack of basic prerequisites to reproducibility # See a discussion in \url{annex.softwareheritage.org/talks/2018/2018-09-17-STScI_public.pdf} ** No catalog, no archive, no references: we are at a turning 
point #+INCLUDE: "../../common/modules/turningpoint.org::#turningpoint" :only-contents t :minlevel 5 -* Software Heritage: a revolutionary infrastructure +* Software Heritage ** Software Heritage, in a nutshell #+BEAMER: \transdissolve # #+INCLUDE: "../../common/modules/swh-goals-oneslide-vertical.org::#goals" :only-contents t :minlevel 3 #+latex: \begin{center} #+ATTR_LATEX: :width \extblockscale{.8\linewidth} file:SWH-logo+motto.pdf #+latex: \end{center} *** /Collect, preserve and share/ the /source code/ of /all the software/ \hfill Preserving our heritage, enabling better software and better science for all #+BEAMER: \pause # *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: #+latex: \begin{center} #+ATTR_LATEX: :width 0.8\linewidth file:SWH-as-foundation-slim.png #+latex: \end{center} #+BEAMER: \pause *** Technology :PROPERTIES: :BEAMER_col: 0.34 :BEAMER_env: block :END: - transparency and *FOSS* - *replicas* all the way down *** Content :PROPERTIES: :BEAMER_col: 0.32 :BEAMER_env: block :END: - *intrinsic identifiers* - facts and *provenance* *** Organization :PROPERTIES: :BEAMER_col: 0.33 :BEAMER_env: block :END: - *non-profit* - multi-stakeholder # * Status ** Coverage #+INCLUDE: "../../common/modules/status-extended.org::#archive" :only-contents t :minlevel 3 * Under the hood: architecture and data structure ** Automation, and storage #+BEAMER: \begin{center} #+BEAMER: \mode{\only<1>{\includegraphics[width=\extblockscale{1.1\textwidth}]{swh-dataflow-merkle-listers.pdf}}} #+BEAMER: \only<2-3>{\includegraphics[width=\extblockscale{1.1\textwidth}]{swh-dataflow-merkle.pdf}} #+BEAMER: \end{center} #+BEAMER: \pause #+BEAMER: \pause Full development history *permanently archived* in a *uniform data model*. ** Much more than an archive! # R. C. Merkle, A digital signature based on a conventional encryption # function, Crypto '87 #+BEAMER: \vspace{-3mm} ***** Merkle tree (R. C. 
Merkle, Crypto 1979) :B_picblock: :PROPERTIES: :BEAMER_opt: pic=merkle, leftpic=true, width=.7\linewidth :BEAMER_env: picblock :BEAMER_act: :END: Combination of - tree - hash function ***** Classical cryptographic construction - fast, parallel signature of large data structures - widely used (e.g., Git, blockchains, IPFS, ...) - *built-in deduplication* ***** ** The archive in pictures #+LATEX: \centering\forcebeamerstart #+LATEX: \only<1>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_1.pdf}}} #+LATEX: \only<2>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/contents.pdf}}} #+LATEX: \only<3>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_2_contents.pdf}}} #+LATEX: \only<4>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/directories.pdf}}} #+LATEX: \only<5>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_3_directories.pdf}}} #+LATEX: \only<6>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/revisions.pdf}}} #+LATEX: \only<7>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_4_revisions.pdf}}} #+LATEX: \only<8>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/releases.pdf}}} #+LATEX: \only<9>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_5_releases.pdf}}} #+LATEX: \only<10>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/snapshots.pdf}}} #+LATEX: \forcebeamerend * Under the hood: identifying billions of objects ** Our challenges in the PID landscape :PROPERTIES: :CUSTOM_ID: challenges :END: *** Typical properties of systems of identifiers \hfill uniqueness, non ambiguity, persistence, abstraction (opacity) #+BEAMER: \pause *** Key needed properties from our use cases - gratis :: identifiers are free (billions of objects) - 
integrity :: the associated object cannot be changed (sw dev, /reproducibility/) - no middle man :: no central authority is needed (sw dev, /reproducibility/) #+BEAMER: \pause *** \hfill we could not find systems with both *integrity* and *no middle man* ! ** An important distinction: DIOs vs. IDOs :PROPERTIES: :CUSTOM_ID: diovsido :END: #+BEGIN_EXPORT latex \begin{quote} The term “Digital Object Identifier” is construed as “digital identifier of an object," rather than “identifier of a digital object” \hfill Norman Paskin. 2010 \end{quote} #+END_EXPORT #+BEAMER: \pause *** DIO (Digital Identifier of an Object) \hfill identifiers for (potentially) non digital objects - epistemic complexity (manifestations, versions, locations, etc.) - need an authority to ensure persistence and uniqueness #+BEAMER: \pause *** IDO (Identifier of a Digital Object) \hfill identifiers (only) for digital objects - can provide both *integrity* and *no middle man* - broadly used in modern software development (git, etc.) 
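The two properties claimed for IDOs can be illustrated with a minimal sketch. It assumes that a =swh:1:cnt:= identifier carries the same SHA-1 that Git computes for a blob (the file bytes prefixed by a =blob <length>= header and a NUL byte); the helper names =swh_content_id= and =verify= are hypothetical and not part of any Software Heritage tool.

```python
import hashlib


def swh_content_id(data: bytes) -> str:
    """Hypothetical helper: derive a content identifier from the bytes alone.

    Assumption: the hash is the Git blob hash, i.e. SHA-1 over
    b"blob <length>\\0" followed by the raw content.
    """
    header = b"blob %d\0" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return "swh:1:cnt:" + digest


def verify(identifier: str, data: bytes) -> bool:
    """Integrity check: recompute the identifier and compare."""
    return swh_content_id(data) == identifier


if __name__ == "__main__":
    content = b"hello\n"
    ident = swh_content_id(content)
    print(ident)
    print(verify(ident, content))          # unchanged content matches
    print(verify(ident, content + b"!"))   # any modification is detected
```

Because the identifier is computed from the object itself, anyone holding the bytes can mint or check it locally (/no middle man/), and any change to the bytes yields a different identifier (/integrity/).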
#+BEAMER: \pause *** IDOs and DIOs address different needs - for the core Software Heritage *IDOs are enough* - we *must not* use DIOs for reproducibility ** The Software Heritage IDO schema \hfill (see *\url{http://bit.ly/swhpids}*) #+BEGIN_EXPORT latex \small \begin{tcolorbox} \href{https://archive.softwareheritage.org/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2} {swh:1:{\bf cnt}:94a9ed024d3859793618152ea559a168bbcbb5e2} \hfill full text of the GPL3 license \end{tcolorbox} \pause \begin{tcolorbox} \href{https://archive.softwareheritage.org/swh:1:dir:d198bc9d7a6bcf6db04f476d29314f157507d505} {swh:1:{\bf dir}:d198bc9d7a6bcf6db04f476d29314f157507d505} \hfill Darktable source code \end{tcolorbox} \pause \begin{tcolorbox} \href{https://archive.softwareheritage.org/swh:1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d} {swh:1:{\bf rev}:309cf2674ee7a0749978cf8265ab91a60aea0f7d} \end{tcolorbox} \hfill a {\bf revision} in the development history of Darktable\\\pause \begin{tcolorbox} \href{https://archive.softwareheritage.org/swh:1:rel:22ece559cc7cc2364edc5e5593d63ae8bd229f9f} {swh:1:{\bf rel}:22ece559cc7cc2364edc5e5593d63ae8bd229f9f} \end{tcolorbox} \hfill {\bf release} 2.3.0 of Darktable, dated 24 December 2016\\\pause \begin{tcolorbox} \href{https://archive.softwareheritage.org/swh:1:snp:c7c108084bc0bf3d81436bf980b46e98bd338453} {swh:1:{\bf snp}:c7c108084bc0bf3d81436bf980b46e98bd338453} \end{tcolorbox} \hfill a {\bf snapshot} of the entire Darktable repository (4 May 2017, GitHub) #+END_EXPORT #+LATEX: \pause *** *Current resolvers:* \url{archive.softwareheritage.org} and \url{n2t.org} * Demo time ** Demo time: let's highlight some features... 
*** A "wayback machine" for software source code - *\url{http://archive.softwareheritage.org/browse}* *** Identification and sharing of billions of software artifacts - *\url{http://bit.ly/swhpids}* for persistent identifiers *** Depositing research software # https://www.softwareheritage.org/2018/09/28/depositing-scientific-software-into-software-heritage/ - *\url{http://bit.ly/swdepositblog}* * A revolutionary infrastructure for Open Science ** Software Heritage for Open Science *** Universal archive of software source code - *all* software: + research software, and... + the bricks (re)used in research software *** Uniform, intrinsic identifiers - identifying *all* software source code artifacts - no *middle man* needed, *integrity* included! *** Wayback machine for software development - tracks *all origins* that are archived - an /Internet Archive for software source code/ *** \hfill Now part of the /French National Plan for Open Science/ \hfill\mbox{} ** Leveraging Software Heritage *** Deposit research software \hfill /open since 9/2018/ :B_picblock: :PROPERTIES: :BEAMER_env: picblock :BEAMER_OPT: pic=deposit-communication.png,width=.61\linewidth,leftpic=true :END: #+LATEX: \pause *Generic mechanism (SWORD based):*\\ - *review process*, versioning # - /industry chimes in/ (details on demand) #+BEAMER: \pause - *(today)*: deposit .zip or .tar.gz file ([[http://bit.ly/swhdeposithalen][/guide/]]) - *(tomorrow)*: provide /SWH id/ and (extract) metadata \hfill [[https://www.softwareheritage.org/2018/09/28/depositing-scientific-software-into-software-heritage/][*click here to learn more...*]] #+BEAMER: \pause *** Reference archive: origins :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .5 :END: *swMATH.org* links into Software Heritage - e.g. 
[[http://swmath.org/software/7116][/the SemiPar entry in swMATH.org/]] *** Reference archive: releases :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .45 :END: *Wikidata* [[https://www.wikidata.org/wiki/Property:P6138][/SWH Release ID Property/]] - e.g. [[https://www.wikidata.org/wiki/Q5533567][/the release 3.1.0 of Gensim/]] *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: * Building for the long term #+BEAMER: \transdissolve ** Raising Awareness *** April 3rd 2017, Unesco Inria agreement :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .5 :END: #+BEGIN_EXPORT latex \includegraphics[width=\extblockscale{.85\linewidth}]{inria-logo-new}\hfill \includegraphics[width=\extblockscale{.5\linewidth}]{unesco}\\[2.8em] \includegraphics[width=\extblockscale{1.4\linewidth}]{unesco-accord}\\ #+END_EXPORT *** November 2018, Unesco Inria expert call :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .5 :END: #+BEGIN_EXPORT latex \hfill \includegraphics[width=\extblockscale{1.4\linewidth}]{UnescoParisCall} \hfill #+END_EXPORT ** Growing Support :PROPERTIES: :CUSTOM_ID: support :END: *** Sharing the vision :B_block: :PROPERTIES: :CUSTOM_ID: endorsement :BEAMER_COL: .5 :BEAMER_env: block :END: #+LATEX: \begin{center}\vskip 1em \includegraphics[width=\extblockscale{1.4\linewidth}]{support.pdf}\end{center} *** See more :noexport: \hfill\tiny\url{http:://www.softwareheritage.org/support/testimonials} *** Donors, members, sponsors :B_block: :PROPERTIES: :CUSTOM_ID: sponsors :BEAMER_COL: .5 :BEAMER_env: block :END: #+LATEX: \begin{center}\includegraphics[width=\extblockscale{.4\linewidth}]{inria-logo-new}\end{center} #+LATEX: \begin{center} #+LATEX: \includegraphics[width=\extblockscale{.2\linewidth}]{sponsors-levels.pdf} #+LATEX: \includegraphics[width=\extblockscale{1.1\linewidth}]{sponsors.pdf} #+LATEX: \end{center} # - sponsoring / partnership :: \hfill \url{sponsorship.softwareheritage.org} *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: 
ignoreheading :END: *** Research collaboration :B_picblock: :PROPERTIES: :BEAMER_COL: .5 :BEAMER_env: picblock :BEAMER_OPT: pic=Qwant_Logo, leftpic=true :END: source code search engine *** See more :noexport: \hfill\tiny\url{http:://www.softwareheritage.org/support/testimonials} *** Global network :B_picblock: :PROPERTIES: :BEAMER_COL: .5 :BEAMER_env: picblock :BEAMER_OPT: pic=fossid, leftpic=true, width=.3\linewidth :END: - first *independent mirror* - increased reliability ** You can help! #+BEAMER: \vspace{-1mm} -*** Reproducible Open Science +*** Towards Reproducible Open Science \emph{archive} research software in SWH\\ \hfill \emph{reference} it using \emph{intrinsic identifiers}\hfill\mbox{}\\ - \hfill \emph{build} on top of SWH, \emph{do not rebuild SWH}! + \hfill \emph{build} on top of SWH, \emph{do not try to rebuild SWH}! #+BEAMER: \pause *** - \hfill\huge *Reduce risk, avoid fragmentation*\hfill\mbox{} + :PROPERTIES: + :BEAMER_ACT: + :BEAMER_env: block + :BEAMER_COL: .51 + :END: + \hfill\Large *reduce risk*\hfill\mbox{} +*** + :PROPERTIES: + :BEAMER_env: block + :BEAMER_act: <4-> + :BEAMER_COL: .45 + :END: + \hfill\Large *avoid fragmentation!*\hfill\mbox{} #+BEAMER: \pause +# *** +# \hfill\huge *Reduce risk, avoid fragmentation*\hfill\mbox{} +# #+BEAMER: \pause +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: *** Thomas Jefferson, February 18, 1791 :B_block: :PROPERTIES: :BEAMER_ACT: :BEAMER_env: block :BEAMER_COL: .51 :END: #+latex: {\em ...let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident. 
#+latex: } #+BEAMER: \pause *** A /common/ infrastructure :B_block: :PROPERTIES: :BEAMER_env: block + :BEAMER_act: <5-> :BEAMER_COL: .45 :END: - *mutualisation* for sustainability - open source, *not for profit* - mirror network *open to all* - *may* prevent a useless diaspora +** Part of the institutional mission +*** Alliance of German Science Organisations \hfill 2016 +Recommendations on the Development, Use and Provision of Research Software +#+latex: \mbox{}\\[1em] +#+latex: \begin{quote} + Data centres should ... facilitate *close links to established infrastructures and services for the publication of source code* and research software. +#+latex: \mbox{}\\[1em] + Libraries, in particular, should set up advisory services on the licensing, *persistent referencing* and citation of research software. +#+latex: \mbox{}\\[1em] +#+latex: \end{quote} + \hfill * Conclusion ** Join the revolution! #+BEGIN_EXPORT latex % \begin{center} % \includegraphics[width=.6\linewidth]{SWH-logo.pdf} % \end{center} \begin{center} {\large \url{www.softwareheritage.org} \hspace{4em} \url{@swheritage}} \end{center} #+END_EXPORT *** Library of Alexandria of code :B_picblock: :PROPERTIES: :BEAMER_env: picblock :BEAMER_COL: 0.42 :BEAMER_OPT: pic=clock-spring-forward.png,width=.45\linewidth,leftpic=true :END: - recover the past - structure the future *** A CERN for Software :B_picblock: :PROPERTIES: :BEAMER_env: picblock :BEAMER_COL: 0.5 :BEAMER_OPT: pic=atacama-telescope.jpg,width=.5\linewidth,leftpic=true :END: - build better software + for industry + for society as a whole *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: *** # \hfill All together, step by step, we can make a positive change! 
\hfill # #+INCLUDE: "../../common/modules/biblio.org::#main" :only-contents t #+BEGIN_EXPORT latex \begin{thebibliography}{Foo Bar, 1969} \footnotesize \bibitem{Abramatic2018} Jean-François Abramatic, Roberto Di Cosmo, Stefano Zacchiroli\newblock \href{https://cacm.acm.org/magazines/2018/10/231366-building-the-universal-archive-of-source-code/fulltext}{Building the Universal Archive of Source Code}\newblock Communications of the ACM, October 2018 \bibitem{DiCosmo2018} Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli\newblock \href{https://hal.archives-ouvertes.fr/hal-01865790}{Identifiers for Digital Objects: the Case of Software Source Code Preservation}\newblock iPRES 2018: Intl. Conf. on Digital Preservation \end{thebibliography} #+END_EXPORT * Appendix :B_appendix: :PROPERTIES: :BEAMER_env: appendix :END: * Strategy ** All the source code #+BEAMER: \begin{center}\includegraphics[width=\extblockscale{\linewidth}]{swh-collect-axes}\end{center} ** All the source code: strategy #+BEAMER: \begin{center}\includegraphics[width=\extblockscale{\linewidth}]{swh-collect-strategies}\end{center} * Data structure ** A bird's eye view #+BEAMER: \begin{center} #+BEAMER: \includegraphics[width=\extblockscale{1.3\textwidth}]{swh-merkle-dag-wide.pdf} #+BEAMER: \end{center} * Identifiers ** Limitations of DIOs #+INCLUDE: "../../common/modules/doi-analysis.org::#doiexplained" :minlevel 3 :only-contents t * Changing the status quo ** Revisiting the "Software Citation Principles" *** Recommendation for identifiers \hfill Use DIOs (/specifically, DOIs/) instead of IDOs (/specifically, git commit hashes/) #+BEAMER: \pause *** Original reasons to "avoid commit hashes" 1. Version numbers/commit references /are not guaranteed to be permanent./ 2. A repository address and version number /does not guarantee that the software is available/ [...] 3. A particular version number/commit reference /may not represent a “preferred” point at which to cite the software/ [...] 
#+BEAMER: \pause *** Software Heritage changes all this 1. SWH-IDs are permanent (and /do not depend on any particular VCS/) 2. Software Heritage is /an archive, and guarantees availability/ 3. At Software Heritage, we separate citation from reference * Bad SOTA ** An example from my research field, Computer Science #+INCLUDE: "../../common/modules/reprod-bad-sota.org::#collbergmethod" :only-contents t :minlevel 3 ** ... cont'd #+INCLUDE: "../../common/modules/reprod-bad-sota.org::#collbergfindings" :only-contents t :minlevel 3 *** The main reasons \hfill source code (/or the right version of it/) cannot be found