diff --git a/talks-public/2021-01-11-RRPR/2021-01-11-RRPR.org b/talks-public/2021-01-11-RRPR/2021-01-11-RRPR.org index ca193d7..43e6f88 100644 --- a/talks-public/2021-01-11-RRPR/2021-01-11-RRPR.org +++ b/talks-public/2021-01-11-RRPR/2021-01-11-RRPR.org @@ -1,432 +1,351 @@ #+TITLE: Archiving and referencing all the software source code #+SUBTITLE: a foundation for reproducibility #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) #+BEAMER_HEADER: \title[Preserving source code in Software Heritage]{Preserving source code in Software Heritage} #+AUTHOR: Roberto Di Cosmo #+EMAIL: roberto@dicosmo.org #+BEAMER_HEADER: \date[11/01/2021]{RRPR 2020} #+BEAMER_HEADER: \author[@rdicosmo~~~~Roberto Di Cosmo~~~~@swheritage~~~~~CC-BY 4.0]{Roberto Di Cosmo\\Director, Software Heritage} #+KEYWORDS: software, heritage, legacy, preservation, knowledge, mankind, technology, prospective, software development # # prelude.org contains all the information needed to export the main beamer latex source # use prelude-toc.org to get the table of contents # #+LATEX_HEADER: \usepackage{tcolorbox} #+LATEX_HEADER: \definecolor{links}{HTML}{2A1B81} #+LATEX_HEADER: \hypersetup{colorlinks,linkcolor=,urlcolor=links} #+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 #+INCLUDE: "../../common/modules/169.org" * Introduction #+INCLUDE: "../../common/modules/rdc-bio.org::#main" :only-contents t :minlevel 2 * Software Source Code is knowledge ** Software source code: /human readable/ and /executable knowledge/ :PROPERTIES: :CUSTOM_ID: softwareisdifferent :END: *** Harold Abelson, Structure and Interpretation of Computer Programs \hfill (1985) /“Programs must be written for people to read, and only incidentally for machines to execute.”/ #+BEAMER: \pause *** Apollo 11 source code 
([[https://archive.softwareheritage.org/swh:1:cnt:0c1741c1fb0150f111625d02277407f628c31bac;origin=https://github.com/virtualagc/virtualagc;visit=swh:1:snp:cdcd2bc43331a436e8c659ba93175ef7d7eb339b;anchor=swh:1:rev:4e5d304eb7cd5589b924ffb8b423b6f15511b35d;path=/Luminary116/THE_LUNAR_LANDING.agc;lines=244-260/][excerpt]]) :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.48 :BEAMER_env: block :END: #+LATEX: \includegraphics[width=\linewidth]{apollo-11-cranksilly.png} # excerpt of routine that asks astronaut to turn around the LEM #+BEAMER: \pause *** Quake III source code ([[https://archive.softwareheritage.org/swh:1:cnt:bb0faf6919fc60636b2696f32ec9b3c2adb247fe;origin=https://github.com/id-Software/Quake-III-Arena;visit=swh:1:snp:687ac8cdbfab3b78b7f301abee5f451127f135fc;anchor=swh:1:rev:dbe4ddb10315479fc00086f08e25d968b4b43c49;path=/;lines=549-572/][excerpt]]) :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.45 :BEAMER_env: block :END: #+LATEX: \includegraphics[width=\linewidth]{quake-carmack-sqrt-1.png} # smart efficient implementation of 1/sqrt(x) on a CPU without special support #+BEAMER: \pause *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: *** Len Shustek, Computer History Museum \hfill (2006) \hfill /“Source code provides a view into the mind of the designer.”/ *** Distinguishing features :noexport: - /executable/ and /human readable/ knowledge (an /all time new/) + even hardware is... software! (VHDL, FPGA, ...) 
+ /text files are forever/ - naturally /evolves/ over time + the /development history/ is key to its /understanding/ - complex: large /web of dependencies/, millions of SLOCs *** In a word :noexport: - software /is not just another/ sequence of bits - a software archive /is not just another/ digital archive ** Software Source code: pillar of Open Science *** Three pillars of Open Science :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .4 :END: #+latex: \begin{center} #+ATTR_LATEX: :width \extblockscale{1.2\linewidth} file:PreservationTriangle.png #+latex: \end{center} #+BEAMER: \pause *** A plurality of needs :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .6 :END: - Researcher :: - *archive* and *reference* software used in articles - *find* useful software - get *credit* for developed software - verify/reproduce/improve results #+BEAMER: \pause - Laboratory/team :: track software contributions - produce reports / web page #+BEAMER: \pause - Research Organization :: know its *software assets* - technology *transfer* - impact *metrics* ** What is at stake \hfill in increasing order of difficulty #+INCLUDE: "../../common/modules/swh-ardc.org::#ardc" :only-contents t :minlevel 3 *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: \vspace{-.5em} #+BEAMER: \pause **** We need an infrastructure /designed for/ software source code \pause \hfill /now we have it!/ * Software Heritage ** Software Heritage, in a nutshell #+BEAMER: \transdissolve #+INCLUDE: "../../common/modules/swh-goals-oneslide-vertical.org::#goals" :only-contents t :minlevel 3 ** The largest software archive, a shared infrastructure #+latex: \begin{center} #+ATTR_LATEX: :width 0.7\linewidth file:SWH-as-foundation-slim.png #+latex: \end{center} #+BEAMER: \pause #+latex: \centering #+ATTR_LATEX: :width \extblockscale{.9\linewidth} file:2021-01-archive-growth.png ** Addressing the four ARDC needs (see [[https://dx.doi.org/10.1007/978-3-030-52200-1_36][ICMS 2020]] for details) 
#+INCLUDE: "../../common/modules/swh-ardc.org::#swh-ardc-short" :only-contents t :minlevel 3 * Demo time! ** A walkthrough - Browse [[https://archive.softwareheritage.org][the archive]] - [[https://save.softwareheritage.org][Trigger archival]] of your preferred software in a breeze - Get and use SWHIDs ([[https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html][full specification available online]]) - Cite software [[https://www.softwareheritage.org/2020/05/26/citing-software-with-style/][with the biblatex-software style]] from CTAN - Example use in a research article: compare Fig. 1 and conclusions - in [[http://www.dicosmo.org/Articles/2012-DaneluttoDiCosmo-Pcs.pdf][the 2012 version]] - in [[https://www.dicosmo.org/share/parmap_swh.pdf][the updated version]] using SWHIDs and Software Heritage - Example use in a research article: extensive use of SWHIDs in [[https://www.dicosmo.org/Articles/2020-ReScienceC.pdf][a replication experiment]] - [[https://doc.archives-ouvertes.fr/en/deposit/deposit-software-source-code/][Curated deposit in SWH via HAL]], see for example: [[https://hal.archives-ouvertes.fr/hal-02130801][LinBox]], [[https://hal.archives-ouvertes.fr/hal-01897934][SLALOM]], [[https://hal.archives-ouvertes.fr/hal-02130729][Givaro]], [[https://hal.archives-ouvertes.fr/hal-02137040][NS2DDV]], [[https://hal.archives-ouvertes.fr/lirmm-02136558][SumGra]], [[https://hal.archives-ouvertes.fr/hal-02155786][Coq proof]], ... - Rescue landmark legacy software, see the [[https://www.softwareheritage.org/swhap/][SWHAP process with UNESCO]] * The way forward ** An international, non profit initiative\hfill built for the long term #+INCLUDE: "../../common/modules/support+sponsors.org::#support+sponsors" :only-contents t :minlevel 3 ** Adoption is coming ... #+INCLUDE: "../../common/modules/swh-adoption-academic.org::#adoption" :only-contents t :minlevel 3 ** Breaking news, and a lesson to be learned *** Saving 250.000 endangered repositories... 
- summer 2019: BitBucket announces Mercurial VCS phase-out - fall 2019: Software Heritage teams up with Octobus (funded by NLNet, thanks!) - July 2020: BitBucket erases /250.000/ repositories - August 2020: [[https://bitbucket-archive.softwareheritage.org][bitbucket-archive.softwareheritage.org]] is live #+BEAMER: \pause *** ... preserving the web of knowledge \hfill (Tweet [[https://twitter.com/gabrielaltay/status/1300218789762662401][is here]] ) :B_picblock: :PROPERTIES: :BEAMER_env: picblock :BEAMER_OPT: pic=bitbucket_swh_praise.png, width=.6\linewidth, leftpic=true :END: \\ *Bottom line*\\ /explicit deposit/ is important, ...\\ \mbox{}\hfill ... and we must promote it...\hfill\mbox{}\\ \mbox{}\hfill ... but it will never be enough.\\ \mbox{}\\ \mbox{}\hfill /(think also of all software dependencies!)/ ** Conclusion \vspace{-.5em} *** Software Heritage :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .45 :END: - /universal/ archive of source code - /intrinsic/ identifiers (SWHIDs) - /open/, /non profit/, long term - /infrastructure/ for Open Science #+BEAMER: \pause *** You can help improve science! :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .55 :END: - /adopt/ SWH: conferences, journals, AEC - /save/ relevant source code - /contribute/ to SWH: /it is open source/ - help build the SWH /community/ :PROPERTIES: :BEAMER_env: ignoreheading :END: #+BEAMER: \pause *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: #+BEGIN_EXPORT latex \begin{thebibliography}{Foo Bar, 1969} \footnotesize \bibitem{DiCosmo2020d} Roberto Di Cosmo \newblock Archiving and Referencing Source Code with Software Heritage \newblock International Congress on Mathematical Software (ICMS), 2020 \bibitem{Abramatic2018} Jean-François Abramatic, Roberto Di Cosmo, Stefano Zacchiroli\newblock \emph{Building the Universal Archive of Source Code}, CACM, October 2018 \href{https://doi.org/10.1145/3183558}{(10.1145/3183558)} \bibitem{alliez:hal-02135891} P. Alliez, R.
Di Cosmo, B. Guedj, A. Girault, M.-S. Hacid, A. Legrand and N. Rougier\newblock \emph{Attributing and referencing (research) software: Best practices and outlook from Inria}, \newblock CiSE 2020 \href{https://doi.ieeecomputersociety.org/10.1109/MCSE.2019.2949413}{(10.1109/MCSE.2019.2949413)} \href{https://hal.archives-ouvertes.fr/hal-02135891}{(hal-02135891)} \bibitem{OCamlP3l} Roberto Di Cosmo, Marco Danelutto\newblock \emph{[Rp] Reproducing and replicating the OCamlP3l experiment}. ReScience C, 6(1), 2. \end{thebibliography} #+END_EXPORT * Appendix :B_appendix: :PROPERTIES: :BEAMER_env: appendix :END: ** \vfill \centerline{\Huge Appendix} \vfill ** Mirrors *** Thomas Jefferson, February 18, 1791 :B_block: :PROPERTIES: :BEAMER_ACT: :BEAMER_env: block :END: #+latex: {\em ...let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident. #+latex: } #+BEAMER: \pause *** Welcoming ENEA :B_block: :PROPERTIES: :BEAMER_env: picblock :BEAMER_OPT: pic=LogoENEAcompletoENG.png, leftpic=true, width=.7\linewidth :END: - first *institutional* mirror - increased resilience - *AI infrastructure* for researchers - stepping stone to \endgraf \hfill a European joint effort ** A revolutionary infrastructure for software source code #+BEAMER: \vspace{-2mm} *** The /graph/ of Software Development :B_picblock: :PROPERTIES: :BEAMER_env: picblock :BEAMER_COL: .5 :BEAMER_OPT: pic=git-merkle/merkle-vertical, leftpic=true, width=.4\linewidth :END: All software development with its history, in *a single graph* ... #+BEAMER: \pause \vspace{-2mm} *** Reference platform for /Big Code/ :B_picblock: :PROPERTIES: :BEAMER_opt: pic=universal, leftpic=true, width=.4\linewidth :BEAMER_env: picblock :BEAMER_COL: .5 :BEAMER_act: :END: *One uniform data structure* enables /massive/ machine learning for *quality, cybersecurity*, etc.
*** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: #+BEAMER: \pause *** First datasets are available! - full graph of software development (~20Bn nodes, ~200Bn edges) see Pietri et al., MSR 2019 https://dx.doi.org/10.1109/MSR.2019.00030 - MSR 2020 mining competition ** Automation, and storage :maybemove: #+BEAMER: \begin{center} #+BEAMER: \mode{\only<1>{\includegraphics[width=\extblockscale{.9\textwidth}]{swh-dataflow-merkle-listers.pdf}}} #+BEAMER: \only<2-3>{\includegraphics[width=\extblockscale{.9\textwidth}]{swh-dataflow-merkle.pdf}} #+BEAMER: \end{center} #+BEAMER: \pause #+BEAMER: \pause - full development history *permanently archived!* - over 8 billion unique source files from 140+ million origins ** Archiving /public/ code #+latex: \begin{center} #+ATTR_LATEX: :width 0.7\linewidth file:codeetalab.png #+latex: \end{center} #+BEAMER: \pause https://code.etalab.gouv.fr ** ENEA mirror *** Thomas Jefferson, February 18, 1791 :B_block: :PROPERTIES: :BEAMER_ACT: :BEAMER_env: block :END: #+latex: {\em ...let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident.
#+latex: } #+BEAMER: \pause *** Welcoming ENEA :B_block: :PROPERTIES: :BEAMER_env: picblock :BEAMER_OPT: pic=LogoENEAcompletoENG.png, leftpic=true, width=.7\linewidth :END: - first *institutional* mirror - increased resilience - *AI infrastructure* for researchers - stepping stone to \endgraf \hfill a European joint effort ** The Software Heritage Acquisition Process (SWHAP) *** Paris Call on Software Source Code “[We call to] support efforts to gather and preserve the artifacts and narratives of the history of computing, while the earlier creators are still alive” #+BEAMER: \pause *** SWHAP: an important step forward - detailed guidelines to *curate* landmark legacy source code and *archive* it on Software Heritage - intense cooperation with *Università di Pisa* and *UNESCO* - open to all, we'll promote it worldwide *** https://www.softwareheritage.org/swhap -* A bit of tech info -** A peek under the hood :maybemove: - #+BEAMER: \begin{center} - #+BEAMER: \mode{\only<1>{\includegraphics[width=\extblockscale{1\textwidth}]{swh-dataflow-merkle-listers.pdf}}} - #+BEAMER: \only<2-3>{\includegraphics[width=\extblockscale{1\textwidth}]{swh-dataflow-merkle.pdf}} - #+BEAMER: \end{center} -#+BEAMER: \pause -#+BEAMER: \pause - /Global development history/ permanently archived in a /unique/ git-like Merkle DAG - - *~400 TB* (uncompressed) blobs, *~20 B* nodes, *~280 B* edges - # - *GitHub*, Gitlab.com, Bitbucket, /Gitorious/, /GoogleCode/, GNU, PyPi, Debian, NPM...
* The SWH-ID: the source code fingerprint :noexport: ** Software Heritage Identifiers (SWHID) \hfill [[https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html][link to full docs]] # #+INCLUDE: "../../common/modules/swh-id-syntax.org::#swh-id-syntax" :only-contents t :minlevel 3 #+LATEX: \centering%\forcebeamerstart #+LATEX: \mode{\only<1>{\includegraphics[width=\linewidth]{SWHID-v1.4_1.png}}} #+LATEX: \mode{\only<2>{\includegraphics[width=\linewidth]{SWHID-v1.4_2.png}}} #+LATEX: \only<3->{\includegraphics[width=\linewidth]{SWHID-v1.4_3.png}} #+LATEX: %\forcebeamerend *** An emerging standard :B_block: :PROPERTIES: :BEAMER_act: <4-> :BEAMER_COL: .6 :BEAMER_env: block :END: - in Linux Foundation's [[https://spdx.github.io/spdx-spec/appendix-VI-external-repository-identifiers/#persistent-id][SPDX 2.2]] - IANA registered, WikiData property [[https://www.wikidata.org/wiki/Property:P6138][P6138]] *** Examples: :B_block: :PROPERTIES: :BEAMER_act: <5-> :BEAMER_COL: .4 :BEAMER_env: block :END: - [[https://archive.softwareheritage.org/swh:1:cnt:64582b78792cd6c2d67d35da5a11bb80886a6409;origin=https://github.com/virtualagc/virtualagc;lines=245-261/][Apollo 11 AGC excerpt]], - [[https://archive.softwareheritage.org/swh:1:cnt:bb0faf6919fc60636b2696f32ec9b3c2adb247fe;origin=https://github.com/id-Software/Quake-III-Arena;lines=549-572/][Quake III rsqrt]] ** A worked example #+LATEX: \centering\forcebeamerstart #+LATEX: \only<1>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_1.pdf}}} #+LATEX: \only<2>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/contents.pdf}}} #+LATEX: \only<3>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_2_contents.pdf}}} #+LATEX: \only<4>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/directories.pdf}}} #+LATEX: 
\only<5>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_3_directories.pdf}}} #+LATEX: \only<6>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/revisions.pdf}}} #+LATEX: \only<7>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_4_revisions.pdf}}} #+LATEX: \only<8>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/releases.pdf}}} #+LATEX: \only<9>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_5_releases.pdf}}} #+LATEX: \only<10>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/snapshots.pdf}}} #+LATEX: \forcebeamerend -* The way forward -** Come in, we're open! -#+BEGIN_EXPORT latex -% \begin{center} -% \includegraphics[width=.5\linewidth]{SWH-logo.pdf} -% \end{center} -% \begin{center} -% {\large \url{www.softwareheritage.org} \hspace{4em} \url{@swheritage}} -% \end{center} -#+END_EXPORT -*** Library of Alexandria of code :B_picblock:noexport: - :PROPERTIES: - :BEAMER_env: picblock - :BEAMER_COL: 0.4 - :BEAMER_OPT: pic=clock-spring-forward.png,width=.45\linewidth,leftpic=true - :END: - Working together to - - recover the past - - preserve our heritage - - share the knowledge - - prepare the future -*** Library of Alexandria of code :B_block:noexport: - :PROPERTIES: - :BEAMER_COL: 0.42 - :BEAMER_env: block - :END: - \begin{center}\includegraphics[width=.4\linewidth]{clock-spring-forward.png}\end{center} - - recover the past - - structure the future -*** :B_ignoreheading: - :PROPERTIES: - :BEAMER_env: ignoreheading - :END: - #+BEGIN_EXPORT latex - \vfill - \begin{center}\Large Questions?\end{center} - \vfill - #+END_EXPORT -*** Learn more \hfill www.softwareheritage.org/publications :B_block: - :PROPERTIES: - :BEAMER_env: block - :END: - #+BEGIN_EXPORT latex - \begin{thebibliography}{Foo Bar, 1969} - \footnotesize - - \bibitem{ARSCSWH} Roberto Di Cosmo. 
\emph{Archiving and referencing source code with Software Heritage}\newblock - ICMS 2020, preprint: hal-02526083 - - \bibitem{Abramatic2018} Jean-François Abramatic, Roberto Di Cosmo, Stefano Zacchiroli\newblock - \emph{Building the Universal Archive of Source Code}, Communications of the ACM, October 2018 - - \bibitem{DiCosmo2020} P. Alliez, R. Di Cosmo, B. Guedj, A. Girault, M. Hacid, A. Legrand, N. Rougier\newblock - \emph{Attributing and Referencing (Research) Software: Best Practices and Outlook From Inria}, - Computing in Science \& Engineering, 22 (1), pp. 39-52, 2020, ISSN: 1558-366X - - \bibitem{DiCosmo2020} Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli\newblock - \emph{Referencing Source Code Artifacts: a Separate Concern in Software Citation}, - Computing in Science \& Engineering, 2020, ISSN: 1521-9615 - - \bibitem{OCamlP3l} Roberto Di Cosmo, Marco Danelutto\newblock - \emph{[Rp] Reproducing and replicating the OCamlP3l experiment}. ReScience C, 6(1), 2. - - \end{thebibliography} - #+END_EXPORT -*** :B_ignoreheading: - :PROPERTIES: - :BEAMER_env: ignoreheading - :END: -*** Come in, we're open :noexport: - \hfill https://sponsorship.softwarheritage.org +