diff --git a/common/images/article-parmap-code.png b/common/images/article-parmap-code.png new file mode 100644 index 0000000..793e183 Binary files /dev/null and b/common/images/article-parmap-code.png differ diff --git a/common/images/parmap-article-conclusion.png b/common/images/parmap-article-conclusion.png new file mode 100644 index 0000000..c0df7e6 Binary files /dev/null and b/common/images/parmap-article-conclusion.png differ diff --git a/common/images/parmap-cnt.png b/common/images/parmap-cnt.png new file mode 100644 index 0000000..9a08d94 Binary files /dev/null and b/common/images/parmap-cnt.png differ diff --git a/common/images/parmap-moved-to-github.png b/common/images/parmap-moved-to-github.png new file mode 100644 index 0000000..02e8e47 Binary files /dev/null and b/common/images/parmap-moved-to-github.png differ diff --git a/common/images/parmap-on-gitorious.png b/common/images/parmap-on-gitorious.png new file mode 100644 index 0000000..27ae446 Binary files /dev/null and b/common/images/parmap-on-gitorious.png differ diff --git a/common/images/swh-id-example-McBane2020-ReScience.png b/common/images/swh-id-example-McBane2020-ReScience.png new file mode 100644 index 0000000..59c996f Binary files /dev/null and b/common/images/swh-id-example-McBane2020-ReScience.png differ diff --git a/talks-public/2020-02-19-RDA-AMA/2020-02-19-RDA-AMA.org b/talks-public/2020-02-19-RDA-AMA/2020-02-19-RDA-AMA.org index 20e5494..0ce875b 100644 --- a/talks-public/2020-02-19-RDA-AMA/2020-02-19-RDA-AMA.org +++ b/talks-public/2020-02-19-RDA-AMA/2020-02-19-RDA-AMA.org @@ -1,371 +1,472 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) #+TITLE: Archiving And Referencing All Software Source Code Using Software Heritage #+SUBTITLE: #+AUTHOR: Roberto Di Cosmo #+EMAIL: roberto@dicosmo.org @rdicosmo @swheritage #+BEAMER_HEADER: \date{February 19th, 2020} #+BEAMER_HEADER: \title[Archiving And 
Referencing All Software Source Code]{Archiving And Referencing All Software Source Code} #+BEAMER_HEADER: \author[Roberto Di Cosmo \hspace{5em} www.dicosmo.org]{Roberto Di Cosmo\\[2em]} # #+BEAMER_HEADER: \setbeameroption{show notes on second screen} #+BEAMER_HEADER: \setbeameroption{hide notes} #+KEYWORDS: software heritage legacy preservation knowledge mankind technology #+LATEX_HEADER: \usepackage{tcolorbox} #+LATEX_HEADER: \definecolor{links}{HTML}{2A1B81} #+LATEX_HEADER: \hypersetup{colorlinks,linkcolor=,urlcolor=links} # # prelude.org contains all the information needed to export the main beamer latex source # use prelude-toc.org to get the table of contents # #+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 #+INCLUDE: "../../common/modules/169.org" # +LaTeX_CLASS_OPTIONS: [aspectratio=169,handout,xcolor=table] #+LATEX_HEADER: \usepackage{bbding} #+LATEX_HEADER: \DeclareUnicodeCharacter{66D}{\FiveStar} # # If you want to change the title logo it's here # # +BEAMER_HEADER: \titlegraphic{\includegraphics[width=0.7\textwidth]{SWH-logo}} # aspect ratio can be changed, but the slides need to be adapted # - compute a "resizing factor" for the images (macro for picblocks?) 
# # set the background image # # https://pacoup.com/2011/06/12/list-of-true-169-resolutions/ # #+BEAMER_HEADER: \pgfdeclareimage[height=90mm,width=160mm]{bgd}{swh-world-169.png} #+BEAMER_HEADER: \setbeamertemplate{background}{\pgfuseimage{bgd}} #+LATEX: \addtocounter{framenumber}{-1} * Introduction #+INCLUDE: "../../common/modules/rdc-bio.org::#main" :only-contents t :minlevel 2 ** The knowledge is in the /source code/ #+INCLUDE: "../../common/modules/source-code-different-short.org::#thesourcecode" :only-contents t :minlevel 3 ** Source code is /special/ *** /Executable/ and /human readable/ knowledge \hfill copyright law /“Programs must be written for people to read, and only incidentally for machines to execute.”/\\ \hfill Harold Abelson #+BEAMER: \pause *** Software /evolves/ over time - projects may last decades - the /development history/ is key to its /understanding/ #+BEAMER: \pause *** Complexity :B_picblock: :PROPERTIES: :BEAMER_env: picblock :BEAMER_OPT: pic=python3-matplotlib.pdf, width=.6\linewidth :END: - /millions/ of lines of code - large /web of dependencies/ + easy to break, difficult to maintain - sophisticated /developer communities/ * Academia's evolving practice ** Software is a pillar of Science ... *** Software is everywhere in modern science :PROPERTIES: :BEAMER_COL: .6 :BEAMER_env: block :END: #+BEGIN_QUOTE [...] the vast majority describe [...] or software that have become essential in their fields. \mbox{}\hfill Top 100 papers (\href{http://www.nature.com/news/the-top-100-papers-1.16224}{Nature, 2014}) #+END_QUOTE #+BEAMER: \pause *** :PROPERTIES: :BEAMER_COL: .45 :END: #+latex: \begin{center} #+ATTR_LATEX: :width \extblockscale{\linewidth} file:papermountain.jpg #+latex: \end{center} #+BEAMER: \pause *** :PROPERTIES: :BEAMER_env: ignoreheading :END: #+BEGIN_QUOTE Sometimes, if you don't have the software, you don't have the data \mbox{}\hfill Christine Borgman, Paris, 2018 #+END_QUOTE ** ... 
a /forgotten/ pillar of Open Science *** Lack of recognition :PROPERTIES: :BEAMER_env: block :END: not (yet) a first-class citizen - in the EOSC plan # - in the EU copyright reform - in the scholarly world #+BEAMER: \pause *** Lack of consensus on how to :PROPERTIES: :BEAMER_env: block :END: - /archive/ software - /choose/ a license - /cite/ a software project ** Pressure to make the source code available is rising *** Why Necessary to - /reproduce/ and verify, - /modify/ and /evolve/, *building new experiments* from old ones #+BEAMER: \pause *** When and where - debate started at the end of the 2000s (biology, statistics, medicine, etc.) - growing in Computer Science since the [[https://www.artifact-eval.org/about.html][ESEC/FSE 2011 Artifact Evaluation context]] (winner: Vouillon and Di Cosmo) #+BEAMER: \pause *** A wealth of initiatives... - Policies: ACM [[https://www.acm.org/publications/policies/artifact-review-badging][Artifact Review and Badging]], AEC, ... - Working groups: [[https://www.force11.org/software-citation-principles][FORCE11]], [[https://www.rd-alliance.org/groups/software-source-code-ig][RDA]], [[https://www.ouvrirlascience.fr/logiciels-libres-et-open-source/][SPSO]], ... # - Metrics: [[https://www.ouvrirlascience.fr/about-the-proposal-for-software-indicators-in-open-science-monitor-3/][Open Science Monitor]] (Elsevier!), ... - Journals: [[https://www.ipol.im/][IPOL]], ReScience, InsightJournal, JOSS, eLife, ACM DL, ... - Repositories: FigShare, Zenodo, ... 
- Common infrastructures: [[https://www.softwareheritage.org][Software Heritage]] ** What is at stake \hfill in increasing order of difficulty \vspace{-5pt} *** Archival Research software artifacts must be properly *archived*\\ \hfill make sure we can /retrieve/ them (/reproducibility/) #+BEAMER: \pause *** Identification Research software artifacts must be properly *referenced*\\ \hfill make sure we can /identify/ them (/reproducibility/) #+BEAMER: \pause *** Metadata Research software artifacts must be properly *described*\\ \hfill make it easy to /discover/ them (/visibility/) #+BEAMER: \pause *** Citation Research software artifacts must be properly *cited* /(not the same as referenced!)/\\ \hfill to give /credit/ to authors (/evaluation/!) #+BEAMER: \pause *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: \vspace{-5pt} \hfill Let's focus on the /first two!/ \hfill\mbox{} -* Archiving and referencing /all/ the source code: Software Heritage +* Archiving and referencing /all/ the source code: Software Heritage :noexport: #+INCLUDE: "../../common/modules/swh-goals-oneslide-vertical.org::#goals" :minlevel 2 #+INCLUDE: "../../common/modules/swh-highlights.org::#hilights" :minlevel 2 + ** A principled infrastructure \hfill \url{http://bit.ly/swhpaper} :noexport: #+latex: \begin{center} #+ATTR_LATEX: :width 0.5\linewidth file:SWH-as-foundation-slim.png #+latex: \end{center} #+BEAMER: \pause #+latex: \centering #+ATTR_LATEX: :width \extblockscale{.7\linewidth} file:growth.png #+BEAMER: \pause *** Technology :PROPERTIES: :BEAMER_col: 0.34 :BEAMER_env: block :END: - transparency and FOSS - replicas all the way down *** Content (billions!) 
:PROPERTIES: :BEAMER_col: 0.32 :BEAMER_env: block :END: - *intrinsic identifiers* - facts and provenance *** Organization :PROPERTIES: :BEAMER_col: 0.33 :BEAMER_env: block :END: - non-profit - multi-stakeholder * Zoom on the SWH-ID ** Modern software development #+INCLUDE: "../../common/modules/vcs-history.org::#timeline" :only-contents t :minlevel 3 ** /Intrinsic/ identifiers for modern software development #+INCLUDE: "../../common/modules/vcs-history.org::#dvcs-to-merkle" :only-contents t :minlevel 3 ** The SWH-ID schema: syntax and semantics #+INCLUDE: "../../common/modules/swh-id-syntax.org::#swh-id-syntax" :only-contents t :minlevel 3 -** TODO: list of all types of identifiers + ** Walkthrough of the Parmap article -TODO: also link to the example in ReScience +*** M. Danelutto and R. Di Cosmo, 2012 + :PROPERTIES: + :BEAMER_COL: .5 + :BEAMER_env: block + :END: + #+latex: \begin{center} + #+ATTR_LATEX: :width \extblockscale{1.45\linewidth} + file:parmap-article-conclusion.png + #+latex: \end{center} + + #+latex: \begin{tiny} + M. Danelutto and R. Di Cosmo, /A “Minimal Disruption” skeleton +experiment: Seamless map & reduce embedding in OCaml/, Procedia +CS, vol. 9, pp. 1837–1846, 2012. [Online]. 
Available: +\href{http://dx.doi.org/10.1016/j.procs.2012.04.202}{[DOI: 10.1016/j.procs.2012.04.202]} + #+latex: \end{tiny} + + #+BEAMER: \pause +*** + :PROPERTIES: + :BEAMER_COL: .5 + :END: + + #+latex: \begin{center} + \begin{tiny} + Accessed on the 6th of February 2020 + \end{tiny} + \vspace{-15pt} + #+ATTR_LATEX: :width \extblockscale{1.5\linewidth} + file:parmap-on-gitorious.png + #+BEAMER: \pause + \vspace{-10pt} + \begin{tiny} + \href{https://archive.softwareheritage.org/swh:1:snp:78209702559384ee1b5586df13eca84a5123aa82;origin=https://gitorious.org/parmap/parmap.git/}{swh:1:snp:78209702559384ee1b5586df13eca84a5123aa82} + \end{tiny} + \vspace{-15pt} + #+ATTR_LATEX: :width \extblockscale{1.5\linewidth} + file:parmap-moved-to-github.png + + #+latex: \end{center} + + #+BEAMER: \pause + +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: + +\vspace{+5pt} +\hfill Only 8 years later ! \hfill\mbox{} + + + +** Referencing an algorithm in the source code +\vspace{-30pt} +*** Simple implementation of the distribution, fork, and recollection phases in Parmap + :PROPERTIES: + :BEAMER_COL: .5 + :BEAMER_env: block + :END: + #+latex: \begin{center} + #+ATTR_LATEX: :width \extblockscale{\linewidth} + file:article-parmap-code.png + + #+latex: \end{center} +#+BEAMER: \pause +*** + :PROPERTIES: + :BEAMER_COL: .5 + :END: + #+latex: \begin{center} + \vspace{+10pt} + \begin{tcolorbox} + \begin{scriptsize} + \href{https://archive.softwareheritage.org/swh:1:cnt:d5214ff9562a1fe78db51944506ba48c20de3379;origin=https://gitorious.org/parmap/parmap.git;lines=101-143/} + {swh:1:cnt:d5214ff9562a1fe78db51944506ba48c20de3379;\\ + origin=https://gitorious.org/parmap/parmap.git;\\ + lines=101-143} + \end{scriptsize} + \end{tcolorbox}\noindent + \vspace{-40pt} + #+ATTR_LATEX: :width \extblockscale{1.5\linewidth} + file:parmap-cnt.png + + #+latex: \end{center} + +** Today with the SWH-ID the risk is reduced + #+latex: \begin{center} + + \begin{tcolorbox} + 
\begin{scriptsize} + TODO add swh-id + \end{scriptsize} + \end{tcolorbox}\noindent + #+ATTR_LATEX: :width \extblockscale{\linewidth} + file:swh-id-example-McBane2020-ReScience.png + + #+latex: \end{center} +*** TODO conclude advantages (maybe with 3 small boxes) + ** A few examples :noexport: *** Let's look at some famous excerpts of source code #+BEAMER: \pause *** Apollo 11 source code ([[https://archive.softwareheritage.org/swh:1:cnt:64582b78792cd6c2d67d35da5a11bb80886a6409;origin=https://github.com/virtualagc/virtualagc;lines=245-261/][excerpt]]) :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.48 :BEAMER_env: block :END: #+LATEX: \includegraphics[width=\linewidth]{apollo-11-cranksilly.png} # excerpt of routine that asks astronaut to turn around the LEM #+BEAMER: \pause *** Quake III source code ([[https://archive.softwareheritage.org/swh:1:cnt:bb0faf6919fc60636b2696f32ec9b3c2adb247fe;origin=https://github.com/id-Software/Quake-III-Arena;lines=549-572/][excerpt]]) :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.45 :BEAMER_env: block :END: #+LATEX: \includegraphics[width=\linewidth]{quake-carmack-sqrt-1.png} # smart efficient implementation of 1/sqrt(x) on a CPU without special support #+BEAMER: \pause *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: *** It works! 
we have /intrinsic/ identifiers for all 20+ billion objects in the archive * Practical guidelines for archiving and referencing ** Prepare your software source code \hfill \href{https://www.softwareheritage.org/save-and-reference-research-software/}{SWH guidelines} # scientific software (save code now) use-case (one slide)- prepare #+INCLUDE: "../../common/modules/swh-scientific-preservation.org::#prepare" :only-contents t :minlevel 3 ** Submit a save request on SWH \hfill \href{https://www.softwareheritage.org/save-and-reference-research-software/}{SWH guidelines} # scientific software (save code now) use-case (one slide) #+INCLUDE: "../../common/modules/swh-scientific-preservation.org::#save" :only-contents t :minlevel 3 ** Reference software artifacts in your articles \hfill \href{https://www.softwareheritage.org/save-and-reference-research-software/}{SWH guidelines} # scientific software (save code now) use-case (one slide) #+INCLUDE: "../../common/modules/swh-scientific-preservation.org::#reference" :only-contents t :minlevel 3 * Challenges ** Much more complex than it seems *** Software is complex - Structure :: monolithic/composite; self-contained/external dependencies - Lifetime :: one-shot/long term - Community :: one man/one team/distributed community - Authorship :: complex set of roles - Authority :: institutions/organizations/communities/single person #+BEAMER: \pause *** Various granularities - Exact state of the source code :: for reproducibility, e.g. #+latex: \emph{``you can find at \href{https://archive.softwareheritage.org/swh:1:cnt:cdf19c4487c43c76f3612557d4dc61f9131790a4;lines=146-187/}{swh:1:cnt:cdf19c4487c43c76f3612557d4dc61f9131790a4;lines=146-187} the core algorithm used in this article''} - (Major) release :: \emph{``This functionality is available in OCaml version 4''} - Project :: \emph{``Inria has created OCaml and Scikit-Learn''}. 
** We are not alone *** Research Software does not exist in isolation :B_picblock: :PROPERTIES: :BEAMER_env: picblock :BEAMER_OPT: pic=python3-matplotlib.pdf, width=.6\linewidth, leftpic=true :END: large /web of dependencies/ on non-research software #+BEAMER: \pause *** Industry and developers have been here :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .5 :END: - NSRL (NIST) - SPDX (Linux Foundation) - SWH-ID (Software Heritage) - SWID (ISO Standard) - Wikidata Software Properties #+BEAMER: \pause *** We must :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .5 :END: - accept the complexity - avoid reinventing the wheel - connect with existing communities of practice ** Proposals for the scholarly world taken from https://annex.softwareheritage.org/public/talks/2019/2019-09-24-ENS-Cachan.pdf * Connecting communities ** FORCE11 Software Citation Implementation WG *** Spawned from the FORCE11 Software Citation WG (2/2016) led by Daniel Katz, Kyle Niemeyer and Arfon Smith *** Co-chairs Neil Chue Hong, Martin Fenner, Daniel Katz #+TODO:fill in with links ** RDA Software Source Code Interest Group *** Co-chairs Roberto Di Cosmo, Neil Chue Hong, Mingfang Wu, Julia Collins *** Objectives a forum for discussing /software/ inside RDA *** Chronology - RDA 10, Montreal 9/2017 :: motivations, survey of ontologies, metadata use cases - RDA 11, Berlin 3/2018 :: identification of gaps in metadata - RDA 13, Philadelphia 4/2019 :: FAIR for Software Source Code - RDA 15, Melbourne 3/2020 :: Should we create a FAIR4Software WG? 
*** Web page https://www.rd-alliance.org/groups/software-source-code-ig ** RDA WG on Software Source Code Identification *** Joint RDA & FORCE11 WG which spawned from RDA's Software Source Code IG & FORCE11's SCIWG *** Co-chairs Roberto Di Cosmo, Daniel Katz, Martin Fenner *** Objectives - bring together people involved/interested in /software identification/ - produce concrete recommendations for the academic community *** Chronology - FORCE2019, Edinburgh 10/2019 :: Research Software Hackathon - identification track - RDA 15, Melbourne 3/2020 :: Software identification use cases *** https://www.rd-alliance.org/groups/software-source-code-identification-wg ** Inria's Software Citation Working Group *** Members \hfill task force of Inria's scientific council *** Mission - map the landscape - collect best practices - identify potential Inria contributions - make recommendations *** First outcome Position paper available from \hfill https://hal.archives-ouvertes.fr/hal-02135891 * The road ahead ** Conclusions Conclusions from: https://annex.softwareheritage.org/public/talks/2019/2019-09-24-ENS-Cachan.pdf ** Come in, we're open ! 
#+INCLUDE: "../../common/modules/last-slide-references.org::#references-identifiers" :only-contents t :minlevel 3 * Appendix :B_appendix: :PROPERTIES: :BEAMER_env: appendix :END: * Worked example Merkle tree ** A worked example #+LATEX: \centering\forcebeamerstart #+LATEX: \only<1>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_1.pdf}}} #+LATEX: \only<2>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/contents.pdf}}} #+LATEX: \only<3>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_2_contents.pdf}}} #+LATEX: \only<4>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/directories.pdf}}} #+LATEX: \only<5>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_3_directories.pdf}}} #+LATEX: \only<6>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/revisions.pdf}}} #+LATEX: \only<7>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_4_revisions.pdf}}} #+LATEX: \only<8>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/releases.pdf}}} #+LATEX: \only<9>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_5_releases.pdf}}} #+LATEX: \only<10>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/snapshots.pdf}}} #+LATEX: \forcebeamerend * History of VCS +** Evolution of software development + #+INCLUDE: "../../common/modules/vcs-history.org::#timeline" :only-contents t :minlevel 3 ** Foundations of modern DVCS #+INCLUDE: "../../common/modules/vcs-history.org::#dvcs-to-merkle" :only-contents t :minlevel 3 ** In a picture \hfill (from https://github.com/progit/progit2) #+INCLUDE: "../../common/modules/vcs-history.org::#vcs-explained" :only-contents t :minlevel 3 ** A massive adoption #+INCLUDE: "../../common/modules/vcs-history.org::#adoption" :only-contents t 
:minlevel 3 -** Evolution of software development - #+INCLUDE: "../../common/modules/vcs-history.org::#timeline" :only-contents t :minlevel 3 +