diff --git a/talks-public/2020-03-12-CodeSources/2020-03-12-CodeSources.org b/talks-public/2020-03-12-CodeSources/2020-03-12-CodeSources.org index e4da590..bd4782b 100644 --- a/talks-public/2020-03-12-CodeSources/2020-03-12-CodeSources.org +++ b/talks-public/2020-03-12-CodeSources/2020-03-12-CodeSources.org @@ -1,682 +1,682 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) #+TITLE: Archiving, referencing and attributing research software #+SUBTITLE: towards software as a first class citizen # #+AUTHOR: Roberto Di Cosmo # #+EMAIL: roberto@dicosmo.org @rdicosmo @swheritage #+BEAMER_HEADER: \date{March 12, 2020} #+BEAMER_HEADER: \title[(CC-BY 4.0) Research Software]{Archiving, referencing and attributing research software} #+BEAMER_HEADER: \author[Roberto Di Cosmo]{Roberto Di Cosmo\\Seminaire Codes Sources, Paris} # #+BEAMER_HEADER: \setbeameroption{show notes on second screen} #+BEAMER_HEADER: \setbeameroption{hide notes} #+KEYWORDS: software heritage legacy preservation knowledge mankind technology #+LATEX_HEADER: \usepackage{tcolorbox} #+LATEX_HEADER: \definecolor{links}{HTML}{2A1B81} #+LATEX_HEADER: \hypersetup{colorlinks,linkcolor=,urlcolor=links} # # prelude.org contains all the information needed to export the main beamer latex source # use prelude-toc.org to get the table of contents # #+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 #+INCLUDE: "../../common/modules/169.org" # +LaTeX_CLASS_OPTIONS: [aspectratio=169,handout,xcolor=table] #+LATEX_HEADER: \usepackage{bbding} #+LATEX_HEADER: \DeclareUnicodeCharacter{66D}{\FiveStar} # # If you want to change the title logo it's here # # +BEAMER_HEADER: \titlegraphic{\includegraphics[width=0.7\textwidth]{SWH-logo}} # aspect ratio can be changed, but the slides need to be adapted # - compute a "resizing factor" for the images (macro for picblocks?) 
# # set the background image # # https://pacoup.com/2011/06/12/list-of-true-169-resolutions/ # #+BEAMER_HEADER: \pgfdeclareimage[height=90mm,width=160mm]{bgd}{swh-world-169.png} #+BEAMER_HEADER: \setbeamertemplate{background}{\pgfuseimage{bgd}} #+LATEX: \addtocounter{framenumber}{-1} * Software Source Code: a (forgotten) pillar of Science ** Software source code: a precious part of our heritage #+INCLUDE: "../../common/modules/source-code-different-short.org::#softwareisdifferent" :only-contents t :minlevel 3 ** Source code is a /special/ and endangered heritage *** /Executable/ and /human readable/ knowledge \hfill copyright law :noexport: /“Programs must be written for people to read, and only incidentally for machines to execute.”/\\ \hfill Harold Abelson #+BEAMER: \pause *** Software /evolves/ over time - projects may last decades - the /development history/ is key to its /understanding/ #+BEAMER: \pause *** Complexity :B_picblock: :PROPERTIES: :BEAMER_env: picblock :BEAMER_OPT: pic=python3-matplotlib.pdf, width=.6\linewidth :END: - /millions/ of lines of code - large /web of dependencies/ + easy to break, difficult to maintain - sophisticated /developer communities/ *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: #+BEAMER: \pause *** Precious, endangered /Executable/ and /human readable/ knowledge key people *are passing away* ...\\ \hfill no organised effort to catalog and archive it (more later) ** Software Source code: pillar of Open Science *** Software is everywhere in modern research :B_picblock: :PROPERTIES: :BEAMER_opt: pic=papermountain, leftpic=true, width=.3\linewidth :BEAMER_env: picblock :BEAMER_COL: .6 :END: #+BEGIN_QUOTE [...] software [...] essential in their fields. 
\mbox{}\hfill Top 100 papers (Nature, 2014) #+END_QUOTE #+BEGIN_QUOTE Sometimes, if you don't have the software, you don't have the data \mbox{}\hfill Christine Borgman, Paris, 2018 #+END_QUOTE # http://www.nature.com/news/the-top-100-papers-1.16224 #+BEAMER: \pause *** Open Science: three pillars :B_block: :PROPERTIES: :BEAMER_COL: .45 :BEAMER_env: block :END: #+latex: \begin{center} #+ATTR_LATEX: :width \extblockscale{\linewidth} file:PreservationTriangle.png #+latex: \end{center} #+BEAMER: \pause *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: *** Nota bene \hfill The links in the picture are *essential* ** The state of the art (in CS!) is far from ideal *** ICSE (Zannier, Melnik, Maurer, 2006) - complete absence of replication studies *** ACM TOSEM 2001 to 2006 \hfill C. Ghezzi http://bit.ly/tosemreprod - 60% of all papers have tools: *only 20%* /installable/ *** Collberg's 2015 study \hfill http://reproducibility.cs.arizona.edu/ - 601 mainstream papers: 508 with tools, *only 40%* /installable/ #+BEAMER: \pause *** Main reasons \hfill source code (/or the right version of it/) cannot be found ** Where we stand *** A wealth of initiatives! - Policies: ACM [[https://www.acm.org/publications/policies/artifact-review-badging][Artifact Review and Badging]], ... - Working groups: [[https://www.force11.org/software-citation-principles][FORCE11]], [[https://www.rd-alliance.org/groups/software-source-code-ig][RDA]], [[https://www.ouvrirlascience.fr/logiciels-libres-et-open-source/][SPSO]], ... - Metrics: [[https://www.ouvrirlascience.fr/about-the-proposal-for-software-indicators-in-open-science-monitor-3/][Open Science Monitor]] (Elsevier!), ... - Journals: [[https://www.ipol.im/][IPOL]], ReScience, InsightJournal, eLife, ACM DL, ... - Repositories: FigShare, Zenodo, ... #+BEAMER: \pause *** \hfill but ... 
\hfill \mbox{} *** Lack of recognition :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .5 :END: not (yet) a first class citizen - in the EOSC plan # - in the EU copyright reform - in the scholarly works #+BEAMER: \pause *** Lack of proper guidance on how to :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .5 :END: - /archive/ and /reference/ software - choose a license - /cite/ a software project # #+BEAMER: \pause # *** :B_ignoreheading: # :PROPERTIES: # :BEAMER_env: ignoreheading # :END: # *** Lack of basic prerequisites to reproducibility # See a discussion in \url{annex.softwareheritage.org/talks/2018/2018-09-17-STScI_public.pdf} +** A plurality of needs +*** Researcher + - archive and reference sw used in articles + - get credit for the software they develop + - verify/reproduce/improve results +*** Laboratory/team + - track software contributions + - produce up-to-date report / web page +*** University/Research Organization + - central view of research software assets + - tech transfer + - impact metrics ** What is at stake \hfill in increasing order of difficulty *** Archival Research software artifacts must be properly *archived*\\ \hfill make sure we can /retrieve/ them (/reproducibility/) #+BEAMER: \pause *** Identification Research software artifacts must be properly *referenced*\\ \hfill make sure we can /identify/ them (/reproducibility/) #+BEAMER: \pause *** Metadata Research software artifacts must be properly *described*\\ \hfill make it easy to /discover/ them (/visibility/) #+BEAMER: \pause *** Citation Research software artifacts must be properly *cited* /(not the same as referenced!)/\\ \hfill to give /credit/ to authors (/evaluation/!) 
-** A plurality of needs -*** Researcher - - archive and reference sw used in articles - - get credit for the software they develop - - verify/reproduce/improve results -*** Laboratory/team - - track software contributions - - produce up-to date report / web page -*** University/Research Organization - - central view of research software assets - - tech transfer - - impact metrics * Software Heritage ** Software Heritage in a nutshell \hfill www.softwareheritage.org #+BEAMER: \transdissolve #+INCLUDE: "../../common/modules/swh-goals-oneslide-vertical.org::#goals" :only-contents t :minlevel 3 ** An international, non profit initiative\hfill built for the long term :PROPERTIES: :CUSTOM_ID: support :END: *** Sharing the vision :B_block: :PROPERTIES: :CUSTOM_ID: endorsement :BEAMER_COL: .5 :BEAMER_env: block :END: #+LATEX: \begin{center}{\includegraphics[width=\extblockscale{.4\linewidth}]{unesco_logo_en_285}}\end{center} #+LATEX: \vspace{-0.8cm} #+LATEX: \begin{center}\vskip 1em \includegraphics[width=\extblockscale{1.4\linewidth}]{support.pdf}\end{center} #+latex: \small And many more ...\\ #+latex:\mbox{}~~~~~~~\tiny\url{www.softwareheritage.org/support/testimonials} *** Donors, members, sponsors :B_block: :PROPERTIES: :CUSTOM_ID: sponsors :BEAMER_COL: .5 :BEAMER_env: block :END: #+LATEX: \begin{center}\includegraphics[width=\extblockscale{.4\linewidth}]{inria-logo-new}\end{center} #+LATEX: \begin{center} # #+LATEX: \includegraphics[width=\extblockscale{.2\linewidth}]{sponsors-levels.pdf} #+LATEX: \colorbox{white}{\includegraphics[width=\extblockscale{1.4\linewidth}]{sponsors.pdf}} #+LATEX: \end{center} # - sponsoring / partnership :: \hfill \url{sponsorship.softwareheritage.org} *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: *** Research collaboration :B_picblock:noexport: :PROPERTIES: :BEAMER_COL: .5 :BEAMER_env: picblock :BEAMER_OPT: pic=Qwant_Logo, leftpic=true :END: source code search engine *** See more :noexport: 
\hfill\tiny\url{http://www.softwareheritage.org/support/testimonials} *** Global network :B_picblock:noexport: :PROPERTIES: :BEAMER_COL: .5 :BEAMER_env: picblock :BEAMER_OPT: pic=fossid, leftpic=true, width=.3\linewidth :END: - first *independent mirror* - increased reliability ** The largest software archive, a shared infrastructure #+latex: \begin{center} #+ATTR_LATEX: :width 0.7\linewidth file:SWH-as-foundation-slim.png #+latex: \end{center} #+BEAMER: \pause #+latex: \centering #+ATTR_LATEX: :width \extblockscale{.9\linewidth} file:2019-09-archive-growth.png ** A peek under the hood #+BEAMER: \begin{center} #+BEAMER: \mode{\only<1>{\includegraphics[width=\extblockscale{1\textwidth}]{swh-dataflow-merkle-listers.pdf}}} #+BEAMER: \only<2-3>{\includegraphics[width=\extblockscale{1\textwidth}]{swh-dataflow-merkle.pdf}} #+BEAMER: \end{center} #+BEAMER: \pause #+BEAMER: \pause /Global development history/ permanently archived in a /unique/ git-like Merkle DAG - *~400 TB* (uncompressed) blobs, *~20 B* nodes, *~280 B* edges # - *GitHub*, Gitlab.com, Bitbucket, /Gitorious/, /GoogleCode/, GNU, PyPi, Debian, NPM... ** Software Heritage for Research and Innovation *** Reference platform for /Big Code/ :B_picblock: :PROPERTIES: :BEAMER_opt: pic=universal, leftpic=true, width=.2\linewidth :BEAMER_env: picblock :BEAMER_act: :END: - unique *observatory* of all software development - *big data, machine learning* paradise: classification, trends, coding patterns, code completion... #+BEAMER: \pause *** First datasets are available! 
- full graph of software development (~20Bn nodes, ~200Bn edges) see Pietri, Spinellis, Zacchiroli, MSR 2019 https://dx.doi.org/10.1109/MSR.2019.00030 - MSR 2020 mining competition see https://2020.msrconf.org/track/msr-2020-mining-challenge#Call-for-Papers ** Raising awareness about Software Source Code *** :B_column:BMCOL: :PROPERTIES: :BEAMER_col: .53 :BEAMER_env: column :END: #+ATTR_LATEX: :width .7\linewidth file:UNESCOParisCallMeeting.png UNESCO, Inria, Software Heritage invite\\ [[https://en.unesco.org/news/experts-call-greater-recognition-software-source-code-heritage-sustainable-development][40 international experts to meet in Paris]] ... #+BEAMER: \pause *** :B_column:BMCOL: :PROPERTIES: :BEAMER_col: .5 :BEAMER_env: column :END: #+ATTR_LATEX: :width .65\linewidth file:paris_call_ssc_cover.jpg [[https://en.unesco.org/foss/paris-call-software-source-code][Their call was published in Feb 2019]] \pause *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: *** :PROPERTIES: :BEAMER_COL: 1.06 :BEAMER_env: block :END: It's an important /policy tool/, already referenced and used ... \hfill /yes, you can sign it!/\\ \vspace{10pt} \hfill https://en.unesco.org/foss/paris-call-software-source-code \hfill\mbox{} * Archive and reference /all/ the source code ** Archive and reference *** Software Heritage: a revolutionary infrastructure :B_picblock: :PROPERTIES: :BEAMER_env: picblock :BEAMER_OPT: pic=PreservationTriangle.png,leftpic=true, width=.34\linewidth :END: - *universal archive* of all source code + we archive /all/ software: both research and non-research + we /proactively collect software/ in a systematic way - *intrinsic* identifiers for *reproducibility* + identify software artefacts /without any third party/ + cryptographically strong, compatible with git hashes #+BEAMER: \pause *** Full guidelines available! \hfill \tiny https://www.softwareheritage.org/save-and-reference-research-software/ *** Save code now ... 
[[https://archive.softwareheritage.org/save/][in just a few clicks]] *** Demo My 2012 Parmap paper [[http://www.dicosmo.org/Publications/Parmap2012.html][before]] and [[http://www.dicosmo.org/share/parmap_swh.pdf][after]]; other links: [[https://www.softwareheritage.org/2019/07/20/archiving-and-referencing-the-apollo-source-code/][Apollo 11]] (and [[https://www.softwareheritage.org/2019/07/20/archiving-and-referencing-the-apollo-source-code/][blog]]), [[https://archive.softwareheritage.org/swh:1:cnt:bb0faf6919fc60636b2696f32ec9b3c2adb247fe;origin=https://github.com/id-Software/Quake-III-Arena;lines=548-572/][Quake III Arena]] ** The SWH-ID schema # TODO: drawing with swh:1:cnt:xxxxxxx "exploded" and explained #+LATEX: \centering\forcebeamerstart #+LATEX: \only<1>{\includegraphics[width=\linewidth]{SWH-ID-1.png}} #+LATEX: \only<2>{\includegraphics[width=\linewidth]{SWH-ID-2.png}} #+LATEX: \only<3>{\includegraphics[width=\linewidth]{SWH-ID-3.png}} #+LATEX: \forcebeamerend ** A worked example #+LATEX: \centering\forcebeamerstart #+LATEX: \only<1>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_1.pdf}}} #+LATEX: \only<2>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/contents.pdf}}} #+LATEX: \only<3>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_2_contents.pdf}}} #+LATEX: \only<4>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/directories.pdf}}} #+LATEX: \only<5>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_3_directories.pdf}}} #+LATEX: \only<6>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/revisions.pdf}}} #+LATEX: \only<7>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_4_revisions.pdf}}} #+LATEX: \only<8>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/releases.pdf}}} #+LATEX: 
\only<9>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_5_releases.pdf}}} #+LATEX: \only<10>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/snapshots.pdf}}} #+LATEX: \forcebeamerend ** Zoom on the trust model for identifiers \vspace{-5pt} *** Trust model for usual DOIs :B_block: :PROPERTIES: :BEAMER_COL: .5 :BEAMER_env: block :END: #+ATTR_LATEX: :width \linewidth file:doi-vs-pid-1.pdf #+BEAMER: \pause *** Trust model for DOIs with checksums :B_block: :PROPERTIES: :BEAMER_COL: .5 :BEAMER_env: block :END: #+ATTR_LATEX: :width \linewidth file:doi-vs-pid-2.pdf *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: #+BEAMER: \pause *** Trust model for SWH-IDs :PROPERTIES: :END: #+ATTR_LATEX: :width .3\linewidth file:doi-vs-pid-3.pdf * Describe and cite /research/ source code ** Context *** Many articles/guidelines :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .4 :END: - reproducibility - archival - credit and evaluation #+BEAMER: \pause *** Most common limitations :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .6 :END: - software is 'just data' - citation = reference = DOIs - citation produced by automated tools #+BEAMER: \pause *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: *** A few remarkable exceptions - [[https://www.ascl.net][ASCL]] (since 1999): metadata only, carefully curated - [[https://www.geodynamics.org][geodynamics.org]] : source, documentation, metadata - [[https://swmath.org][swmath.org]] : software catalog via articles #+BEAMER: \pause *** Software Citation WG at Inria (since 10/2018) - leverage 50 years of experience, make recommendations - read more https://hal.archives-ouvertes.fr/hal-02135891 ** Why it is not simple *** Software is complex - Structure :: monolithic/composite; self-contained/external dependencies - Lifetime :: one-shot/long term - Community :: one person/one team/distributed community - Authorship :: complex set of 
roles /(more later)/ - Authority :: institutions/organizations/communities/single person #+BEAMER: \pause *** Various granularities - Exact status of the source code :: for reproducibility, e.g. #+latex: \emph{``you can find at \href{https://archive.softwareheritage.org/swh:1:cnt:cdf19c4487c43c76f3612557d4dc61f9131790a4;lines=146-187/}{swh:1:cnt:cdf19c4487c43c76f3612557d4dc61f9131790a4;lines=146-187} the core algorithm used in this article''} - (Major) release :: \emph{``This functionality is available in OCaml version 4''} - Project :: \emph{``Inria has created OCaml and Scikit-Learn''}. ** Proposals for the scholarly world *** Refined ontology for contributors :B_block: :PROPERTIES: :BEAMER_COL: .55 :BEAMER_env: block :END: - Design, Architecture, - Coding, Testing, Debugging, - Documentation, Maintenance, Support, - Management \hfill see also [[https://www.casrai.org/credit.html][CRediT]], [[https://geodynamics.org/cig/metadata/?software=aspect&version=2.1.0][Geodynamics]] #+BEAMER: \pause *** Reference is distinct from citation :B_block: :PROPERTIES: :BEAMER_COL: .5 :BEAMER_env: block :END: - *Reference* is for /reproducibility/ - *Citation* is for /credit/ \hfill They must not be conflated - Beware :: of the numbers game: \hfill ... do we really want an /s-index/ ? 
*** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: #+BEAMER: \pause *** Keep the human in the loop :B_block: :PROPERTIES: :BEAMER_env: block :END: When /credit/ is at stake, automation/crowdsourcing is not enough!\\ \hfill Humans /are needed/ to get /quality information/ ** First steps with HAL / Software Heritage *** How it works, what is special :B_picblock: :PROPERTIES: :BEAMER_env: picblock :BEAMER_OPT: pic=deposit-communication.png,width=.4\linewidth,leftpic=true :END: \noindent *\hspace{1em}Generic mechanism:* - SWORD based - *review process* - versioning # - /industry chimes in/ (details on demand) #+BEAMER: \pause *Today*: deposit .zip or .tar.gz file ([[http://bit.ly/swhdeposithalen][/guide/]])\\ *Tomorrow*: just provide the /SWH id/ #+BEAMER: \pause *** Deposit/describe research software in HAL - author: https://hal.archives-ouvertes.fr/hal-01872189 - moderator: https://hal.archives-ouvertes.fr/hal-01876705 *** Examples [[https://hal.archives-ouvertes.fr/hal-02130801][LinBox]], [[https://hal.archives-ouvertes.fr/hal-01897934][SLALOM]], [[https://hal.archives-ouvertes.fr/hal-02130729][Givaro]], [[https://hal.archives-ouvertes.fr/hal-02137040][NS2DDV]], [[https://hal.archives-ouvertes.fr/lirmm-02136558][SumGra]], [[https://hal.archives-ouvertes.fr/hal-02155786][Coq proof]], ... ** The swmath.org approach *** Article based citation See for example: - [[https://swmath.org/software/7116][SemiPar on swmath.org]] * The road ahead ** Conclusion *** Research software :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .5 :END: - pillar of open science # - not just data - finally in the limelight *** Doing it right is not easy :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .5 :END: - /simplistic/ approaches, "just data", ... 
# - /directives/ are coming - soon part of /research evaluation/ *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: #+BEAMER: \pause *** You can help make a change - leverage Software Heritage in conferences and journals for /archival/ and /reference/ - join the conversation on /software citation/ and /software evaluation/ criteria - tackle the scientific problems : big code, classification, infrastructure, etc. #+BEAMER: \pause *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: #+latex: \hfill {\Large\bf Thank you!} \hfill\mbox{} #+BEGIN_EXPORT latex \begin{thebibliography}{Foo Bar, 1969} \footnotesize \bibitem{Abramatic2018} Jean-François Abramatic, Roberto Di Cosmo, Stefano Zacchiroli\newblock Building the Universal Archive of Source Code\newblock Communications of the ACM, October 2018 \bibitem{DiCosmo2018} Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli\newblock Identifiers for Digital Objects: the Case of Software Source Code Preservation\newblock iPRES 2018: Intl. Conf. 
on Digital Preservation \end{thebibliography} #+END_EXPORT * Appendix :B_appendix: :PROPERTIES: :BEAMER_env: appendix :END: ** \vfill \centerline{\Huge Appendix} \vfill * News ** Milestones :noexport: #+INCLUDE: "../../common/modules/swh-key-dates.org::#keydates" :minlevel 3 :only-contents t ** News : archiving /public/ code #+latex: \begin{center} #+ATTR_LATEX: :width 0.7\linewidth file:codeetalab.png #+latex: \end{center} #+BEAMER: \pause https://code.etalab.gouv.fr ** News : SWHAP *** Paris Call on Software Source Code “[We call to] support efforts to gather and preserve the artifacts and narratives of the history of computing, while the earlier creators are still alive” #+BEAMER: \pause *** SWHAP : an important step forward - detailed guidelines to *curate* landmark legacy source code and *archive* it on Software Heritage - intense cooperation with *Università di Pisa* and *UNESCO* - open to all, we'll promote it worldwide *** https://www.softwareheritage.org/swhap ** News : ENEA mirror *** Thomas Jefferson, February 18, 1791 :B_block: :PROPERTIES: :BEAMER_ACT: :BEAMER_env: block :END: #+latex: {\em ...let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident. 
#+latex: } #+BEAMER: \pause *** Welcoming ENEA :B_block: :PROPERTIES: :BEAMER_env: picblock :BEAMER_OPT: pic=LogoENEAcompletoENG.png, leftpic=true, width=.7\linewidth :END: - first *institutional* mirror - increased resilience - *AI infrastructure* for researchers - stepping stone to \endgraf \hfill a European joint effort * Inria's commitment ** Inria's ongoing contributions *** Software Heritage - universal archive :: (research) software source code [[https://archive.softwareheritage.org/][archived and referenced]] *** Reproducibility - tools :: [[https://www.gnu.org/software/guix/][Guix]] (now [[https://www.softwareheritage.org/2019/04/18/software-heritage-and-gnu-guix-join-forces-to-enable-long-term-reproducibility/][with Software Heritage]]) - training/research :: RR workshops, MOOC *** Research software curation - HAL - SWH bridge :: curation of metadata, and [[https://hal.inria.fr/hal-01872189][deposit in Software Heritage]] * Big Code ** Software Heritage for Research and Innovation *** Reference platform for /Big Code/ :B_picblock: :PROPERTIES: :BEAMER_opt: pic=universal, leftpic=true, width=.2\linewidth :BEAMER_env: picblock :BEAMER_act: :END: - unique *observatory* of all software development - *big data, machine learning* paradise: classification, trends, coding patterns, code completion... #+BEAMER: \pause *** First datasets are available! 
- full graph of software development (~20Bn nodes, ~200Bn edges) see Pietri, Spinellis, Zacchiroli, MSR 2019 https://dx.doi.org/10.1109/MSR.2019.00030 - MSR 2020 mining competition see https://2020.msrconf.org/track/msr-2020-mining-challenge#Call-for-Papers * Milestones and breaking news ** Milestones #+INCLUDE: "../../common/modules/swh-key-dates.org::#keydates" :minlevel 3 :only-contents t ** Breaking news : archiving /public/ code #+latex: \begin{center} #+ATTR_LATEX: :width 0.7\linewidth file:codeetalab.png #+latex: \end{center} #+BEAMER: \pause https://code.etalab.gouv.fr ** Breaking news : SWHAP *** Paris Call on Software Source Code “[We call to] support efforts to gather and preserve the artifacts and narratives of the history of computing, while the earlier creators are still alive” #+BEAMER: \pause *** SWHAP : an important step forward - detailed guidelines to *curate* landmark legacy source code and *archive* it on Software Heritage - intense cooperation with *Università di Pisa* and *UNESCO* - open to all, we'll promote it worldwide *** https://www.softwareheritage.org/swhap ** Breaking news : ENEA mirror *** Thomas Jefferson, February 18, 1791 :B_block: :PROPERTIES: :BEAMER_ACT: :BEAMER_env: block :END: #+latex: {\em ...let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident. 
#+latex: } #+BEAMER: \pause *** Welcoming ENEA :B_block: :PROPERTIES: :BEAMER_env: picblock :BEAMER_OPT: pic=LogoENEAcompletoENG.png, leftpic=true, width=.7\linewidth :END: - first *institutional* mirror - increased resilience - *AI infrastructure* for researchers - stepping stone to \endgraf \hfill a European joint effort * Under the hood ** The archive in pictures :noexport: #+LATEX: \centering\forcebeamerstart #+LATEX: \only<1>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_1.pdf}}} #+LATEX: \only<2>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/contents.pdf}}} #+LATEX: \only<3>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_2_contents.pdf}}} #+LATEX: \only<4>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/directories.pdf}}} #+LATEX: \only<5>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_3_directories.pdf}}} #+LATEX: \only<6>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/revisions.pdf}}} #+LATEX: \only<7>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_4_revisions.pdf}}} #+LATEX: \only<8>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/releases.pdf}}} #+LATEX: \only<9>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_5_releases.pdf}}} #+LATEX: \only<10>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/snapshots.pdf}}} #+LATEX: \forcebeamerend ** A bird's eye view :noexport: #+BEAMER: \begin{center} #+BEAMER: \includegraphics[width=\extblockscale{1.3\textwidth}]{swh-merkle-dag-wide.pdf} #+BEAMER: \end{center} * Inria's commitment ** Inria's ongoing contributions *** Software Heritage - universal archive :: (research) software source code [[https://archive.softwareheritage.org/][archived and referenced]] *** 
Reproducibility - tools :: [[https://www.gnu.org/software/guix/][Guix]] (now [[https://www.softwareheritage.org/2019/04/18/software-heritage-and-gnu-guix-join-forces-to-enable-long-term-reproducibility/][with Software Heritage]]) - training/research :: RR workshops, MOOC *** Research software curation - HAL - SWH bridge :: curation of metadata, and [[https://hal.inria.fr/hal-01872189][deposit in Software Heritage]] * Identifiers are not easy ** URL decay disrupts the /web of reference/ #+INCLUDE: "../../common/modules/urls-decay.org::#rfc" :minlevel 3 :only-contents t #+INCLUDE: "../../common/modules/urls-decay.org::#examples" :minlevel 2 ** DOI limitations #+INCLUDE: "../../common/modules/doi-analysis.org::#doiexplained" :minlevel 3 :only-contents t * Looking for the right identifiers #+INCLUDE: "../../common/modules/swh-pids.org::#main" :only-contents t