diff --git a/talks-public/2020-03-12-CodeSources/2020-03-12-CodeSources.org b/talks-public/2020-03-12-CodeSources/2020-03-12-CodeSources.org new file mode 100644 index 0000000..e4da590 --- /dev/null +++ b/talks-public/2020-03-12-CodeSources/2020-03-12-CodeSources.org @@ -0,0 +1,682 @@ +#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) +#+TITLE: Archiving, referencing and attributing research software +#+SUBTITLE: towards software as a first class citizen +# #+AUTHOR: Roberto Di Cosmo +# #+EMAIL: roberto@dicosmo.org @rdicosmo @swheritage +#+BEAMER_HEADER: \date{March 12, 2020} +#+BEAMER_HEADER: \title[(CC-BY 4.0) Research Software]{Archiving, referencing and attributing research software} +#+BEAMER_HEADER: \author[Roberto Di Cosmo]{Roberto Di Cosmo\\Seminaire Codes Sources, Paris} +# #+BEAMER_HEADER: \setbeameroption{show notes on second screen} +#+BEAMER_HEADER: \setbeameroption{hide notes} +#+KEYWORDS: software heritage legacy preservation knowledge mankind technology +#+LATEX_HEADER: \usepackage{tcolorbox} +#+LATEX_HEADER: \definecolor{links}{HTML}{2A1B81} +#+LATEX_HEADER: \hypersetup{colorlinks,linkcolor=,urlcolor=links} + +# +# prelude.org contains all the information needed to export the main beamer latex source +# use prelude-toc.org to get the table of contents +# + +#+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 + + +#+INCLUDE: "../../common/modules/169.org" + +# +LaTeX_CLASS_OPTIONS: [aspectratio=169,handout,xcolor=table] + +#+LATEX_HEADER: \usepackage{bbding} +#+LATEX_HEADER: \DeclareUnicodeCharacter{66D}{\FiveStar} + +# +# If you want to change the title logo it's here +# +# +BEAMER_HEADER: \titlegraphic{\includegraphics[width=0.7\textwidth]{SWH-logo}} + +# aspect ratio can be changed, but the slides need to be adapted +# - compute a "resizing factor" for the images (macro for picblocks?) 
+# +# set the background image +# +# https://pacoup.com/2011/06/12/list-of-true-169-resolutions/ +# +#+BEAMER_HEADER: \pgfdeclareimage[height=90mm,width=160mm]{bgd}{swh-world-169.png} +#+BEAMER_HEADER: \setbeamertemplate{background}{\pgfuseimage{bgd}} +#+LATEX: \addtocounter{framenumber}{-1} +* Software Source Code: a (forgotten) pillar of Science +** Software source code: a precious part of our heritage + #+INCLUDE: "../../common/modules/source-code-different-short.org::#softwareisdifferent" :only-contents t :minlevel 3 +** Source code is a /special/ and endangered heritage +*** /Executable/ and /human readable/ knowledge \hfill copyright law :noexport: + /“Programs must be written for people to read, and only incidentally for machines to execute.”/\\ + \hfill Harold Abelson +#+BEAMER: \pause +*** Software /evolves/ over time + - projects may last decades + - the /development history/ is key to its /understanding/ +#+BEAMER: \pause +*** Complexity :B_picblock: + :PROPERTIES: + :BEAMER_env: picblock + :BEAMER_OPT: pic=python3-matplotlib.pdf, width=.6\linewidth + :END: + - /millions/ of lines of code + - large /web of dependencies/ + + easy to break, difficult to maintain + - sophisticated /developer communities/ +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +#+BEAMER: \pause +*** Precious, endangered /Executable/ and /human readable/ knowledge + key people *are passing away* ...\\ + \hfill no organised effort to catalog and archive it (more later) +** Software Source code: pillar of Open Science +*** Software is everywhere in modern research :B_picblock: + :PROPERTIES: + :BEAMER_opt: pic=papermountain, leftpic=true, width=.3\linewidth + :BEAMER_env: picblock + :BEAMER_COL: .6 + :END: +#+BEGIN_QUOTE +[...] software [...] essential in their fields. 
+ +\mbox{}\hfill Top 100 papers (Nature, 2014) +#+END_QUOTE +#+BEGIN_QUOTE +Sometimes, if you don't have the software, you don't have the data + +\mbox{}\hfill Christine Borgman, Paris, 2018 +#+END_QUOTE +# http://www.nature.com/news/the-top-100-papers-1.16224 +#+BEAMER: \pause +*** Open Science: three pillars :B_block: + :PROPERTIES: + :BEAMER_COL: .45 + :BEAMER_env: block + :END: +#+latex: \begin{center} +#+ATTR_LATEX: :width \extblockscale{\linewidth} +file:PreservationTriangle.png +#+latex: \end{center} +#+BEAMER: \pause +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +*** Nota bene + \hfill The links in the picture are *essential* +** The state of the art (in CS!) is far from ideal +*** ICSE (Zannier, Melnik, Maurer, 2006) + - complete absence of replication studies +*** ACM TOSEM 2001 to 2006 \hfill C. Ghezzi http://bit.ly/tosemreprod + - 60% of all papers have tools: *only 20%* /installable/ +*** Collberg's 2015 study \hfill http://reproducibility.cs.arizona.edu/ + - 601 mainstream papers: 508 with tools, *only 40%* /installable/ +#+BEAMER: \pause +*** Main reasons + \hfill source code (/or the right version of it/) cannot be found +** Where we stand +*** A wealth of initiatives! + - Policies: ACM [[https://www.acm.org/publications/policies/artifact-review-badging][Artifact Review and Badging]], ... + - Working groups: [[https://www.force11.org/software-citation-principles][FORCE11]], [[https://www.rd-alliance.org/groups/software-source-code-ig][RDA]], [[https://www.ouvrirlascience.fr/logiciels-libres-et-open-source/][SPSO]], ... + - Metrics: [[https://www.ouvrirlascience.fr/about-the-proposal-for-software-indicators-in-open-science-monitor-3/][Open Science Monitor]] (Elsevier!), ... + - Journals: [[https://www.ipol.im/][IPOL]], ReScience, InsightJournal, eLife, ACM DL, ... + - Repositories: FigShare, Zenodo, ... +#+BEAMER: \pause +*** + \hfill but ... 
\hfill \mbox{} +*** Lack of recognition + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .5 + :END: + not (yet) a first class citizen + - in the EOSC plan +# - in the EU copyright reform + - in the scholarly works +#+BEAMER: \pause +*** Lack of proper guidance on how to + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .5 + :END: + - /archive/ and /reference/ software + - choose a license + - /cite/ a software project +# #+BEAMER: \pause +# *** :B_ignoreheading: +# :PROPERTIES: +# :BEAMER_env: ignoreheading +# :END: +# *** Lack of basic prerequisites to reproducibility +# See a discussion in \url{annex.softwareheritage.org/talks/2018/2018-09-17-STScI_public.pdf} +** What is at stake \hfill in increasing order of difficulty +*** Archival + Research software artifacts must be properly *archived*\\ + \hfill make sure we can /retrieve/ them (/reproducibility/) +#+BEAMER: \pause +*** Identification + Research software artifacts must be properly *referenced*\\ + \hfill make sure we can /identify/ them (/reproducibility/) +#+BEAMER: \pause +*** Metadata + Research software artifacts must be properly *described*\\ + \hfill make it easy to /discover/ them (/visibility/) +#+BEAMER: \pause +*** Citation + Research software artifacts must be properly *cited* /(not the same as referenced!)/\\ + \hfill to give /credit/ to authors (/evaluation/!) 
+** A plurality of needs +*** Researcher + - archive and reference software used in articles + - get credit for the software they develop + - verify/reproduce/improve results +*** Laboratory/team + - track software contributions + - produce up-to-date report / web page +*** University/Research Organization + - central view of research software assets + - tech transfer + - impact metrics +* Software Heritage +** Software Heritage in a nutshell \hfill www.softwareheritage.org +#+BEAMER: \transdissolve +#+INCLUDE: "../../common/modules/swh-goals-oneslide-vertical.org::#goals" :only-contents t :minlevel 3 +** An international, non-profit initiative\hfill built for the long term + :PROPERTIES: + :CUSTOM_ID: support + :END: +*** Sharing the vision :B_block: + :PROPERTIES: + :CUSTOM_ID: endorsement + :BEAMER_COL: .5 + :BEAMER_env: block + :END: + #+LATEX: \begin{center}{\includegraphics[width=\extblockscale{.4\linewidth}]{unesco_logo_en_285}}\end{center} + #+LATEX: \vspace{-0.8cm} + #+LATEX: \begin{center}\vskip 1em \includegraphics[width=\extblockscale{1.4\linewidth}]{support.pdf}\end{center} + #+latex: \small And many more ...\\ + #+latex:\mbox{}~~~~~~~\tiny\url{www.softwareheritage.org/support/testimonials} +*** Donors, members, sponsors :B_block: + :PROPERTIES: + :CUSTOM_ID: sponsors + :BEAMER_COL: .5 + :BEAMER_env: block + :END: + #+LATEX: \begin{center}\includegraphics[width=\extblockscale{.4\linewidth}]{inria-logo-new}\end{center} + #+LATEX: \begin{center} + # #+LATEX: \includegraphics[width=\extblockscale{.2\linewidth}]{sponsors-levels.pdf} + #+LATEX: \colorbox{white}{\includegraphics[width=\extblockscale{1.4\linewidth}]{sponsors.pdf}} + #+LATEX: \end{center} +# - sponsoring / partnership :: \hfill \url{sponsorship.softwareheritage.org} +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +*** Research collaboration :B_picblock:noexport: + :PROPERTIES: + :BEAMER_COL: .5 + :BEAMER_env: picblock + :BEAMER_OPT: pic=Qwant_Logo, leftpic=true + :END: + 
source code search engine +*** See more :noexport: + \hfill\tiny\url{http://www.softwareheritage.org/support/testimonials} +*** Global network :B_picblock:noexport: + :PROPERTIES: + :BEAMER_COL: .5 + :BEAMER_env: picblock + :BEAMER_OPT: pic=fossid, leftpic=true, width=.3\linewidth + :END: + - first *independent mirror* + - increased reliability +** The largest software archive, a shared infrastructure + #+latex: \begin{center} + #+ATTR_LATEX: :width 0.7\linewidth + file:SWH-as-foundation-slim.png + #+latex: \end{center} + #+BEAMER: \pause + #+latex: \centering + #+ATTR_LATEX: :width \extblockscale{.9\linewidth} + file:2019-09-archive-growth.png +** A peek under the hood + #+BEAMER: \begin{center} + #+BEAMER: \mode{\only<1>{\includegraphics[width=\extblockscale{1\textwidth}]{swh-dataflow-merkle-listers.pdf}}} + #+BEAMER: \only<2-3>{\includegraphics[width=\extblockscale{1\textwidth}]{swh-dataflow-merkle.pdf}} + #+BEAMER: \end{center} +#+BEAMER: \pause +#+BEAMER: \pause + /Global development history/ permanently archived in a /unique/ git-like Merkle DAG + - *~400 TB* (uncompressed) blobs, *~20 B* nodes, *~280 B* edges + # - *GitHub*, Gitlab.com, Bitbucket, /Gitorious/, /GoogleCode/, GNU, PyPi, Debian, NPM... + +** Software Heritage for Research and Innovation +*** Reference platform for /Big Code/ :B_picblock: + :PROPERTIES: + :BEAMER_opt: pic=universal, leftpic=true, width=.2\linewidth + :BEAMER_env: picblock + :BEAMER_act: + :END: + - unique *observatory* of all software development + - *big data, machine learning* paradise: classification, trends, coding patterns, code completion... +#+BEAMER: \pause +*** First datasets are available! 
+ - full graph of software development (~20Bn nodes, ~200Bn edges) + see Pietri, Spinellis, Zacchiroli, MSR 2019 https://dx.doi.org/10.1109/MSR.2019.00030 + - MSR 2020 mining competition + see https://2020.msrconf.org/track/msr-2020-mining-challenge#Call-for-Papers +** Raising awareness about Software Source Code +*** :B_column:BMCOL: + :PROPERTIES: + :BEAMER_col: .53 + :BEAMER_env: column + :END: + #+ATTR_LATEX: :width .7\linewidth + file:UNESCOParisCallMeeting.png + UNESCO, Inria, Software Heritage invite\\ + [[https://en.unesco.org/news/experts-call-greater-recognition-software-source-code-heritage-sustainable-development][40 international experts meet in Paris]] ... + #+BEAMER: \pause +*** :B_column:BMCOL: + :PROPERTIES: + :BEAMER_col: .5 + :BEAMER_env: column + :END: + #+ATTR_LATEX: :width .65\linewidth + file:paris_call_ssc_cover.jpg + [[https://en.unesco.org/foss/paris-call-software-source-code][Their call was published in Feb 2019]] \pause +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +*** + :PROPERTIES: + :BEAMER_COL: 1.06 + :BEAMER_env: block + :END: + It's an important /policy tool/, already referenced and used ... + \hfill /yes, you can sign it!/\\ + \vspace{10pt} + \hfill https://en.unesco.org/foss/paris-call-software-source-code \hfill\mbox{} + +* Archive and reference /all/ the source code +** Archive and reference +*** Software Heritage: a revolutionary infrastructure :B_picblock: + :PROPERTIES: + :BEAMER_env: picblock + :BEAMER_OPT: pic=PreservationTriangle.png,leftpic=true, width=.34\linewidth + :END: + - *universal archive* of all source code + + we archive /all/ software: both research and non research + + we /proactively collect software/ in a systematic way + - *intrinsic* identifiers for *reproducibility* + + identify software artefacts /without any third party/ + + cryptographically strong, compatible with git hashes +#+BEAMER: \pause +*** + Full guidelines available! 
\hfill \tiny https://www.softwareheritage.org/save-and-reference-research-software/ +*** + Save code now ... [[https://archive.softwareheritage.org/save/][in just a few clicks]] +*** Demo + My 2012 Parmap paper [[http://www.dicosmo.org/Publications/Parmap2012.html][before]] and [[http://www.dicosmo.org/share/parmap_swh.pdf][after]]; other links: [[https://www.softwareheritage.org/2019/07/20/archiving-and-referencing-the-apollo-source-code/][Apollo 11]] (and [[https://www.softwareheritage.org/2019/07/20/archiving-and-referencing-the-apollo-source-code/][blog]]), [[https://archive.softwareheritage.org/swh:1:cnt:bb0faf6919fc60636b2696f32ec9b3c2adb247fe;origin=https://github.com/id-Software/Quake-III-Arena;lines=548-572/][Quake III Arena]] +** The SWH-ID schema + # TODO: drawing with swh:1:cnt:xxxxxxx "exploded" and explained + #+LATEX: \centering\forcebeamerstart + #+LATEX: \only<1>{\includegraphics[width=\linewidth]{SWH-ID-1.png}} + #+LATEX: \only<2>{\includegraphics[width=\linewidth]{SWH-ID-2.png}} + #+LATEX: \only<3>{\includegraphics[width=\linewidth]{SWH-ID-3.png}} + #+LATEX: \forcebeamerend +** A worked example + #+LATEX: \centering\forcebeamerstart + #+LATEX: \only<1>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_1.pdf}}} + #+LATEX: \only<2>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/contents.pdf}}} + #+LATEX: \only<3>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_2_contents.pdf}}} + #+LATEX: \only<4>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/directories.pdf}}} + #+LATEX: \only<5>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_3_directories.pdf}}} + #+LATEX: \only<6>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/revisions.pdf}}} + #+LATEX: 
\only<7>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_4_revisions.pdf}}} + #+LATEX: \only<8>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/releases.pdf}}} + #+LATEX: \only<9>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_5_releases.pdf}}} + #+LATEX: \only<10>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/snapshots.pdf}}} + #+LATEX: \forcebeamerend +** Zoom on the trust model for identifiers + \vspace{-5pt} +*** Trust model for usual DOIs :B_block: + :PROPERTIES: + :BEAMER_COL: .5 + :BEAMER_env: block + :END: +#+ATTR_LATEX: :width \linewidth +file:doi-vs-pid-1.pdf +#+BEAMER: \pause +*** Trust model for DOIs with checksums :B_block: + :PROPERTIES: + :BEAMER_COL: .5 + :BEAMER_env: block + :END: +#+ATTR_LATEX: :width \linewidth +file:doi-vs-pid-2.pdf +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +#+BEAMER: \pause +*** Trust model for SWH-IDs + :PROPERTIES: + :END: +#+ATTR_LATEX: :width .3\linewidth +file:doi-vs-pid-3.pdf +* Describe and cite /research/ source code +** Context +*** Many articles/guidelines :B_block: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .4 + :END: + - reproducibility + - archival + - credit and evaluation +#+BEAMER: \pause +*** Most common limitations :B_block: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .6 + :END: + - software is 'just data' + - citation = reference = DOIs + - citation produced by automated tools +#+BEAMER: \pause +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +*** A few remarkable exceptions + - [[https://www.ascl.net][ASCL]] (since 1999): metadata only, carefully curated + - [[https://www.geodynamics.org][geodynamics.org]] : source, documentation, metadata + - [[https://swmath.org][swmath.org]] : software catalog via articles +#+BEAMER: \pause +*** Software Citation WG at Inria (since 10/2018) + - leverage a 50 
year experience, make recommendations + - read more https://hal.archives-ouvertes.fr/hal-02135891 +** Why it is not simple +*** Software is complex + - Structure :: monolithic/composite; self-contained/external dependencies + - Lifetime :: one-shot/long term + - Community :: one man/one team/distributed community + - Authorship :: complex set of roles /(more later)/ + - Authority :: institutions/organizations/communities/single person +#+BEAMER: \pause +*** Various granularities + - Exact status of the source code :: for reproducibility, e.g. +#+latex: \emph{``you can find at \href{https://archive.softwareheritage.org/swh:1:cnt:cdf19c4487c43c76f3612557d4dc61f9131790a4;lines=146-187/}{swh:1:cnt:cdf19c4487c43c76f3612557d4dc61f9131790a4;lines=146-187} the core algorithm used in this article''} + + - (Major) release :: \emph{``This functionality is available in OCaml version 4''} + + - Project :: \emph{``Inria has created OCaml and Scikit-Learn''}. +** Proposals for the scholarly world +*** Refined ontology for contributors :B_block: + :PROPERTIES: + :BEAMER_COL: .55 + :BEAMER_env: block + :END: + - Design, Architecture, + - Coding, Testing, Debugging, + - Documentation, Maintenance, Support, + - Management + \hfill see also [[https://www.casrai.org/credit.html][CRediT]], [[https://geodynamics.org/cig/metadata/?software=aspect&version=2.1.0][Geodynamics]] +#+BEAMER: \pause +*** Reference is distinct from citation :B_block: + :PROPERTIES: + :BEAMER_COL: .5 + :BEAMER_env: block + :END: + - *Reference* is for /reproducibility/ + - *Citation* is for /credit/ + \hfill They must not be conflated + - Beware :: of the numbers game: + \hfill ... do we really want an /s-index/ ? 
+*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +#+BEAMER: \pause +*** Keep the human in the loop :B_block: + :PROPERTIES: + :BEAMER_env: block + :END: + When /credit/ is at stake, automation/crowdsourcing is not enough!\\ + \hfill Humans /are needed/ to get /quality information/ +** First steps with HAL / Software Heritage +*** How it works, what is special :B_picblock: + :PROPERTIES: + :BEAMER_env: picblock + :BEAMER_OPT: pic=deposit-communication.png,width=.4\linewidth,leftpic=true + :END: + \noindent *\hspace{1em}Generic mechanism:* + - SWORD based + - *review process* + - versioning +# - /industry chimes in/ (details on demand) +#+BEAMER: \pause + *Today*: deposit .zip or .tar.gz file ([[http://bit.ly/swhdeposithalen][/guide/]])\\ + *Tomorrow*: just provide the /SWH id/ +#+BEAMER: \pause +*** Deposit/describe research software in HAL + - author: https://hal.archives-ouvertes.fr/hal-01872189 + - moderator: https://hal.archives-ouvertes.fr/hal-01876705 +*** Examples + [[https://hal.archives-ouvertes.fr/hal-02130801][LinBox]], [[https://hal.archives-ouvertes.fr/hal-01897934][SLALOM]], [[https://hal.archives-ouvertes.fr/hal-02130729][Givaro]], [[https://hal.archives-ouvertes.fr/hal-02137040][NS2DDV]], [[https://hal.archives-ouvertes.fr/lirmm-02136558][SumGra]], [[https://hal.archives-ouvertes.fr/hal-02155786][Coq proof]], ... +** The swmath.org approach +*** Article based citation + See for example: + - [[https://swmath.org/software/7116][SemiPar on swmath.org]] +* The road ahead +** Conclusion +*** Research software :B_block: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .5 + :END: + - pillar of open science +# - not just data + - finally in the limelight +*** Doing it right is not easy :B_block: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .5 + :END: + - /simplistic/ approaches, "just data", ... 
+# - /directives/ are coming + - soon part of /research evaluation/ +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +#+BEAMER: \pause +*** You can help make a change + - leverage Software Heritage in conferences and journals for /archival/ and /reference/ + - join the conversation on /software citation/ and /software evaluation/ criteria + - tackle the scientific problems : big code, classification, infrastructure, etc. +#+BEAMER: \pause +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +#+latex: \hfill {\Large\bf Thank you!} \hfill\mbox{} + #+BEGIN_EXPORT latex + \begin{thebibliography}{Foo Bar, 1969} + \footnotesize + + \bibitem{Abramatic2018} Jean-François Abramatic, Roberto Di Cosmo, Stefano Zacchiroli\newblock + Building the Universal Archive of Source Code\newblock + Communications of the ACM, October 2018 + + \bibitem{DiCosmo2018} Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli\newblock + Identifiers for Digital Objects: the Case of Software Source Code Preservation\newblock + iPRES 2018: Intl. Conf. 
on Digital Preservation + \end{thebibliography} + #+END_EXPORT +* Appendix :B_appendix: + :PROPERTIES: + :BEAMER_env: appendix + :END: +** + \vfill + \centerline{\Huge Appendix} + \vfill +* News +** Milestones :noexport: + #+INCLUDE: "../../common/modules/swh-key-dates.org::#keydates" :minlevel 3 :only-contents t +** News : archiving /public/ code + #+latex: \begin{center} + #+ATTR_LATEX: :width 0.7\linewidth + file:codeetalab.png + #+latex: \end{center} +#+BEAMER: \pause + https://code.etalab.gouv.fr +** News : SWHAP +*** Paris Call on Software Source Code + “[We call to] support efforts to gather and preserve the artifacts and + narratives of the history of computing, while the earlier creators are still + alive” +#+BEAMER: \pause +*** SWHAP : an important step forward + - detailed guidelines to *curate* landmark legacy source code + and *archive* it on Software Heritage + - intense cooperation with *Università di Pisa* and *UNESCO* + - open to all, we'll promote it worldwide +*** + https://www.softwareheritage.org/swhap +** News : ENEA mirror +*** Thomas Jefferson, February 18, 1791 :B_block: + :PROPERTIES: + :BEAMER_ACT: + :BEAMER_env: block + :END: +#+latex: {\em + ...let us save what remains: not by vaults and locks which fence them + from the public eye and use in consigning them to the waste of time, + but by such a multiplication of copies, as shall place them beyond + the reach of accident. 
+#+latex: } + #+BEAMER: \pause +*** Welcoming ENEA :B_block: + :PROPERTIES: + :BEAMER_env: picblock + :BEAMER_OPT: pic=LogoENEAcompletoENG.png, leftpic=true, width=.7\linewidth + :END: + - first *institutional* mirror + - increased resilience + - *AI infrastructure* for researchers + - stepping stone to \endgraf + \hfill a European joint effort + +* Inria's commitment +** Inria's ongoing contributions +*** Software Heritage + - universal archive :: (research) software source code [[https://archive.softwareheritage.org/][archived and referenced]] +*** Reproducibility + - tools :: [[https://www.gnu.org/software/guix/][Guix]] (now [[https://www.softwareheritage.org/2019/04/18/software-heritage-and-gnu-guix-join-forces-to-enable-long-term-reproducibility/][with Software Heritage]]) + - training/research :: RR workshops, MOOC +*** Research software curation + - HAL - SWH bridge :: curation of metadata, and [[https://hal.inria.fr/hal-01872189][deposit in Software Heritage]] +* Big Code +** Software Heritage for Research and Innovation +*** Reference platform for /Big Code/ :B_picblock: + :PROPERTIES: + :BEAMER_opt: pic=universal, leftpic=true, width=.2\linewidth + :BEAMER_env: picblock + :BEAMER_act: + :END: + - unique *observatory* of all software development + - *big data, machine learning* paradise: classification, trends, coding patterns, code completion... +#+BEAMER: \pause +*** First datasets are available! 
+ - full graph of software development (~20Bn nodes, ~200Bn edges) + see Pietri, Spinellis, Zacchiroli, MSR 2019 https://dx.doi.org/10.1109/MSR.2019.00030 + - MSR 2020 mining competition + see https://2020.msrconf.org/track/msr-2020-mining-challenge#Call-for-Papers +* Milestones and breaking news +** Milestones + #+INCLUDE: "../../common/modules/swh-key-dates.org::#keydates" :minlevel 3 :only-contents t +** Breaking news : archiving /public/ code + #+latex: \begin{center} + #+ATTR_LATEX: :width 0.7\linewidth + file:codeetalab.png + #+latex: \end{center} +#+BEAMER: \pause + https://code.etalab.gouv.fr +** Breaking news : SWHAP +*** Paris Call on Software Source Code + “[We call to] support efforts to gather and preserve the artifacts and + narratives of the history of computing, while the earlier creators are still + alive” +#+BEAMER: \pause +*** SWHAP : an important step forward + - detailed guidelines to *curate* landmark legacy source code + and *archive* it on Software Heritage + - intense cooperation with *Università di Pisa* and *UNESCO* + - open to all, we'll promote it worldwide +*** + https://www.softwareheritage.org/swhap +** Breaking news : ENEA mirror +*** Thomas Jefferson, February 18, 1791 :B_block: + :PROPERTIES: + :BEAMER_ACT: + :BEAMER_env: block + :END: +#+latex: {\em + ...let us save what remains: not by vaults and locks which fence them + from the public eye and use in consigning them to the waste of time, + but by such a multiplication of copies, as shall place them beyond + the reach of accident. 
+#+latex: } + #+BEAMER: \pause +*** Welcoming ENEA :B_block: + :PROPERTIES: + :BEAMER_env: picblock + :BEAMER_OPT: pic=LogoENEAcompletoENG.png, leftpic=true, width=.7\linewidth + :END: + - first *institutional* mirror + - increased resilience + - *AI infrastructure* for researchers + - stepping stone to \endgraf + \hfill a European joint effort + +* Under the hood +** The archive in pictures :noexport: + #+LATEX: \centering\forcebeamerstart + #+LATEX: \only<1>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_1.pdf}}} + #+LATEX: \only<2>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/contents.pdf}}} + #+LATEX: \only<3>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_2_contents.pdf}}} + #+LATEX: \only<4>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/directories.pdf}}} + #+LATEX: \only<5>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_3_directories.pdf}}} + #+LATEX: \only<6>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/revisions.pdf}}} + #+LATEX: \only<7>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_4_revisions.pdf}}} + #+LATEX: \only<8>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/releases.pdf}}} + #+LATEX: \only<9>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_5_releases.pdf}}} + #+LATEX: \only<10>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/snapshots.pdf}}} + #+LATEX: \forcebeamerend +** A bird's eye view :noexport: + #+BEAMER: \begin{center} + #+BEAMER: \includegraphics[width=\extblockscale{1.3\textwidth}]{swh-merkle-dag-wide.pdf} + #+BEAMER: \end{center} +* Inria's commitment +** Inria's ongoing contributions +*** Software Heritage + - universal archive :: (research) software source code 
[[https://archive.softwareheritage.org/][archived and referenced]] +*** Reproducibility + - tools :: [[https://www.gnu.org/software/guix/][Guix]] (now [[https://www.softwareheritage.org/2019/04/18/software-heritage-and-gnu-guix-join-forces-to-enable-long-term-reproducibility/][with Software Heritage]]) + - training/research :: RR workshops, MOOC +*** Research software curation + - HAL - SWH bridge :: curation of metadata, and [[https://hal.inria.fr/hal-01872189][deposit in Software Heritage]] +* Identifiers are not easy +** URL decay disrupts the /web of reference/ + #+INCLUDE: "../../common/modules/urls-decay.org::#rfc" :minlevel 3 :only-contents t + #+INCLUDE: "../../common/modules/urls-decay.org::#examples" :minlevel 2 +** DOI limitations + #+INCLUDE: "../../common/modules/doi-analysis.org::#doiexplained" :minlevel 3 :only-contents t + +* Looking for the right identifiers +#+INCLUDE: "../../common/modules/swh-pids.org::#main" :only-contents t diff --git a/talks-public/2020-03-12-CodeSources/METADATA b/talks-public/2020-03-12-CodeSources/METADATA new file mode 100644 index 0000000..f6beb14 --- /dev/null +++ b/talks-public/2020-03-12-CodeSources/METADATA @@ -0,0 +1,18 @@ +Title: + + Archiving, assessing and attributing research software: towards + software as a first class citizen in the scholarly world + + Abstract: + + Software is a fundamental pillar of modern scientific research, across all + fields and disciplines. However, there is a general lack of adequate means to + archive, reference and cite software. In this talk, we will survey the main + issues that make this task difficult, ranging from the specificity of the + persistent identifiers needed for reproducibility to the complexity of + determining software authorship and authority, especially for long running + projects, which are needed for proper software attribution and credit. 
We + report on recent contributions to the ongoing efforts to develop proper + processes, guidelines and recommendations for software reference and + software citation, building upon the internal experience of Inria and the + emerging Software Heritage infrastructure. diff --git a/talks-public/2020-03-12-CodeSources/Makefile b/talks-public/2020-03-12-CodeSources/Makefile new file mode 100644 index 0000000..68fbee7 --- /dev/null +++ b/talks-public/2020-03-12-CodeSources/Makefile @@ -0,0 +1 @@ +include ../Makefile.slides