diff --git a/talks-public/2019-07-ResearchSoftware/2019-07-ResearchSoftware.org b/talks-public/2019-07-ResearchSoftware/2019-07-ResearchSoftware.org new file mode 100644 index 0000000..94e7004 --- /dev/null +++ b/talks-public/2019-07-ResearchSoftware/2019-07-ResearchSoftware.org @@ -0,0 +1,482 @@ +#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) +#+TITLE: Archiving, assessing and attributing research software +#+SUBTITLE: towards software as a first class citizen in the scholarly world +# #+AUTHOR: Roberto Di Cosmo +# #+EMAIL: roberto@dicosmo.org @rdicosmo @swheritage +#+BEAMER_HEADER: \date{May 28th, 2019} +#+BEAMER_HEADER: \title[(CC-BY 4.0) Making software a first class citizen]{Archiving, assessing and attributing research software} +#+BEAMER_HEADER: \author[Roberto Di Cosmo \hfill www.softwareheritage.org]{Roberto Di Cosmo} +# #+BEAMER_HEADER: \setbeameroption{show notes on second screen} +#+BEAMER_HEADER: \setbeameroption{hide notes} +#+KEYWORDS: software heritage legacy preservation knowledge mankind technology +#+LATEX_HEADER: \usepackage{tcolorbox} +#+LATEX_HEADER: \definecolor{links}{HTML}{2A1B81} +#+LATEX_HEADER: \hypersetup{colorlinks,linkcolor=,urlcolor=links} + +# +# prelude.org contains all the information needed to export the main beamer latex source +# use prelude-toc.org to get the table of contents +# + +#+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 + + +#+INCLUDE: "../../common/modules/169.org" + +# +LaTeX_CLASS_OPTIONS: [aspectratio=169,handout,xcolor=table] + +#+LATEX_HEADER: \usepackage{bbding} +#+LATEX_HEADER: \DeclareUnicodeCharacter{66D}{\FiveStar} + +# +# If you want to change the title logo it's here +# +# +BEAMER_HEADER: \titlegraphic{\includegraphics[width=0.7\textwidth]{SWH-logo}} + +# aspect ratio can be changed, but the slides need to be adapted +# - compute a "resizing factor" for the images (macro for picblocks?) 
+# +# set the background image +# +# https://pacoup.com/2011/06/12/list-of-true-169-resolutions/ +# +#+BEAMER_HEADER: \pgfdeclareimage[height=90mm,width=160mm]{bgd}{swh-world-169.png} +#+BEAMER_HEADER: \setbeamertemplate{background}{\pgfuseimage{bgd}} +# #+LATEX: \addtocounter{framenumber}{-1} +* Software Source Code: a (forgotten) pillar of Science +** Software Source code: pillar of Open Science +*** Software is everywhere in modern research :B_picblock: + :PROPERTIES: + :BEAMER_opt: pic=papermountain, leftpic=true, width=.3\linewidth + :BEAMER_env: picblock + :BEAMER_COL: .6 + :END: +#+BEGIN_QUOTE +[...] software [...] essential in their fields. + +\mbox{}\hfill Top 100 papers (Nature, 2014) +#+END_QUOTE +#+BEGIN_QUOTE +Sometimes, if you don't have the software, you don't have the data + +\mbox{}\hfill Christine Borgman, Paris, 2018 +#+END_QUOTE +# http://www.nature.com/news/the-top-100-papers-1.16224 +#+BEAMER: \pause +*** Open Science: three pillars :B_block: + :PROPERTIES: + :BEAMER_COL: .45 + :BEAMER_env: block + :END: +#+latex: \begin{center} +#+ATTR_LATEX: :width \extblockscale{\linewidth} +file:PreservationTriangle.png +#+latex: \end{center} +#+BEAMER: \pause +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +*** Nota bene + \hfill The links in the picture are *essential* +** Source code is /special/ +#+INCLUDE: "../../common/modules/source-code-different-short.org::#softwareisdifferent" :only-contents t :minlevel 3 +** ~ 50 years, a lightning fast growth +#+INCLUDE: "../../common/modules/50years-source-code.org::#apollolinux" :only-contents t :minlevel 3 +** Pressure to make the source code available is rising +*** Why + Necessary to + - /reproduce/ and verify, + - /modify/ and /evolve/, *building new experiments* from old ones +#+BEAMER: \pause +*** When and where + - debate started at the end of the 2000s (biology, statistics, medicine, etc.) 
+ - growing in Computer Science since the [[https://www.artifact-eval.org/about.html][ESEC/FSE 2011 Artifact Evaluation context]] (winner: Vouillon and Di Cosmo) +** ACM take on Reproducibility, Replicability and Source code + ACM policies: [[https://www.acm.org/publications/policies/artifact-review-badging][Artifact Review and Badging]] +*** Terminology (not consensual yet!) + :PROPERTIES: + :BEAMER_col: 0.5 + :BEAMER_env: block + :END: + - *Repeatability* \\ same team, same experimental setup + - *Replicability* \\ different team, same experimental setup + - *Reproducibility* \\ different team, different experimental setup +#+BEAMER: \pause +*** Badging software artefacts + :PROPERTIES: + :BEAMER_col: 0.4 + :BEAMER_env: block + :END: +#+latex: \begin{center} + #+ATTR_LATEX: :width 0.6\linewidth +# file:file:metadata_landscape_final.png +file:acm_badges.png +#+latex: \end{center} +#+BEAMER: \pause + +** The state of the art is not ideal +#+INCLUDE: "../../common/modules/reprod-bad-sota.org::#collbergmethod" :only-contents t :minlevel 3 +** ... 
cont'd +#+INCLUDE: "../../common/modules/reprod-bad-sota.org::#collbergfindings" :only-contents t :minlevel 3 +#+BEAMER: \pause +*** The main reasons + \hfill source code (/or the right version of it/) cannot be found + +** Where we stand +*** Lack of recognition + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .5 + :END: + not (yet) a first class citizen + - in the EOSC plan +# - in the EU copyright reform + - in the scholarly works +#+BEAMER: \pause +*** Lack of proper guidance on how to + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .5 + :END: + - /archive/ software + - choose a license + - /cite/ a software project +# #+BEAMER: \pause +# *** :B_ignoreheading: +# :PROPERTIES: +# :BEAMER_env: ignoreheading +# :END: +# *** Lack of basic prerequisites to reproducibility +# See a discussion in \url{annex.softwareheritage.org/talks/2018/2018-09-17-STScI_public.pdf} + +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +#+BEAMER: \pause +*** ... but a wealth of initiatives! + - Policies: ACM [[https://www.acm.org/publications/policies/artifact-review-badging][Artifact Review and Badging]], ... + - Working groups: [[https://www.force11.org/software-citation-principles][FORCE11]], [[https://www.rd-alliance.org/groups/software-source-code-ig][RDA]], [[https://www.ouvrirlascience.fr/logiciels-libres-et-open-source/][SPSO]], ... + - Metrics: [[https://www.ouvrirlascience.fr/about-the-proposal-for-software-indicators-in-open-science-monitor-3/][Open Science Monitor]] (Elsevier!), ... + - Journals: [[https://www.ipol.im/][IPOL]], ReScience, InsightJournal, eLife, ACM DL, ... + - Repositories: FigShare, Zenodo, ... 
+** What is at stake +*** Metadata + Research software artifacts must be properly *described*\\ + \hfill make it easy to /discover/ them (/visibility/) +#+BEAMER: \pause +*** Archival + Research software artifacts must be properly *archived*\\ + \hfill make sure we can /retrieve/ them (/reproducibility/) +#+BEAMER: \pause +*** Identification + Research software artifacts must be properly *referenced*\\ + \hfill make sure we can /identify/ them (/reproducibility/) +#+BEAMER: \pause +*** Citation + Research software artifacts must be properly *cited* /(not the same as referenced)/\\ + \hfill to give /credit/ to authors (/evaluation/!) +* Software Citation Working Group +** The Software Citation WG at Inria +*** Mission + - understand the landscape + - collect best practices + - identify potential Inria contributions + - make recommendations +#+BEAMER: \pause +*** First outcome + A position paper (submitted to CiSE), available at https://hal.archives-ouvertes.fr/hal-02135891 +*** + \hfill Here are the key highlights\hfill\mbox{} +** Literature review on research software +*** Large number of articles :B_block: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .4 + :END: + - reproducibility + - preservation + - crediting developers + - principles/guidelines +#+BEAMER: \pause +*** Most common limitations :B_block: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .6 + :END: + - software as 'just data' + - reduce credit to DOIs + - conflate citation and reference + - citation produced by automated tools +#+BEAMER: \pause +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +*** A few remarkable exceptions + - [[https://www.ascl.net][ASCL]] (since 1999): metadata only, carefully curated + - [[https://www.geodynamics.org][geodynamics.org]] : source, documentation, metadata + - [[https://swmath.org][swmath.org]] : software catalog via articles +** Why it is not simple +*** Software is complex + - Structure :: monolithic/composite; 
self-contained/external dependencies + - Lifetime :: one-shot/long term + - Community :: one person/one team/distributed community + - Authorship :: complex set of roles /(more later)/ + - Authority :: institutions/organizations/communities/single person +#+BEAMER: \pause +*** Various granularities + - Exact status of the source code :: for reproducibility, e.g. +#+latex: \emph{``you can find at \href{https://archive.softwareheritage.org/swh:1:cnt:cdf19c4487c43c76f3612557d4dc61f9131790a4;lines=146-187/}{swh:1:cnt:cdf19c4487c43c76f3612557d4dc61f9131790a4;lines=146-187} the core algorithm used in this article''} + + - (Major) release :: \emph{``This functionality is available in OCaml version 4''} + + - Project :: \emph{``Inria has created OCaml and Scikit-Learn''}. + +** Introspection of Inria best practices +*** Three places for software + - CE :: evaluation of researchers/teams + - APP :: tech transfer + - BIL :: internal catalog +#+BEAMER: \pause +*** What we found + - ontologies + - processes + - tools +* Proposals +** Three proposals for the scholarly world +*** A refined ontology for contributors + - Design, Architecture, + - Coding, Testing, Debugging, + - Documentation, Maintenance, Support, + - Management + Similar ideas in [[https://www.casrai.org/credit.html][CRediT]], [[https://geodynamics.org/cig/metadata/?software=aspect&version=2.1.0][Geodynamics]] +#+BEAMER: \pause +*** We need the human in the loop + When /credit/ is at stake, automation/crowdsourcing is not enough\\ + \hfill human intervention is needed to get /quality information/ +#+BEAMER: \pause +*** Reference is distinct from citation + \hfill *Reference* is for /reproducibility/, *Citation* is for /credit/: they must not be conflated. 
+** Inria's contribution to the future of Science +*** Software Heritage + - universal archive :: (research) software source code [[https://archive.softwareheritage.org/][archived and referenced]] +*** Reproducibility + - tools :: [[https://www.gnu.org/software/guix/][Guix]] (now [[https://www.softwareheritage.org/2019/04/18/software-heritage-and-gnu-guix-join-forces-to-enable-long-term-reproducibility/][with Software Heritage]]) + - training/research :: RR workshops, MOOC +*** Research software curation + - HAL - SWH bridge :: curation of metadata, and [[https://hal.inria.fr/hal-01872189][deposit in Software Heritage]] +* Overview of Software Heritage for Open Science +** The Software Heritage initiative \hfill www.softwareheritage.org + :PROPERTIES: + :CUSTOM_ID: mission + :END: +#+latex: \begin{center} +#+ATTR_LATEX: :width .5\linewidth +# file:SWH-logo+motto.pdf +file:SWH-logo.pdf +#+latex: \end{center} +*** Mission + \hfill *Collect*, *preserve* and *share* the /source code/ of /all + the software/ that is available +#+BEAMER: \pause +*** Partners + - Initiator :: Inria + - Industry philanthropic sponsors :: VMware, Société Générale, Intel, GitHub ... + - Public sector :: UNESCO, DINSIC, DANS, UQAM, Bologna University +** A revolutionary infrastructure for research and innovation +*** Reference archive for research software :B_picblock: + :PROPERTIES: + :BEAMER_env: picblock + :BEAMER_OPT: pic=PreservationTriangle.png,leftpic=true, width=.4\linewidth + :END: + - *curated deposit* of research software + + /prototype/ with *HAL*, *CCSD* and *Inria IES* + - *intrinsic* identifiers for *reproducibility* + #+BEAMER: \pause +*** Reference platform for /Big Code/ :B_picblock: + :PROPERTIES: + :BEAMER_opt: pic=universal, leftpic=true, width=.2\linewidth + :BEAMER_env: picblock + :BEAMER_act: + :END: + - unique *observatory* of all software development + - *big data, machine learning* paradise: classification, trends, coding patterns, code completion... 
+** Highlights \hfill bit.ly/swhpaper +*** The largest software source code archive /ever/ + #+latex: \centering + #+latex: \mbox{}\hfill\includegraphics[width=\extblockscale{.35\linewidth}]{swh-dataflow-merkle.pdf}\hfill\pause + #+latex: \includegraphics[width=\extblockscale{.75\linewidth}]{2019-01-archive-growth.png}\hfill\mbox{} +#+BEAMER: \pause +*** /10 billion intrinsic/ identifiers for reproducibility :B_block: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .6 + :END: + See DIO vs IDO in \hfill \url{bit.ly/swhpidpaper} + #+BEAMER: \pause +*** Research software deposit :B_block:noexport: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .4 + :END: + - [[https://www.softwareheritage.org/2018/09/28/depositing-scientific-software-into-software-heritage/][moderated via *HAL*]]\\ + \hfill /open since 9/2018/ +*** Reference archive :B_block: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .4 + :END: + See the work done at \hfill /swmath.org/ + #+BEAMER: \pause +*** SWH IDs now a standard for Wikidata + \mbox{}\hfill See https://www.wikidata.org/wiki/Property:P6138 + #+BEAMER: \pause +*** Collaboration HUB :B_block:noexport: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .33 + :END: + - industry, research + - digital preservation +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +*** Policy + \hfill Now part of the /French National Plan for Open Science/ \hfill\mbox{} +** Leveraging Software Heritage +*** Deposit research software \hfill /open since 9/2018/ :B_picblock: + :PROPERTIES: + :BEAMER_env: picblock + :BEAMER_OPT: pic=deposit-communication.png,width=.61\linewidth,leftpic=true + :END: +#+LATEX: \pause + *Generic mechanism (SWORD based):*\\ + - *review process*, versioning +# - /industry chimes in/ (details on demand) +#+BEAMER: \pause + - *(today)*: deposit .zip or .tar.gz file ([[http://bit.ly/swhdeposithalen][/guide/]]) + - *(tomorrow)*: provide /SWH id/ and (extract) metadata + \hfill 
[[https://www.softwareheritage.org/2018/09/28/depositing-scientific-software-into-software-heritage/][*click here to learn more...*]] +#+BEAMER: \pause +*** Reference archive: origins :B_block: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .5 + :END: + *swMATH.org* links into Software Heritage + - e.g. [[http://swmath.org/software/7116][/the SemiPar entry in swMATH.org/]] +*** Reference archive: releases :B_block: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .45 + :END: + *Wikidata* [[https://www.wikidata.org/wiki/Property:P6138][/SWH Release ID Property/]] + - e.g. [[https://www.wikidata.org/wiki/Q5533567][/the release 3.1.0 of Gensim/]] +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: + +* The road ahead +** The next steps +*** Citation for software :B_block: + :PROPERTIES: + :BEAMER_COL: .5 + :BEAMER_env: block + :END: + - complete ontology + - formal process + - guidelines +*** Reproducibility :B_block: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .5 + :END: + - complete toolchain + - experimentation + - guidelines +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +#+BEAMER: \pause +*** Adoption + - conversation started in various communities; /we are not really part of it/ + - /simplistic/ recommendations and /tools/ are being rolled out + - /directives/ are coming soon + - we *must* take part in the conversation, and disseminate our views +#+BEAMER: \pause +*** We need open tools supporting the process + \hfill Look at Software Heritage, Guix, HAL, BIL, etc. 
as starting blocks (to align) +** Conclusion +*** Research software :B_block: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .5 + :END: + - not just data + - finally in the limelight + - soon part of /research evaluation/ +*** Doing it right is not easy :B_block: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .5 + :END: + - join the conversation + - /contribute our experience/ + - make it into /best practices/ +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +*** + \hfill Thank you! \hfill\mbox{} +* Appendix :B_appendix: + :PROPERTIES: + :BEAMER_env: appendix + :END: +** + \vfill + \centerline{\Huge Appendix} + \vfill +* Under the hood +** Coverage +#+INCLUDE: "../../common/modules/status-extended.org::#archive" :only-contents t :minlevel 3 +** Automation, and storage :maybemove: + #+BEAMER: \begin{center} + #+BEAMER: \mode{\only<1>{\includegraphics[width=\extblockscale{.9\textwidth}]{swh-dataflow-merkle-listers.pdf}}} + #+BEAMER: \only<2-3>{\includegraphics[width=\extblockscale{.9\textwidth}]{swh-dataflow-merkle.pdf}} + #+BEAMER: \end{center} + #+BEAMER: \pause + #+BEAMER: \pause + - full development history *permanently archived!* + +** The archive in pictures :noexport: + #+LATEX: \centering\forcebeamerstart + #+LATEX: \only<1>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_1.pdf}}} + #+LATEX: \only<2>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/contents.pdf}}} + #+LATEX: \only<3>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_2_contents.pdf}}} + #+LATEX: \only<4>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/directories.pdf}}} + #+LATEX: \only<5>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_3_directories.pdf}}} + #+LATEX: \only<6>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/revisions.pdf}}} + #+LATEX: 
\only<7>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_4_revisions.pdf}}} + #+LATEX: \only<8>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/releases.pdf}}} + #+LATEX: \only<9>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_5_releases.pdf}}} + #+LATEX: \only<10>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/snapshots.pdf}}} + #+LATEX: \forcebeamerend +** A bird's eye view :noexport: + #+BEAMER: \begin{center} + #+BEAMER: \includegraphics[width=\extblockscale{1.3\textwidth}]{swh-merkle-dag-wide.pdf} + #+BEAMER: \end{center} +* Identifiers are not easy +** URL decay disrupts the /web of reference/ + #+INCLUDE: "../../common/modules/urls-decay.org::#rfc" :minlevel 3 :only-contents t + #+INCLUDE: "../../common/modules/urls-decay.org::#examples" :minlevel 2 +** DOI limitations + #+INCLUDE: "../../common/modules/doi-analysis.org::#doiexplained" :minlevel 3 :only-contents t + +* Looking for the right identifiers +#+INCLUDE: "../../common/modules/swh-pids.org::#main" :only-contents t diff --git a/talks-public/2019-07-ResearchSoftware/METADATA b/talks-public/2019-07-ResearchSoftware/METADATA new file mode 100644 index 0000000..6ceed5e --- /dev/null +++ b/talks-public/2019-07-ResearchSoftware/METADATA @@ -0,0 +1,15 @@ + Title: + Archiving, assessing and attributing research software: first steps towards making it a first class citizen in the scholarly world + + Abstract: + Software is a fundamental pillar of modern scientific research, across + all fields and disciplines. However, there is a general lack of + adequate means to archive, reference and cite software. 
+ In this talk, we will survey the main issues that make this task + difficult, ranging from the specificity of the persistent identifiers + needed for reproducibility to the complexity of determining software + authorship and authority, especially for long running projects, which + are needed for proper software attribution and credit. In this talk we + report on recent contributions to the ongoing efforts to develop proper + guidelines and recommendations for software reference and software + citation, building upon the internal experience of Inria. diff --git a/talks-public/2019-07-ResearchSoftware/Makefile b/talks-public/2019-07-ResearchSoftware/Makefile new file mode 100644 index 0000000..68fbee7 --- /dev/null +++ b/talks-public/2019-07-ResearchSoftware/Makefile @@ -0,0 +1 @@ +include ../Makefile.slides