diff --git a/talks-public/2020-02-18-IDCC-15th/2020-02-18-IDCC-15th.org b/talks-public/2020-02-18-IDCC-15th/2020-02-18-IDCC-15th.org index f56abdd..cbeebc6 100644 --- a/talks-public/2020-02-18-IDCC-15th/2020-02-18-IDCC-15th.org +++ b/talks-public/2020-02-18-IDCC-15th/2020-02-18-IDCC-15th.org @@ -1,228 +1,334 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) #+TITLE: Curated Archiving of Research Software Artifacts: #+SUBTITLE: lessons learned from the French open archive (HAL) #+AUTHOR: Roberto Di Cosmo, Morane Gruenpeter, Bruno Marmol, Alain Monteil, Laurent Romary, Jozefina Sadowska #+EMAIL: morane@softwareheritage.org @moraneottilia @swheritage #+BEAMER_HEADER: \date[February 18th, 2020]{February 18th, 2020\\[-1em]} #+BEAMER_HEADER: \title[Curated Archiving of Research Software Artifacts]{Curated Archiving of Research Software Artifacts:} #+BEAMER_HEADER: \author[Di Cosmo, {\bf Gruenpeter}, Marmol, Monteil, Romary, Sadowska]{Roberto Di Cosmo, {\bf Morane Gruenpeter}, Bruno Marmol,\\ Alain Monteil, Laurent Romary, Jozefina Sadowska\\[1em]} #+KEYWORDS: software heritage legacy preservation knowledge mankind technology #+LATEX_HEADER: \usepackage{tcolorbox} #+LATEX_HEADER: \definecolor{links}{HTML}{2A1B81} #+LATEX_HEADER: \hypersetup{colorlinks,linkcolor=,urlcolor=links} # # prelude.org contains all the information needed to export the main beamer latex source # use prelude-toc.org to get the table of contents # #+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 #+INCLUDE: "../../common/modules/169.org" # +LaTeX_CLASS_OPTIONS: [aspectratio=169,handout,xcolor=table] #+LATEX_HEADER: \usepackage{bbding} #+LATEX_HEADER: \usepackage{tcolorbox} #+LATEX_HEADER: \DeclareUnicodeCharacter{66D}{\FiveStar} # # If you want to change the title logo it's here # #+BEAMER_HEADER: \titlegraphic{\includegraphics[width=0.5\textwidth]{Inria-HAL-CCSD-SWH-logo-horizontal.png}} # 
aspect ratio can be changed, but the slides need to be adapted # - compute a "resizing factor" for the images (macro for picblocks?) # # set the background image # # https://pacoup.com/2011/06/12/list-of-true-169-resolutions/ # #+BEAMER_HEADER: \pgfdeclareimage[height=90mm,width=160mm]{bgd}{swh-world-169.png} #+BEAMER_HEADER: \setbeamertemplate{background}{\pgfuseimage{bgd}} #+LATEX: \addtocounter{framenumber}{-1} * Introduction- Software is our heritage ** Source Code: /executable/ and /human readable/ knowledge #+INCLUDE: "../../common/modules/source-code-different-short.org::#thesourcecode" :only-contents t :minlevel 3 -*** - Len Shustek, CHM\hfill /“Source code provides a *view* into the mind of the designer.”/ -** The Paris call: Software Source Code is part of our Heritage - #+INCLUDE: "../../common/modules/paris-call-2019.org::#pariscall2019" :only-contents t :minlevel 3 +** Software is a /forgotten/ pillar of Open Science +*** Lack of recognition + :PROPERTIES: + :BEAMER_env: block + :BEAMER_col: 0.48 + :END: + not (yet) a first class citizen + - in the EOSC plan + - in the scholarly world + +#+BEAMER: \pause + #+BEGIN_QUOTE + Sometimes, if you don't have the software, you don't have the data + + \mbox{}\hfill Christine Borgman, Paris, 2018 + #+END_QUOTE + +#+BEAMER: \pause +*** + :PROPERTIES: + :BEAMER_COL: .5 + :END: + #+latex: \begin{center} + #+ATTR_LATEX: :width 0.9\linewidth + file:preservation_triangle_color.png + #+latex: \end{center} +#+BEAMER: \pause +*** Reproducibility is the key :B_picblock: + :PROPERTIES: + :BEAMER_opt: pic=Karl_Popper, leftpic=true, width=.16\linewidth + :BEAMER_env: picblock + :END: +#+latex: \begin{quote} + non-reproducible single occurrences are of no significance to science\\ + \\ + \mbox{} \hfill \scriptsize Karl Popper, \emph{The Logic of Scientific Discovery}, 1934 +#+latex: \end{quote} + +** What is at stake \hfill in increasing order of difficulty +\vspace{-7pt} +*** Archival + Research software artifacts must be 
properly *archived*\\ + \hfill make sure we can /retrieve/ them (/reproducibility/) +#+BEAMER: \pause +*** Identification + Research software artifacts must be properly *referenced*\\ + \hfill make sure we can /identify/ them (/reproducibility/) +#+BEAMER: \pause +*** Metadata + Research software artifacts must be properly *described*\\ + \hfill make it easy to /discover/ them (/visibility/) +#+BEAMER: \pause +*** Citation + Research software artifacts must be properly *cited* /(not the same as referenced!)/\\ + \hfill to give /credit/ to authors (/evaluation/!) +#+BEAMER: \pause -#+INCLUDE: "../../common/modules/swh-goals-oneslide-vertical.org::#goals" :minlevel 2 * The software deposit- a first class research output # reproducibility and scientific knowledge pillars (one slide) #+INCLUDE: "../../common/modules/swh-scientific-reproducibility.org::#main" :only-contents t :minlevel 2 # deposit-communication-with-PID.png ** The software deposit workflow *** Collaboration - Center for Direct Scientific Communication (*CCSD*) - behind the *HAL* platform - the French National Institute for computer science and applied mathematics (*Inria*) - Software Heritage - The largest library of *software source code* #+BEAMER: \pause *** A complete workflow with three major steps: 1. *depositing* software source code on HAL’s platform 2. *moderating* and curating the deposit by a certified IES-Inria moderator 3. *sharing* the deposit and pushing the deposit to the SWH archive # scientific software (deposit) use-case (one slide) #+INCLUDE: "../../common/modules/swh-scientific-deposit.org::#main" :only-contents t :minlevel 2 ** Submit your source code \hfill \href{}{deposit guide} #+latex: \begin{center} #+ATTR_LATEX: :width \linewidth file:HAL-form-IDCC.png #+latex: \end{center} ** Reference vs. 
citation +*** Credit & Attribution + :PROPERTIES: + :BEAMER_col: 0.33 + :BEAMER_env: block + :END: + - a metadata record + - all authors & contributors +#+BEAMER: \pause + +*** Reuse & Reproducibility + :PROPERTIES: + :BEAMER_col: 0.33 + :BEAMER_env: block + :END: + - a specific artifact + - with complementary information (docs) +#+BEAMER: \pause + +*** Archive & Index + :PROPERTIES: + :BEAMER_col: 0.33 + :BEAMER_env: block + :END: + - metadata record (HAL) + - artifact itself (SWH) + \hfill connect the dots... + +#+BEAMER: \pause +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: #+latex: \begin{center} #+ATTR_LATEX: :width 0.7\linewidth file:citation-format-IDCC.png #+latex: \end{center} ** The deposit view #+latex: \begin{center} #+ATTR_LATEX: :width 0.7\linewidth file:HAL_deposit.png #+latex: \end{center} * Keeping the human in the loop- metadata moderation ** Software deposit moderation *** we need - quality metadata to describe research software - correct credit to all authors of the software #+BEAMER: \pause *** Main actions the digital archivist performs: - detecting extraneous or abusive content (illegal or harassing), - verifying consistency between the metadata and the software source code itself, - completing or correcting the deposit metadata if needed. 
#+BEAMER: \pause *** Out of scope - review source code functionality - compile & run software - assess reproducibility & accuracy ** The moderation workflow #+latex: \begin{center} #+ATTR_LATEX: :width 0.62\linewidth file:moderation-workflow.png #+latex: \end{center} ** Publishing vs Sharing *** Publishing :PROPERTIES: :BEAMER_col: 0.48 :BEAMER_env: block :END: - an academic publication is a research result that has been qualified through some form of *peer review* - *software review* examples: AEC, IPOL, the Journal of Open Source Software #+BEAMER: \pause *** Sharing :PROPERTIES: :BEAMER_col: 0.48 :BEAMER_env: block :END: - vast majority developed outside of academia - code hosting platforms like GitHub, GitLab, and many more - institutional repositories or archives (HAL, Zenodo, SWH, etc..) #+BEAMER: \pause -*** + +*** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: - +#+latex: \vspace{+15pt} \hfill We do not indicate HAL or Software Heritage as a publisher. * Conclusion ** Lessons learned *** The importance of a software license - can software be deposited without a license? #+BEAMER: \pause \hfill became a *mandatory* field on HAL #+BEAMER: \pause *** Collective authorship - can the X project team be the author of software? #+BEAMER: \pause \hfill authorship can be established only with a *clear link* between a /person and a deposit/ #+BEAMER: \pause *** Legacy software - should be archived in its original state - where to put additional information? #+BEAMER: \pause create source code *container* to capture both /original/ and /added information/ as detailed in the \href{www.softwareheritage.org/swhap}{legacy software acquisition process} ** Lessons learned (continued..) *** research experiments - deposit on HAL or just archive repository on SWH? #+BEAMER: \pause \hfill depends on the *life span* of the experiment #+BEAMER: \pause *** software with large datasets - include in software deposit or separate? 
#+BEAMER: \pause \hfill depends on *dataset nature* and *reuse possibilities* #+BEAMER: \pause *** Software collections :B_picblock: :PROPERTIES: :BEAMER_env: picblock :BEAMER_OPT: pic=python3-matplotlib.pdf, width=.5\linewidth, leftpic=true :END: - Research Software does not exist in isolation - large /web of dependencies/ on non-research software - single or multiple deposits ? #+BEAMER: \pause \hfill depends on *reuse possibilities* ** Next steps -TODO +*** Export formats + - improve BibTeX export (contribute to the [[https://gitlab.inria.fr/gt-sw-citation/bibtex-sw-entry/][@software bibtex proposal]]) + - improve other existing formats (TEI, endnote, DC, DCterms) + - create CodeMeta and CFF exports of metadata +#+BEAMER: \pause +*** Create deposit from existing repository (on GitHub, GitLab, etc.) + - using an existing SWH-ID + - using a repository url +#+BEAMER: \pause +*** Integrate software into HAL Data + - on \url{https://data.archives-ouvertes.fr/} + - a SPARQL endpoint, using RDF ** Come in, we're open! *** This work is partially supported by the FAIRsFAIR European project. - \url{www.softwareheritage.org} --- learn more \\ - contribute to the [[https://gitlab.inria.fr/gt-sw-citation/bibtex-sw-entry/][@software bibtex proposal]] - \url{www.softwareheritage.org/swhap} --- legacy software acquisition process \\ + #+BEAMER: \vspace{-1mm} \flushright {\Huge Questions?} \vfill -*** References :B_block: +*** :PROPERTIES: - :BEAMER_env: block + :BEAMER_env: ignoreheading :END: #+BEGIN_EXPORT latex \begin{thebibliography}{Foo Bar, 1969} \footnotesize - \bibitem{Abramatic2018} Jean-François Abramatic, Roberto Di Cosmo, Stefano Zacchiroli\newblock - \emph{Building the Universal Archive of Source Code},\\ - Communications of the ACM, October 2018 - \href{https://doi.org/10.1145/3183558}{(10.1145/3183558)} - \bibitem{DiCosmo2019} Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli\newblock + + \bibitem{DiCosmo2019} R. Di Cosmo, M. Gruenpeter, S. 
Zacchiroli\newblock \emph{Referencing Source Code Artifacts: a Separate Concern in Software Citation},\\ - Computing in Science and Engineering, IEEE, pp.1-9. \href{https://dx.doi.org/10.1109/MCSE.2019.2963148}{(10.1109/MCSE.2019.2963148)} - \href{https://hal.archives-ouvertes.fr/hal-02446202}{(hal-02446202)} \end{thebibliography} - #+END_EXPORT + CiSE, IEEE, pp.1-9. 2020. \href{https://dx.doi.org/10.1109/MCSE.2019.2963148}{(10.1109/MCSE.2019.2963148)} + \href{https://hal.archives-ouvertes.fr/hal-02446202}{(hal-02446202)} + \bibitem{DiCosmo2020} R. Di Cosmo, M. Gruenpeter, B. Marmol, A. Monteil, L. Romary, J. Sadowska\newblock + \emph{Curated Archiving of Research Software Artifacts: lessons learned from the French open archive},\\ + submitted to IJDC. December 2019. + \href{https://hal.archives-ouvertes.fr/hal-02475835}{(hal-02475835)} + + \bibitem{Alliez2020} P. Alliez, R. Di Cosmo, B. Guedj, A. Girault, M.-S. Hacid, A. Legrand, N. Rougier\newblock + \emph{Attributing and Referencing (Research) Software: Best Practices and Outlook From Inria},\\ + Computing in Science and Engineering, 22 (1), pp. 39-52, 2020, ISSN: 1558-366X. + \href{https://dx.doi.org/10.1109/MCSE.2019.2949413}{(10.1109/MCSE.2019.2949413)} + \href{https://hal.archives-ouvertes.fr/hal-02135891}{(hal-02135891)} + + \end{thebibliography} + #+END_EXPORT +# \bibitem{Abramatic2018} Jean-François Abramatic, Roberto Di Cosmo, Stefano Zacchiroli\newblock +# \emph{Building the Universal Archive of Source Code},\\ +# Communications of the ACM, October 2018 +# \href{https://doi.org/10.1145/3183558}{(10.1145/3183558)}