diff --git a/common/images/deposit-communication-with-PID.png b/common/images/deposit-communication-with-PID.png index af41295..0554634 100644 Binary files a/common/images/deposit-communication-with-PID.png and b/common/images/deposit-communication-with-PID.png differ diff --git a/talks-public/2020-02-18-IDCC-15th/2020-02-18-IDCC-15th.org b/talks-public/2020-02-18-IDCC-15th/2020-02-18-IDCC-15th.org index cbeebc6..ff9e772 100644 --- a/talks-public/2020-02-18-IDCC-15th/2020-02-18-IDCC-15th.org +++ b/talks-public/2020-02-18-IDCC-15th/2020-02-18-IDCC-15th.org @@ -1,334 +1,390 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) #+TITLE: Curated Archiving of Research Software Artifacts: #+SUBTITLE: lessons learned from the French open archive (HAL) #+AUTHOR: Roberto Di Cosmo, Morane Gruenpeter, Bruno Marmol, Alain Monteil, Laurent Romary, Jozefina Sadowska #+EMAIL: morane@softwareheritage.org @moraneottilia @swheritage #+BEAMER_HEADER: \date[February 18th, 2020]{February 18th, 2020\\[-1em]} #+BEAMER_HEADER: \title[Curated Archiving of Research Software Artifacts]{Curated Archiving of Research Software Artifacts:} #+BEAMER_HEADER: \author[Di Cosmo, {\bf Gruenpeter}, Marmol, Monteil, Romary, Sadowska]{Roberto Di Cosmo, {\bf Morane Gruenpeter}, Bruno Marmol,\\ Alain Monteil, Laurent Romary, Jozefina Sadowska\\[1em]} #+KEYWORDS: software heritage legacy preservation knowledge mankind technology #+LATEX_HEADER: \usepackage{tcolorbox} #+LATEX_HEADER: \definecolor{links}{HTML}{2A1B81} #+LATEX_HEADER: \hypersetup{colorlinks,linkcolor=,urlcolor=links} # # prelude.org contains all the information needed to export the main beamer latex source # use prelude-toc.org to get the table of contents # #+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 #+INCLUDE: "../../common/modules/169.org" # +LaTeX_CLASS_OPTIONS: [aspectratio=169,handout,xcolor=table] #+LATEX_HEADER: 
\usepackage{bbding} #+LATEX_HEADER: \usepackage{tcolorbox} #+LATEX_HEADER: \DeclareUnicodeCharacter{66D}{\FiveStar} # # If you want to change the title logo it's here # #+BEAMER_HEADER: \titlegraphic{\includegraphics[width=0.5\textwidth]{Inria-HAL-CCSD-SWH-logo-horizontal.png}} # aspect ratio can be changed, but the slides need to be adapted # - compute a "resizing factor" for the images (macro for picblocks?) # # set the background image # # https://pacoup.com/2011/06/12/list-of-true-169-resolutions/ # #+BEAMER_HEADER: \pgfdeclareimage[height=90mm,width=160mm]{bgd}{swh-world-169.png} #+BEAMER_HEADER: \setbeamertemplate{background}{\pgfuseimage{bgd}} #+LATEX: \addtocounter{framenumber}{-1} * Introduction- Software is our heritage ** Source Code: /executable/ and /human readable/ knowledge #+INCLUDE: "../../common/modules/source-code-different-short.org::#thesourcecode" :only-contents t :minlevel 3 ** Software is a /forgotten/ pillar of Open Science *** Lack of recognition :PROPERTIES: :BEAMER_env: block :BEAMER_col: 0.48 :END: not (yet) a first class citizen - in the EOSC plan - in the scholarly world #+BEAMER: \pause #+BEGIN_QUOTE Sometimes, if you don't have the software, you don't have the data \mbox{}\hfill Christine Borgman, Paris, 2018 #+END_QUOTE #+BEAMER: \pause *** :PROPERTIES: :BEAMER_COL: .5 :END: #+latex: \begin{center} #+ATTR_LATEX: :width 0.9\linewidth file:preservation_triangle_color.png #+latex: \end{center} #+BEAMER: \pause *** Reproducibility is the key :B_picblock: :PROPERTIES: :BEAMER_opt: pic=Karl_Popper, leftpic=true, width=.16\linewidth :BEAMER_env: picblock :END: #+latex: \begin{quote} non-reproducible single occurrences are of no significance to science\\ \\ \mbox{} \hfill \scriptsize Karl Popper, \emph{The Logic of Scientific Discovery}, 1934 #+latex: \end{quote} ** What is at stake \hfill in increasing order of difficulty \vspace{-7pt} *** Archival Research software artifacts must be properly *archived*\\ \hfill make sure we can
/retrieve/ them (/reproducibility/) #+BEAMER: \pause *** Identification Research software artifacts must be properly *referenced*\\ \hfill make sure we can /identify/ them (/reproducibility/) #+BEAMER: \pause *** Metadata Research software artifacts must be properly *described*\\ \hfill make it easy to /discover/ them (/visibility/) #+BEAMER: \pause *** Citation Research software artifacts must be properly *cited* /(not the same as referenced!)/\\ \hfill to give /credit/ to authors (/evaluation/!) #+BEAMER: \pause * The software deposit- a first class research output -# reproducibility and scientific knowledge pillars (one slide) -#+INCLUDE: "../../common/modules/swh-scientific-reproducibility.org::#main" :only-contents t :minlevel 2 -# deposit-communication-with-PID.png - -** The software deposit workflow -*** Collaboration - - Center for Direct Scientific Communication (*CCSD*) - behind the *HAL* platform - - the French National Institute for computer science and applied mathematics (*Inria*) - - Software Heritage - The largest library of *software source code* + +** Making software a first class research output +*** CCSD + :PROPERTIES: + :BEAMER_col: 0.33 + :BEAMER_env: block + :END: + Center for Direct Scientific Communication - behind the *HAL* platform + - \tiny{Hyper articles en ligne} #+BEAMER: \pause -*** A complete workflow with three major steps: - 1. *depositing* software source code on HAL’s platform - 2. *moderating* and curating the deposit by a certified IES-Inria moderator - 3.
*sharing* the deposit and pushing the deposit to the SWH archive -# scientific software (deposit) use-case (one slide) -#+INCLUDE: "../../common/modules/swh-scientific-deposit.org::#main" :only-contents t :minlevel 2 +*** IES-Inria + :PROPERTIES: + :BEAMER_col: 0.33 + :BEAMER_env: block + :END: + Scientific information & publishing service *@Inria* + - \tiny{Institut National de Recherche en Informatique et en Automatique} + +#+BEAMER: \pause -** Submit your source code \hfill \href{}{deposit guide} +*** Software Heritage + :PROPERTIES: + :BEAMER_col: 0.33 + :BEAMER_env: block + :END: + Building the *SWH* universal archive for all /software source code/ + - \tiny{With the support of UNESCO} + +#+BEAMER: \pause +*** Goals + 1. *archive* software source code on HAL and on SWH + 2. *identify* all the artifacts contained in a deposit with the /SWH-ID/ + 3. *describe* with metadata reviewed by an /IES-Inria moderator/ + 4. *cite* the deposit with a complete citation + +** The research software (deposit) use case + :PROPERTIES: + :CUSTOM_ID: hal + :END: +*** the deposit workflow + :PROPERTIES: + :BEAMER_COL: .5 + :END: + #+latex: \begin{center} + #+ATTR_LATEX: :width \linewidth + file:deposit-communication-with-PID.png + #+latex: \end{center} +#+LATEX: \pause + +*** Deposit software in HAL \hfill [[http://hal.inria.fr/hal-01738741][poster]] :B_picblock: + :PROPERTIES: + :BEAMER_COL: .5 + :BEAMER_env: block + :END: + *\hspace{1em}Generic mechanism:* + - SWORD based + - review process + - versioning + +#+BEAMER: \pause + *\hspace{1em} How to do it:* \hfill ([[http://bit.ly/swhdeposithalen][/guide/]]) + - deposit .zip or .tar.gz file with metadata + +#+BEAMER: \pause + *\hspace{1em} Timeline:* + - /March 2018/: test phase on *HAL-Inria* + - /September 2018/: open to all *HAL* + - /December 2019/: + - 80 complete source code deposits + - 98 software records + +** Submit your source code \hfill ([[http://bit.ly/swhdeposithalen][/guide/]]) #+latex: \begin{center} #+ATTR_LATEX:
:width \linewidth file:HAL-form-IDCC.png #+latex: \end{center} +** The deposit view +#+latex: \begin{center} +#+ATTR_LATEX: :width 0.7\linewidth +file:HAL_deposit.png +#+latex: \end{center} + ** Reference vs. citation *** Credit & Attribution :PROPERTIES: :BEAMER_col: 0.33 :BEAMER_env: block :END: - a metadata record - all authors & contributors #+BEAMER: \pause *** Reuse & Reproducibility :PROPERTIES: :BEAMER_col: 0.33 :BEAMER_env: block :END: - a specific artifact - with complementary information (docs) #+BEAMER: \pause *** Archive & Index :PROPERTIES: :BEAMER_col: 0.33 :BEAMER_env: block :END: - metadata record (HAL) - artifact itself (SWH) \hfill connect the dots... #+BEAMER: \pause *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: #+latex: \begin{center} #+ATTR_LATEX: :width 0.7\linewidth file:citation-format-IDCC.png #+latex: \end{center} -** The deposit view -#+latex: \begin{center} -#+ATTR_LATEX: :width 0.7\linewidth -file:HAL_deposit.png -#+latex: \end{center} + * Keeping the human in the loop- metadata moderation ** Software deposit moderation *** we need - - quality metadata to describe research software - - correct credit to all authors of the software + - *quality metadata* to describe research software + - *correct credit* to all authors of the software #+BEAMER: \pause *** Main actions the digital archivist performs: - detecting extraneous or abusive content (illegal or harassing), - verifying consistency between the metadata and the software source code itself, - completing or correcting the deposit metadata if needed.
#+BEAMER: \pause *** Out of scope - review source code functionality - compile & run software - assess reproducibility & accuracy ** The moderation workflow #+latex: \begin{center} #+ATTR_LATEX: :width 0.62\linewidth file:moderation-workflow.png #+latex: \end{center} ** Publishing vs Sharing *** Publishing :PROPERTIES: :BEAMER_col: 0.48 :BEAMER_env: block :END: - an academic publication is a research result that has been qualified through some form of *peer review* - *software review* examples: AEC, IPOL, the Journal of Open Source Software #+BEAMER: \pause *** Sharing :PROPERTIES: :BEAMER_col: 0.48 :BEAMER_env: block :END: - vast majority developed outside of academia - code hosting platforms like GitHub, GitLab, and many more - institutional repositories or archives (HAL, Zenodo, SWH, etc..) #+BEAMER: \pause *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: #+latex: \vspace{+15pt} \hfill We do not indicate HAL or Software Heritage as a publisher. * Conclusion ** Lessons learned *** The importance of a software license - can software be deposited without a license? #+BEAMER: \pause \hfill became a *mandatory* field on HAL #+BEAMER: \pause *** Collective authorship - can the X project team be the author of software? #+BEAMER: \pause \hfill authorship can be established only with a *clear link* between a /person and a deposit/ #+BEAMER: \pause *** Legacy software - should be archived in its original state - where to put additional information? #+BEAMER: \pause create source code *container* to capture both /original/ and /added information/ as detailed in the - \href{www.softwareheritage.org/swhap}{legacy software acquisition process} + \href{www.softwareheritage.org/swhap}{legacy software acquisition process (SWHAP)} ** Lessons learned (continued..) *** research experiments - deposit on HAL or just archive repository on SWH? 
#+BEAMER: \pause \hfill depends on the *life span* of the experiment #+BEAMER: \pause *** software with large datasets - include in software deposit or separate? #+BEAMER: \pause \hfill depends on *dataset nature* and *reuse possibilities* #+BEAMER: \pause *** Software collections :B_picblock: :PROPERTIES: :BEAMER_env: picblock :BEAMER_OPT: pic=python3-matplotlib.pdf, width=.5\linewidth, leftpic=true :END: - Research Software does not exist in isolation - large /web of dependencies/ on non-research software - single or multiple deposits? #+BEAMER: \pause \hfill depends on *reuse possibilities* ** Next steps *** Export formats - improve BibTeX export (contribute to the [[https://gitlab.inria.fr/gt-sw-citation/bibtex-sw-entry/][@software bibtex proposal]]) - improve other existing formats (TEI, EndNote, DC, DCterms) - create CodeMeta and CFF exports of metadata #+BEAMER: \pause -*** Create deposit from existing repository (on GitHub, GitLab, etc..) +*** Create deposit from existing repository - using an existing SWH-ID - - using a repository url + - using a repository url (on GitHub, GitLab, etc..) #+BEAMER: \pause *** Integrate software into HAL Data - on \url{https://data.archives-ouvertes.fr/} - a SPARQL endpoint, using RDF ** Come in, we're open! *** This work is partially supported by the FAIRsFAIR European project. #+BEAMER: \vspace{-1mm} \flushright {\Huge Questions?} \vfill *** :PROPERTIES: :BEAMER_env: ignoreheading :END: #+BEGIN_EXPORT latex \begin{thebibliography}{Foo Bar, 1969} \footnotesize \bibitem{DiCosmo2019} R. Di Cosmo, M. Gruenpeter, S. Zacchiroli\newblock \emph{Referencing Source Code Artifacts: a Separate Concern in Software Citation},\\ CiSE, IEEE, pp.1-9. 2020. \href{https://dx.doi.org/10.1109/MCSE.2019.2963148}{(10.1109/MCSE.2019.2963148)} \href{https://hal.archives-ouvertes.fr/hal-02446202}{(hal-02446202)} \bibitem{DiCosmo2020} R. Di Cosmo, M. Gruenpeter, B. Marmol, A. Monteil, L. Romary, J.
Sadowska\newblock \emph{Curated Archiving of Research Software Artifacts: lessons learned from the French open archive},\\ submitted to IJDC. December 2019. \href{https://hal.archives-ouvertes.fr/hal-02475835}{(hal-02475835)} \bibitem{Alliez2020} P. Alliez, R. Di Cosmo, B. Guedj, A. Girault, M.-S. Hacid, A. Legrand, N. Rougier\newblock \emph{Attributing and Referencing (Research) Software: Best Practices and Outlook From Inria},\\ Computing in Science \& Engineering, 22 (1), pp. 39-52, 2020, ISSN: 1558-366X. \href{https://dx.doi.org/10.1109/MCSE.2019.2949413}{(10.1109/MCSE.2019.2949413)} \href{https://hal.archives-ouvertes.fr/hal-02135891}{(hal-02135891)} \end{thebibliography} #+END_EXPORT # \bibitem{Abramatic2018} Jean-François Abramatic, Roberto Di Cosmo, Stefano Zacchiroli\newblock # \emph{Building the Universal Archive of Source Code},\\ # Communications of the ACM, October 2018 # \href{https://doi.org/10.1145/3183558}{(10.1145/3183558)} diff --git a/talks-public/2020-02-18-IDCC-15th/METADATA b/talks-public/2020-02-18-IDCC-15th/METADATA index 600ef0f..de8d96c 100644 --- a/talks-public/2020-02-18-IDCC-15th/METADATA +++ b/talks-public/2020-02-18-IDCC-15th/METADATA @@ -1,33 +1,25 @@ -Title: The swh-id: a digital fingerprint identifying software source code +Title: Curated Archiving of Research Software Artifacts: +lessons learned from the French open archive (HAL) - - Abstract: - - The Software Heritage universal archive of software source code relies on - well established techniques used in software development communities to - identify the over 20 billion code artefacts it preserves - cryptographic hashes in a Merkle DAG data structure. - - In this session we will first explain the motivations of this choice, - recalling Paskin's essential distinction between digital identifiers of - an object (DIOs) and identifiers of digital objects (IDOs).
+Authors: Roberto Di Cosmo, Alain Monteil, Morane Gruenpeter, Laurent Romary, +Bruno Marmol, Jozefina Sadowska - Then we will focus on the properties of the Software Heritage Identifiers - (SWH-IDs) that matter most in a reproducibility and long term archival framework: - intrinsic integrity and independent verifiability. - Finally, we will show practically how they can be used to improve current - research publication practices. - - How would you run the session to support the spirit of PIDapalooza as a laid-back, - welcoming, energetic and exciting meeting, and ensure at least 10 minutes of - your session are used to interact with the audience? - - We will do a live demonstration of the swh-identify module that can extract - the PID from the digital artefact. - Also we will show how to resolve an swh-id on the online archive and how - to find a swh-id of a preserved artefact. + + Abstract: - Finally, we will invite participants that want to preserve their repositories - or important repositories to submit the code with Software Heritage's - "save code now" feature. + Software has become an indissociable support of technical and scientific knowledge. + The preservation of this universal body of knowledge is as essential as preserving + research articles and data sets. In the quest to make scientific results reproducible, and + pass knowledge to future generations, we must preserve these three main pillars: + research articles that describe the results, the data sets used or produced, and the + software that embodies the logic of the data transformation. + + The collaboration between Software Heritage (SWH), the Center for Direct Scientific + Communication (CCSD) and the scientific and technical information services (IES) of + the French Institute for Research in Computer Science and Automation (Inria) has + resulted in a specified moderation and curation workflow for research software artifacts + deposited in the HAL open access repository.
The curation workflow was developed to + help digital librarians and archivists handle this new and peculiar artifact - software + source code. While implementing the workflow, a set of guidelines has emerged from + the challenges and the solutions put in place to help all actors involved in the process.