diff --git a/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org b/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org
index a381f09..3e13d49 100644
--- a/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org
+++ b/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org
@@ -1,263 +1,265 @@
 #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
 #+TITLE: Metadata, use cases, identifiers and reproducibility
 #+SUBTITLE: Metadata, identifiers and reproducibility
 # does not allow short title, so we override it for beamer as follows :
 # +BEAMER_HEADER: \title[Availability and traceability]{Preserving Software and Data}
 #+BEAMER_HEADER: \author[Roberto Di Cosmo, Morane Gruenpeter]{Roberto Di Cosmo (Software Heritage, INRIA)\\ Morane Gruenpeter (Software Heritage, CrossMiner)}
 # -*- org-image-actual-width: nil; -*-
 #+AUTHOR: Morane Gruenpeter (Software Heritage, CrossMiner)
 #+DATE: March 22nd, 2018
 #+EMAIL: morane@softwareheritage.org
 #+DESCRIPTION: Intrinsic identifiers for digital objects
 #+KEYWORDS: software heritage legacy preservation knowledge mankind technology
 #
 #
 # Prelude contains all the information needed to export the main beamer latex source
 #
 #+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1
 #
 #+INCLUDE: "../../common/modules/169.org"
 * Explore the metadata landscape
 ** The metadata challenge
 #+latex: \begin{center} \huge{What is software?} \end{center}
 #+BEAMER: \pause
 *** Software as a concept :B_block:BMCOL:
 :PROPERTIES:
 :BEAMER_col: 0.5
 :BEAMER_env: block
 :END:
  - software project / entity
- - the community around it
+#+BEAMER: \pause
+ - the creators and the community around it
 #+BEAMER: \pause
 *** Software artifact :B_block:BMCOL:
 :PROPERTIES:
 :BEAMER_col: 0.5
 :BEAMER_env: block
 :END:
- - each source code version
  - the binaries for different environments
+#+BEAMER: \pause
+ - the *software source code* for each version
+
 ** The metadata landscape
 #+latex: \begin{center}
 #+ATTR_LATEX: :width \linewidth
 # file:file:metadata_landscape_final.png
 file:metadata_landscape_RDA11_1.png
 #+latex: \end{center}
 ** The metadata landscape
 #+latex: \begin{center}
 #+ATTR_LATEX: :width \linewidth
 # file:file:metadata_landscape_final.png
 file:metadata_landscape_RDA11_2.png
 #+latex: \end{center}
 ** The metadata landscape
 #+latex: \begin{center}
 #+ATTR_LATEX: :width \linewidth
 # file:file:metadata_landscape_final.png
 file:metadata_landscape_RDA11_3.png
 #+latex: \end{center}
 ** The metadata landscape
 #+latex: \begin{center}
 #+ATTR_LATEX: :width \linewidth
 # file:file:metadata_landscape_final.png
 file:metadata_landscape_RDA11_4.png
 #+latex: \end{center}
 ** The metadata landscape
 #+latex: \begin{center}
 #+ATTR_LATEX: :width \linewidth
 # file:file:metadata_landscape_final.png
 file:metadata_landscape_RDA11_5.png
 #+latex: \end{center}
 ** The metadata landscape
 #+latex: \begin{center}
 #+ATTR_LATEX: :width \linewidth
 # file:file:metadata_landscape_final.png
 file:metadata_landscape_RDA11.png
 #+latex: \end{center}
 * 10th RDA plenary Software Source Code IG results
 ** 10th RDA plenary Software Source Code IG results
 *** Subjects discussed in the questionnaire
  - interest in /Software Source Code/
  - use cases
  - ontology/vocabularies used
  - properties needed for Software Source Code
  - advantages for structured data
 ** Interest in IG
 *** Research topics :B_block:BMCOL:
 :PROPERTIES:
 :BEAMER_col: 0.5
 :BEAMER_env: block
 :END:
  - software accompanying data
  - promote software as a first-class research product
  - software citation
  - research software
  - improve publication
 #+BEAMER: \pause
 *** General topics :B_block:BMCOL:
 :PROPERTIES:
 :BEAMER_col: 0.5
 :BEAMER_env: block
 :END:
  - PID for software
  - managing code: incorporate better practices for software
  - discover and recover software
  - reuse
  - preserving software source code
 ** Identified use cases
 *** Research use cases :B_block:BMCOL:
 :PROPERTIES:
 :BEAMER_col: 0.5
 :BEAMER_env: block
 :END:
  - publish / deposit source code with metadata
  - credit attribution and authorship
  - reproducibility
  - what test data are available
  - research software source code
 #+BEAMER: \pause
 *** General use cases :B_block:BMCOL:
 :PROPERTIES:
 :BEAMER_col: 0.5
 :BEAMER_env: block
 :END:
  - archive software
  - expose metadata to indexes
  - link to people, data, funding
  - discovery (semantic search)
  - conditions/restrictions for use
  - build software (what compiler is required)
- - integrate to workflow
+ - integrate into workflow
 ** Metadata terms
 *** identify :B_block:BMCOL:
 :PROPERTIES:
 :BEAMER_col: 0.2
 :BEAMER_env: block
 :END:
  - identifier
  - title
  - authors
  - version
  - type
  - origin source
 #+BEAMER: \pause
 *** execute :B_block:BMCOL:
 :PROPERTIES:
 :BEAMER_opt:
 :BEAMER_env: block
 :BEAMER_col: 0.2
 :END:
- - link to compiled version
+ - link to a compiled version
  - repository
  - compiler
  - environment
  - examples
 #+BEAMER: \pause
 *** classify :B_block:BMCOL:
 :PROPERTIES:
 :BEAMER_col: 0.2
 :BEAMER_env: block
 :END:
  - description
  - keywords
  - in/out data
  - references
  - algorithms
  - docs URL
 #+BEAMER: \pause
 *** administrate :B_block:BMCOL:
 :PROPERTIES:
 :BEAMER_col: 0.2
 :BEAMER_env: block
 :END:
  - contact
  - authorship
  - funders
  - license
  - publisher
  - dates
  - status
 #+BEAMER: \pause
 *** Review, comments and questions?
 * The quest for a Software PID
 ** Back to basics: DIOs vs. IDOs
 *** DIO (digital identifier of an object) :B_block:BMCOL:
 :PROPERTIES:
 :BEAMER_col: 0.5
 :BEAMER_env: block
 :END:
  - digital identifiers for traditional (non-digital) objects
  - epistemic complications and significant governance issues, ...
 #+BEAMER: \pause \hfill The \alert{software concept/project} needs a DIO
 #+BEAMER: \pause
 *** IDO (identifier of a digital object) :B_block:BMCOL:
 :PROPERTIES:
 :BEAMER_col: 0.5
 :BEAMER_env: block
 :END:
  - (digital) identifier for digital objects
  - simpler to build/handle and can be intrinsic
 #+BEAMER: \pause \hfill The \alert{software source code} needs an IDO for each version or state
 #+BEAMER: \pause
 *** Separation of concerns
  - yes, we \alert{need both} DIOs and IDOs
  - no, we \alert{must not mistake} DIOs for IDOs (and vice versa)
 ** Our challenge in the PID arena
 *** Long term
  Identifiers must be there for the long term
 *** No middle man
  Identifiers must be meaningful even if resolvers go away
 *** Integrity, not just naming
  Identifiers must ensure that the retrieved object is the intended one
 *** Uniqueness by design
  Only one name for each object, and each object has only one name
-
 * Reproducibility, Replicability and Source code
 ** Reproducibility, Replicability and Source code
 ACM policies: [[https://www.acm.org/publications/policies/artifact-review-badging][Artifact Review and Badging]]
 *** Terminology
 :PROPERTIES:
 :BEAMER_col: 0.5
 :BEAMER_env: block
 :END:
  - *Repeatability* \\ same team, same experimental setup
  - *Replicability* \\ different team, same experimental setup
  - *Reproducibility* \\ different team, different experimental setup
 #+BEAMER: \pause
 *** Evaluation of software artefacts
 :PROPERTIES:
 :BEAMER_col: 0.4
 :BEAMER_env: block
 :END:
 #+latex: \begin{center}
 #+ATTR_LATEX: :width 0.6\linewidth
 # file:file:metadata_landscape_final.png
 file:acm_badges.png
 #+latex: \end{center}
 #+BEAMER: \pause
 *** Pressure to make research code available is now rising
  access to /all/ the source code is not just necessary
- to /reproduce/, it is also useful to /evolve and modify/,
- to /build new experiments/ from the old ones
+ to /reproduce/, it is also useful to /modify/ and /evolve/,
+ so as to *build new experiments* from the old ones.
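The IDO requirements on the slides above (intrinsic, integrity rather than just naming, uniqueness by design) can be illustrated with a minimal Python sketch. It assumes content-addressed identifiers computed the way git hashes a file's contents (SHA-1 over a =blob <length>\0= header followed by the raw bytes), wrapped in the =swh:1:cnt:= prefix that Software Heritage uses for content identifiers. The function names =git_blob_sha1=, =content_identifier= and =verify= are hypothetical helpers, not the talk's implementation.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Hash raw bytes the way git hashes a file's contents (a "blob" object)."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


def content_identifier(data: bytes) -> str:
    """Intrinsic identifier: computed from the bytes alone, no registry or resolver needed.

    The "swh:1:cnt:" prefix mirrors the Software Heritage content identifier
    format; this helper itself is a hypothetical illustration.
    """
    return "swh:1:cnt:" + git_blob_sha1(data)


def verify(identifier: str, data: bytes) -> bool:
    """Integrity check: recompute the identifier from the retrieved bytes and compare."""
    return content_identifier(data) == identifier


if __name__ == "__main__":
    original = b"print('hello')\n"
    pid = content_identifier(original)
    print(pid)
    # The same bytes always yield the same name, so each object has only one name.
    assert content_identifier(original) == pid
    # Any modification is detected when the object is retrieved.
    assert verify(pid, original)
    assert not verify(pid, original + b"# tampered\n")
    # Known value: git's hash of the empty blob.
    print(content_identifier(b""))  # → swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the identifier is derived from the content itself, it stays meaningful even if every resolver disappears: anyone holding the bytes can recompute and check it. The "only one object per name" direction rests on the collision resistance of the hash function, which is an assumption of this sketch (SHA-1 collisions have been demonstrated in practice).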