diff --git a/common/images/acm_badges.png b/common/images/acm_badges.png new file mode 100644 index 0000000..bc0e86f Binary files /dev/null and b/common/images/acm_badges.png differ diff --git a/common/images/metadata_landscape_RDA11.png b/common/images/metadata_landscape_RDA11.png new file mode 100644 index 0000000..61a726f Binary files /dev/null and b/common/images/metadata_landscape_RDA11.png differ diff --git a/common/images/metadata_landscape_RDA11_1.png b/common/images/metadata_landscape_RDA11_1.png new file mode 100644 index 0000000..987b7ae Binary files /dev/null and b/common/images/metadata_landscape_RDA11_1.png differ diff --git a/common/images/metadata_landscape_RDA11_2.png b/common/images/metadata_landscape_RDA11_2.png new file mode 100644 index 0000000..0e933b8 Binary files /dev/null and b/common/images/metadata_landscape_RDA11_2.png differ diff --git a/common/images/metadata_landscape_RDA11_3.png b/common/images/metadata_landscape_RDA11_3.png new file mode 100644 index 0000000..44335c7 Binary files /dev/null and b/common/images/metadata_landscape_RDA11_3.png differ diff --git a/common/images/metadata_landscape_RDA11_4.png b/common/images/metadata_landscape_RDA11_4.png new file mode 100644 index 0000000..f8acda7 Binary files /dev/null and b/common/images/metadata_landscape_RDA11_4.png differ diff --git a/common/images/metadata_landscape_RDA11_5.png b/common/images/metadata_landscape_RDA11_5.png new file mode 100644 index 0000000..cc625eb Binary files /dev/null and b/common/images/metadata_landscape_RDA11_5.png differ diff --git a/common/images/metadata_landscape_final.png b/common/images/metadata_landscape_final.png deleted file mode 100644 index bc99b20..0000000 Binary files a/common/images/metadata_landscape_final.png and /dev/null differ diff --git a/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org b/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org index 7a30672..a381f09 100644 --- a/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org +++ 
b/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org @@ -1,223 +1,263 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) -#+TITLE: Software Source Code Interest Group +#+TITLE: Metadata, use cases, identifiers and reproducibility #+SUBTITLE: Metadata, identifiers and reproducibility # does not allow short title, so we override it for beamer as follows : # +BEAMER_HEADER: \title[Availability and traceability]{Preserving Software and Data} #+BEAMER_HEADER: \author[Roberto Di Cosmo, Morane Gruenpeter]{Roberto Di Cosmo (Software Heritage, INRIA)\\ Morane Gruenpeter (Software Heritage, CrossMiner)} +# -*- org-image-actual-width: nil; -*- - -#+AUTHOR: Roberto Di Cosmo (Software Heritage, Inria), Morane Gruenpeter (Software Heritage, Crossminer) +#+AUTHOR: Morane Gruenpeter (Software Heritage, Crossminer) #+DATE: Mars 22nd, 2018 -#+EMAIL: roberto@dicosmo.org +#+EMAIL: morane@softwareheritage.org #+DESCRIPTION: Intrinsic identifiers for digital objects #+KEYWORDS: software heritage legacy preservation knowledge mankind technology # # # Prelude contains all the information needed to export the main beamer latex source # -#+INCLUDE: "../../common/modules/prelude.org" :minlevel 1 +#+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 # #+INCLUDE: "../../common/modules/169.org" -** The Software Heritage Project \hfill www.softwareheritage.org - :PROPERTIES: - :CUSTOM_ID: mission - :END: +* Explore the metadata landscape +** The metadata challenge +#+latex: \begin{center} \huge{What is software ?} \end{center} +#+BEAMER: \pause +*** Software as a concept :B_block:BMCOL: + :PROPERTIES: + :BEAMER_col: 0.5 + :BEAMER_env: block + :END: + - software project / entity + - the community around it +#+BEAMER: \pause + +*** Software artifact :B_block:BMCOL: + :PROPERTIES: + :BEAMER_col: 0.5 + :BEAMER_env: block + :END: + - each source code version + - the binaries for different 
environments + +** The metadata landscape #+latex: \begin{center} -#+ATTR_LATEX: :width \linewidth -# file:SWH-logo+motto.pdf -file:SWH-logo.pdf + #+ATTR_LATEX: :width \linewidth +# file:file:metadata_landscape_final.png +file:metadata_landscape_RDA11_1.png #+latex: \end{center} -*** Our mission - *Collect*, *preserve* and *share* the /source code/ of /all - the software/ that is publicly available -*** Past, present and future - \hfill /Preserving/ the past, /enhancing/ the present, /preparing/ the future \hfill -# Better society, better education, better science, better industry +** The metadata landscape +#+latex: \begin{center} + #+ATTR_LATEX: :width \linewidth +# file:file:metadata_landscape_final.png +file:metadata_landscape_RDA11_2.png +#+latex: \end{center} + +** The metadata landscape +#+latex: \begin{center} + #+ATTR_LATEX: :width \linewidth +# file:file:metadata_landscape_final.png +file:metadata_landscape_RDA11_3.png +#+latex: \end{center} + +** The metadata landscape +#+latex: \begin{center} + #+ATTR_LATEX: :width \linewidth +# file:file:metadata_landscape_final.png +file:metadata_landscape_RDA11_4.png +#+latex: \end{center} +** The metadata landscape +#+latex: \begin{center} + #+ATTR_LATEX: :width \linewidth +# file:file:metadata_landscape_final.png +file:metadata_landscape_RDA11_5.png +#+latex: \end{center} + + +** The metadata landscape +#+latex: \begin{center} + #+ATTR_LATEX: :width \linewidth +# file:file:metadata_landscape_final.png +file:metadata_landscape_RDA11.png +#+latex: \end{center} * 10th RDA plenary Software Source Code IG results ** 10th RDA plenary Software Source Code IG results -*** Identified +*** Subjects discussed with questionnaire - interest in /Software Source Code/ - use cases - ontology/vocabularies used - properties needed for Software Source Code - advantages for structured data ** Interest in IG -*** Author point of view :B_block:BMCOL: +*** Research topics :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block 
:END: - software accompany data - - software citation- get credit - - register and describe software - promote software as a first class research product - - PID for software - - managing code : incorporate better practices for software + - software citation + - research software + - improve publication +#+BEAMER: \pause -*** User point of view :B_block:BMCOL: +*** General topics :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :END: + - PID for software + - managing code : incorporate better practices for software - discover and recover software - - software citation- how to cite - - software discovery and research - - improve publication - reuse - preserving software source code ** Identified use cases -*** Author point of view :B_block:BMCOL: +*** Research use cases :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :END: - publish / deposit source code with metadata - - archive software - - expose metadata to indexes - credit attribution and authorship - - conditions/restrictions for use - - link to people, data, funding + - reproducibility + - what test data are available + - research software source code -*** User point of view :B_block:BMCOL: +#+BEAMER: \pause +*** General use cases :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :END: + - archive software + - expose metadata to indexes + - link to people, data, funding - discovery (semantic search) - - lookup software source code - - reproducibility - - what compiler is required - - what test data are available - - build software + - conditions/restrictions for use + - build software (what compiler is required) - integrate to workflow ** Metadata terms *** identify :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.2 :BEAMER_env: block :END: - identifier - title - authors - version - type - origin source +#+BEAMER: \pause *** execute :B_block:BMCOL: :PROPERTIES: :BEAMER_opt: :BEAMER_env: block :BEAMER_col: 0.2 :END: - link to compiled version - repository - compiler - 
environment - - examples + - examples +#+BEAMER: \pause *** classify :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.2 :BEAMER_env: block :END: - description - keywords - in/out data - references - algorithms - docs url - - status +#+BEAMER: \pause -*** administrative :B_block:BMCOL: +*** administrate :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.2 :BEAMER_env: block :END: - contact - authorship - funders - license - publisher - dates - - -* Explore the metadata landscape -** With what terms should we describe a \emph{software artifact}? -#+latex: \begin{center} -#+ATTR_LATEX: :width \linewidth -# file:file:metadata_landscape_final.png -file:cloud_metadata_credit.png -#+latex: \end{center} - -** Explore the metadata landscape -#+latex: \begin{center} -#+ATTR_LATEX: :width \linewidth -# file:file:metadata_landscape_final.png -file:metadata_landscape_final.png -#+latex: \end{center} + - status +#+BEAMER: \pause +*** Review, comments and questions? * The quest for a Software PID ** Back to basics: DIOs vs. IDOs -*** DIO (digital identifier of an object) +*** DIO (digital identifier of an object) :B_block:BMCOL: + :PROPERTIES: + :BEAMER_col: 0.5 + :BEAMER_env: block + :END: - digital identifiers for traditional (non digital) objects - epistemic complications and significant governance issues, ... 
#+BEAMER: \pause \hfill The \alert{software concept/project} needs a DIO #+BEAMER: \pause -*** IDO (identifier of a digital object) +*** IDO (identifier of a digital object) :B_block:BMCOL: + :PROPERTIES: + :BEAMER_col: 0.5 + :BEAMER_env: block + :END: - (digital) identifier for digital objects - simpler to build/handle and can be intrinsic #+BEAMER: \pause \hfill The \alert{software source code} needs an IDO for each version or state #+BEAMER: \pause *** Separation of concerns - yes, we \alert{need both} DIOs and IDOs - no, we \alert{must not mistake} DIOs for IDOs (and viceversa) ** Our challenge in the PID arena *** Long term Identifiers must be there for the long term *** No middle man Identifiers must be meaningful even if resolvers go away *** Integrity, not just naming Identifier must ensure that the retrieved object is the intended one *** Uniqueness by design only one name for each object, each object has only one name -** Intrinsic identifiers in Software Heritage - # R. C. Merkle, A digital signature based on a conventional encryption - # function, Crypto '87 - #+BEAMER: \vspace{-3mm} -***** Merkle tree (R. C. Merkle, Crypto 1979) :B_picblock: - :PROPERTIES: - :BEAMER_opt: pic=merkle, leftpic=true, width=.5\linewidth - :BEAMER_env: picblock - :BEAMER_act: - :END: - Combination of - - tree - - hash function -***** Classical cryptographic construction - fast, parallel signature of large data structures, built-in deduplication -#+BEAMER: \pause - - satisfies all three criteria - - widely used in industry (e.g., Git, nix, blockchains, IPFS, ...) 
+* Reproducibility, Replicability and Source code +** Reproducibility, Replicability and Source code +ACM policies: [[https://www.acm.org/publications/policies/artifact-review-badging][Artifact Review and Badging]] +*** Terminology + :PROPERTIES: + :BEAMER_col: 0.5 + :BEAMER_env: block + :END: + - *Repeatability* \\ same team, same experimental setup + - *Replicability* \\ different team, same experimental setup + - *Reproducibility* \\ different team, different experimental setup #+BEAMER: \pause -** Working together -*** Example: links to /software source code/ in an article - Leveraging the Software Heritage universal archive: - - set of files :: \small\url{swh:1:tree:06741c8c37c5a384083082b99f4c5ad94cd0cd1f}\\ - id of tree object listing all the files in a project (at a given time) - - revision :: \url{swh:1:rev:7598fb94d59178d65bd8d2892c19356290f5d4e3}\\ - id of commit object which a tree and (a pointer to) the history +*** Evaluation of software artefacts + :PROPERTIES: + :BEAMER_col: 0.4 + :BEAMER_env: block + :END: +#+latex: \begin{center} + #+ATTR_LATEX: :width 0.6\linewidth +# file:file:metadata_landscape_final.png +file:acm_badges.png +#+latex: \end{center} #+BEAMER: \pause - - metadata :: this /will/ involve some form of DIO - - and we get all the complications back -* Reproducibility +*** Pressure to make research code available is now rising + access to /all/ the source code is not just necessary + to /reproduce/, it is also useful to /evolve and modify/, + to /build new experiments/ from the old ones +