diff --git a/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org b/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org index 3e13d49..a23aeee 100644 --- a/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org +++ b/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org @@ -1,265 +1,290 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) #+TITLE: Metadata, use cases, identifiers and reproducibility #+SUBTITLE: Metadata, identifiers and reproducibility # does not allow short title, so we override it for beamer as follows : # +BEAMER_HEADER: \title[Availability and traceability]{Preserving Software and Data} #+BEAMER_HEADER: \author[Roberto Di Cosmo, Morane Gruenpeter]{Roberto Di Cosmo (Software Heritage, INRIA)\\ Morane Gruenpeter (Software Heritage, CrossMiner)} # -*- org-image-actual-width: nil; -*- #+AUTHOR: Morane Gruenpeter (Software Heritage, Crossminer) #+DATE: Mars 22nd, 2018 #+EMAIL: morane@softwareheritage.org #+DESCRIPTION: Intrinsic identifiers for digital objects #+KEYWORDS: software heritage legacy preservation knowledge mankind technology # # # Prelude contains all the information needed to export the main beamer latex source # #+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 # #+INCLUDE: "../../common/modules/169.org" * Explore the metadata landscape ** The metadata challenge #+latex: \begin{center} \huge{What is software ?} \end{center} #+BEAMER: \pause *** Software as a concept :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :END: - software project / entity #+BEAMER: \pause - the creators and the community around it #+BEAMER: \pause *** Software artifact :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :END: - the binaries for different environments #+BEAMER: \pause - the *software source code* for each version ** The metadata landscape #+latex: \begin{center} #+ATTR_LATEX: :width \linewidth # 
file:file:metadata_landscape_final.png file:metadata_landscape_RDA11_1.png #+latex: \end{center} ** The metadata landscape #+latex: \begin{center} #+ATTR_LATEX: :width \linewidth # file:file:metadata_landscape_final.png file:metadata_landscape_RDA11_2.png #+latex: \end{center} ** The metadata landscape #+latex: \begin{center} #+ATTR_LATEX: :width \linewidth # file:file:metadata_landscape_final.png file:metadata_landscape_RDA11_3.png #+latex: \end{center} ** The metadata landscape #+latex: \begin{center} #+ATTR_LATEX: :width \linewidth # file:file:metadata_landscape_final.png file:metadata_landscape_RDA11_4.png #+latex: \end{center} ** The metadata landscape #+latex: \begin{center} #+ATTR_LATEX: :width \linewidth # file:file:metadata_landscape_final.png file:metadata_landscape_RDA11_5.png #+latex: \end{center} ** The metadata landscape #+latex: \begin{center} #+ATTR_LATEX: :width \linewidth # file:file:metadata_landscape_final.png file:metadata_landscape_RDA11.png #+latex: \end{center} * 10th RDA plenary Software Source Code IG results ** 10th RDA plenary Software Source Code IG results *** Subjects discussed with questionnaire - interest in /Software Source Code/ - use cases - ontology/vocabularies used - properties needed for Software Source Code - advantages for structured data ** Interest in IG *** Research topics :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :END: - software accompany data - promote software as a first class research product - software citation - research software - improve publication #+BEAMER: \pause *** General topics :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :END: - PID for software - managing code : incorporate better practices for software - discover and recover software - reuse - preserving software source code ** Identified use cases *** Research use cases :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :END: - publish / deposit source code with metadata - credit attribution and 
authorship - reproducibility - what test data are available - research software source code #+BEAMER: \pause *** General use cases :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :END: - archive software - expose metadata to indexes - link to people, data, funding - discovery (semantic search) - conditions/restrictions for use - build software (what compiler is required) - integrate into workflow ** Metadata terms *** identify :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.2 :BEAMER_env: block :END: - identifier - title - authors - version - type - origin source #+BEAMER: \pause *** execute :B_block:BMCOL: :PROPERTIES: :BEAMER_opt: :BEAMER_env: block :BEAMER_col: 0.2 :END: - link to a compiled version - repository - compiler - environment - examples #+BEAMER: \pause *** classify :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.2 :BEAMER_env: block :END: - description - keywords - in/out data - references - algorithms - docs url #+BEAMER: \pause *** administrate :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.2 :BEAMER_env: block :END: - contact - authorship - funders - license - - publisher + - editor (publisher) - dates - status #+BEAMER: \pause -*** Review, comments and questions? 
+*** /Group activity/: review, comments and questions -* The quest for a Software PID +* Updates from, and links to Software Heritage +** The Software Heritage Project \hfill www.softwareheritage.org + :PROPERTIES: + :CUSTOM_ID: mission + :END: +#+latex: \begin{center} +#+ATTR_LATEX: :width .8\linewidth +# file:SWH-logo+motto.pdf +file:SWH-logo.pdf +#+latex: \end{center} +*** Our mission + *Collect*, *preserve* and *share* the /source code/ of /all the software/\\ +\mbox{}\\ + \hfill /Preserving/ the past, /enhancing/ the present, /preparing/ the future +*** Going global :noexport: + \hfill building an /open, multistakeholder, nonprofit/ organisation + +# +# Value proposition, getting involved +# +** Archive and observatory, serving the needs of society as a whole +#+latex: \begin{center} +#+ATTR_LATEX: :width .6\linewidth +file:SWH-as-foundation-slim.png +#+latex: \end{center} +#+BEGIN_EXPORT latex +\note{On top of Software Heritage one can imagine a myriad applications, for education, +research, industry, cultural heritage, and society as a whole.\\[1em] +But building the universal archive of source code geared towards the long term is a grand challenge +on its own: so we follow the Unix philosophy, and focus on doing one thing, and doing it well, +building this essential infrastructure for software.\\[1em] +Our principles are simple: all our code is and will be open source, our organisation is transparent;\\[1em] +and we focus on the long term: we will grow an international network of mirrors and partners +and create a non profit foundation to coordinate it for the benefit of society as a whole.} +#+END_EXPORT +\begin{center} + \includegraphics[width=.7\linewidth]{growth.png} +\end{center} +*** + \hfill largest collection of software source code in the world + +** Our challenge in the PID arena +*** Our requirements + - Long term :: identifiers must be there for the long term + - Free :: one cannot /buy/ billions of identifiers + - No middle man :: identifiers 
must be meaningful even if resolvers go away + - Integrity, not just naming :: identifier must ensure that the retrieved object is the intended one + - Uniqueness by design :: only one name for each object, each object has only one name +#+BEAMER: \pause +*** We can find no satisfaction... + - Ark, PURLs, DOIs, Handle, ... all miss a part of it + - we use cryptographic hashes instead (Merkle trees, circa 1979) ** Back to basics: DIOs vs. IDOs *** DIO (digital identifier of an object) :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :END: - digital identifiers for traditional (non digital) objects - epistemic complications and significant governance issues, ... #+BEAMER: \pause \hfill The \alert{software concept/project} needs a DIO #+BEAMER: \pause *** IDO (identifier of a digital object) :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :END: - (digital) identifier for digital objects - simpler to build/handle and can be intrinsic #+BEAMER: \pause \hfill The \alert{software source code} needs an IDO for each version or state #+BEAMER: \pause *** Separation of concerns - yes, we \alert{need both} DIOs and IDOs - no, we \alert{must not mistake} DIOs for IDOs (and viceversa) -** Our challenge in the PID arena -*** Long term - Identifiers must be there for the long term -*** No middle man - Identifiers must be meaningful even if resolvers go away -*** Integrity, not just naming - Identifier must ensure that the retrieved object is the intended one -*** Uniqueness by design - only one name for each object, each object has only one name - -* Reproducibility, Replicability and Source code -** Reproducibility, Replicability and Source code -ACM policies: [[https://www.acm.org/publications/policies/artifact-review-badging][Artifact Review and Badging]] -*** Terminology - :PROPERTIES: - :BEAMER_col: 0.5 - :BEAMER_env: block - :END: - - *Repeatability* \\ same team, same experimental setup - - *Replicability* \\ different team, same 
experimental setup - - *Reproducibility* \\ different team, different experimental setup -#+BEAMER: \pause -*** Evaluation of software artefacts - :PROPERTIES: - :BEAMER_col: 0.4 - :BEAMER_env: block - :END: -#+latex: \begin{center} - #+ATTR_LATEX: :width 0.6\linewidth -# file:file:metadata_landscape_final.png -file:acm_badges.png -#+latex: \end{center} -#+BEAMER: \pause - -*** Pressure to make research code available is now raising - access to /all/ the source code is not just necessary - to /reproduce/, it is also useful to /modify/ and /evolve/, - so as to *build new experiments* from the old ones. - +* Conclusion +** Request for comment +*** Feedback on our PID schemas + see https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html +*** Test the archive navigation in Software Heritage + https://archive.softwareheritage.org/ (user: *rda* passwd: *2018*) +*** Thoughts on the DIO / IDO conceptualization + contact Roberto and Morane +** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +#+latex: \vfill\centerline{\huge Questions?}\vfill