diff --git a/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org b/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org
index a23aeee..d1e3e4e 100644
--- a/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org
+++ b/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org
@@ -1,290 +1,287 @@
 #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
-#+TITLE: Metadata, use cases, identifiers and reproducibility
-#+SUBTITLE: Metadata, identifiers and reproducibility
+#+TITLE: Metadata, use cases and identifiers
+#+SUBTITLE: Metadata, use cases and identifiers
 # does not allow short title, so we override it for beamer as follows :
 # +BEAMER_HEADER: \title[Availability and traceability]{Preserving Software and Data}
-#+BEAMER_HEADER: \author[Roberto Di Cosmo, Morane Gruenpeter]{Roberto Di Cosmo (Software Heritage, INRIA)\\ Morane Gruenpeter (Software Heritage, CrossMiner)}
-# -*- org-image-actual-width: nil; -*-
+#+BEAMER_HEADER: \author[Roberto Di Cosmo, Morane Gruenpeter]{Roberto Di Cosmo (SWH, INRIA)\\ Morane Gruenpeter (SWH, CrossMiner)}
-#+AUTHOR: Morane Gruenpeter (Software Heritage, Crossminer)
+
+#+AUTHOR: Roberto Di Cosmo (SWH, INRIA)\\ Morane Gruenpeter (SWH, CrossMiner)
 #+DATE: March 22nd, 2018
 #+EMAIL: morane@softwareheritage.org
 #+DESCRIPTION: Intrinsic identifiers for digital objects
 #+KEYWORDS: software heritage legacy preservation knowledge mankind technology
 #
 #
 # Prelude contains all the information needed to export the main beamer latex source
 #
 #+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1
 # #+INCLUDE: "../../common/modules/169.org"
 * Explore the metadata landscape
 ** The metadata challenge
 #+latex: \begin{center} \huge{What is software?} \end{center}
 #+BEAMER: \pause
 *** Software as a concept :B_block:BMCOL:
   :PROPERTIES:
   :BEAMER_col: 0.5
   :BEAMER_env: block
   :END:
 - software project / entity
 #+BEAMER: \pause
 - the creators and the community around it
 #+BEAMER: \pause
 *** Software artifact :B_block:BMCOL:
   :PROPERTIES:
   :BEAMER_col: 0.5
   :BEAMER_env: block
   :END:
 - the binaries for different environments
 #+BEAMER: \pause
 - the *software source code* for each version
 ** The metadata landscape
 #+latex: \begin{center}
 #+ATTR_LATEX: :width \linewidth
 # file:file:metadata_landscape_final.png
 file:metadata_landscape_RDA11_1.png
 #+latex: \end{center}
 ** The metadata landscape
 #+latex: \begin{center}
 #+ATTR_LATEX: :width \linewidth
 # file:file:metadata_landscape_final.png
 file:metadata_landscape_RDA11_2.png
 #+latex: \end{center}
 ** The metadata landscape
 #+latex: \begin{center}
 #+ATTR_LATEX: :width \linewidth
 # file:file:metadata_landscape_final.png
 file:metadata_landscape_RDA11_3.png
 #+latex: \end{center}
 ** The metadata landscape
 #+latex: \begin{center}
 #+ATTR_LATEX: :width \linewidth
 # file:file:metadata_landscape_final.png
 file:metadata_landscape_RDA11_4.png
 #+latex: \end{center}
 ** The metadata landscape
 #+latex: \begin{center}
 #+ATTR_LATEX: :width \linewidth
 # file:file:metadata_landscape_final.png
 file:metadata_landscape_RDA11_5.png
 #+latex: \end{center}
 ** The metadata landscape
 #+latex: \begin{center}
 #+ATTR_LATEX: :width \linewidth
 # file:file:metadata_landscape_final.png
 file:metadata_landscape_RDA11.png
 #+latex: \end{center}
 * 10th RDA plenary Software Source Code IG results
 ** 10th RDA plenary Software Source Code IG results
 *** Subjects discussed with questionnaire
 - interest in /Software Source Code/
 - use cases
 - ontology/vocabularies used
 - properties needed for Software Source Code
 - advantages for structured data
 ** Interest in IG
 *** Research topics :B_block:BMCOL:
   :PROPERTIES:
   :BEAMER_col: 0.5
   :BEAMER_env: block
   :END:
 - software accompanying data
 - promote software as a first class research product
 - software citation
 - research software
 - improve publication
 #+BEAMER: \pause
 *** General topics :B_block:BMCOL:
   :PROPERTIES:
   :BEAMER_col: 0.5
   :BEAMER_env: block
   :END:
 - PID for software
 - managing code: incorporate better practices for software
 - discover and recover software
 - reuse
 - preserving software source code
 ** Identified use cases
 *** Research use cases :B_block:BMCOL:
   :PROPERTIES:
   :BEAMER_col: 0.5
   :BEAMER_env: block
   :END:
 - publish / deposit source code with metadata
 - credit attribution and authorship
 - reproducibility
 - what test data are available
 - research software source code
 #+BEAMER: \pause
 *** General use cases :B_block:BMCOL:
   :PROPERTIES:
   :BEAMER_col: 0.5
   :BEAMER_env: block
   :END:
 - archive software
 - expose metadata to indexes
 - link to people, data, funding
 - discovery (semantic search)
 - conditions/restrictions for use
 - build software (what compiler is required)
 - integrate into workflow
 ** Metadata terms
 *** identify :B_block:BMCOL:
   :PROPERTIES:
   :BEAMER_col: 0.2
   :BEAMER_env: block
   :END:
 - identifier
 - title
 - authors
 - version
 - type
 - origin source
 #+BEAMER: \pause
 *** execute :B_block:BMCOL:
   :PROPERTIES:
   :BEAMER_opt:
   :BEAMER_env: block
   :BEAMER_col: 0.2
   :END:
 - link to a compiled version
 - repository
 - compiler
 - environment
 - examples
 #+BEAMER: \pause
 *** classify :B_block:BMCOL:
   :PROPERTIES:
   :BEAMER_col: 0.2
   :BEAMER_env: block
   :END:
 - description
 - keywords
 - in/out data
 - references
 - algorithms
 - docs url
 #+BEAMER: \pause
 *** administrate :B_block:BMCOL:
   :PROPERTIES:
   :BEAMER_col: 0.2
   :BEAMER_env: block
   :END:
 - contact
 - authorship
 - funders
 - license
 - editor (publisher)
 - dates
 - status
 #+BEAMER: \pause
 *** /Group activity/: review, comments and questions
 * Updates from, and links to, Software Heritage
 ** The Software Heritage Project \hfill www.softwareheritage.org
   :PROPERTIES:
   :CUSTOM_ID: mission
   :END:
 #+latex: \begin{center}
 #+ATTR_LATEX: :width .8\linewidth
 # file:SWH-logo+motto.pdf
 file:SWH-logo.pdf
 #+latex: \end{center}
 *** Our mission
 *Collect*, *preserve* and *share* the /source code/ of /all the software/\\
 \mbox{}\\
 \hfill /Preserving/ the past, /enhancing/ the present, /preparing/ the future
 *** Going global :noexport:
 \hfill building an /open, multistakeholder, nonprofit/ organisation
-#
-# Value proposition, getting involved
-#
+
 ** Archive and observatory, serving the needs of society as a whole
 #+latex: \begin{center}
 #+ATTR_LATEX: :width .6\linewidth
 file:SWH-as-foundation-slim.png
 #+latex: \end{center}
 #+BEGIN_EXPORT latex
 \note{On top of Software Heritage one can imagine a myriad of applications, for education, research, industry, cultural heritage, and society as a whole.\\[1em]
 But building the universal archive of source code geared towards the long term is a grand challenge on its own: so we follow the Unix philosophy, and focus on doing one thing, and doing it well, building this essential infrastructure for software.\\[1em]
 Our principles are simple: all our code is and will be open source, our organisation is transparent;\\[1em]
 and we focus on the long term: we will grow an international network of mirrors and partners and create a nonprofit foundation to coordinate it for the benefit of society as a whole.}
 #+END_EXPORT
+
 \begin{center}
 \includegraphics[width=.7\linewidth]{growth.png}
 \end{center}
 *** \hfill largest collection of software source code in the world
 ** Our challenge in the PID arena
 *** Our requirements
 - Long term :: identifiers must be there for the long term
 - Free :: one cannot /buy/ billions of identifiers
 - No middleman :: identifiers must be meaningful even if resolvers go away
 - Integrity, not just naming :: identifiers must ensure that the retrieved object is the intended one
 - Uniqueness by design :: only one name for each object, each object has only one name
 #+BEAMER: \pause
 *** We can find no satisfaction...
 - ARK, PURLs, DOIs, Handle, ... all miss a part of it
 - we use cryptographic hashes instead (Merkle trees, circa 1979)
 ** Back to basics: DIOs vs. IDOs
 *** DIO (digital identifier of an object) :B_block:BMCOL:
   :PROPERTIES:
   :BEAMER_col: 0.5
   :BEAMER_env: block
   :END:
 - digital identifiers for traditional (non-digital) objects
 - epistemic complications and significant governance issues, ...
 #+BEAMER: \pause
 \hfill The \alert{software concept/project} needs a DIO
 #+BEAMER: \pause
 *** IDO (identifier of a digital object) :B_block:BMCOL:
   :PROPERTIES:
   :BEAMER_col: 0.5
   :BEAMER_env: block
   :END:
 - (digital) identifier for digital objects
 - simpler to build/handle and can be intrinsic
 #+BEAMER: \pause
 \hfill The \alert{software source code} needs an IDO for each version or state
 #+BEAMER: \pause
 *** Separation of concerns
 - yes, we \alert{need both} DIOs and IDOs
 - no, we \alert{must not mistake} DIOs for IDOs (and vice versa)
 * Conclusion
 ** Request for comment
 *** Feedback on our PID schemas
 see https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html
 *** Test the archive navigation in Software Heritage
 https://archive.softwareheritage.org/ (user: *rda* passwd: *2018*)
 *** Thoughts on the DIO / IDO conceptualization
 contact Roberto and Morane
-** :B_ignoreheading:
-  :PROPERTIES:
-  :BEAMER_env: ignoreheading
-  :END:
-#+latex: \vfill\centerline{\huge Questions?}\vfill
+**
+
+#+latex: \begin{center} \huge{Questions?} \end{center}
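The "Our challenge in the PID arena" slide lists three requirements: integrity, no middleman and uniqueness by design. It says these are met by deriving identifiers from cryptographic hashes rather than from a registry. The Python below is a minimal sketch of that idea. It assumes the content and directory identifiers follow the `swh:1:cnt:` / `swh:1:dir:` schema linked in the conclusion, computed with Git-compatible SHA-1 object hashing. The sketch does three things:

- derives an identifier from a file's bytes;
- re-checks a retrieved object against that identifier;
- shows the Merkle step for a flat directory of regular files.

The helper names (`git_object_hash`, `content_swhid`, `verify_content`, `directory_swhid`) are hypothetical. The directory function deliberately ignores subdirectories, executables and symlinks.

```python
import hashlib
from typing import Dict


def git_object_hash(kind: str, payload: bytes) -> str:
    """SHA-1 over a Git-style header '<kind> <size>\\0' followed by the payload."""
    header = f"{kind} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def content_swhid(data: bytes) -> str:
    """Intrinsic identifier of a file's content (hypothetical helper name)."""
    return "swh:1:cnt:" + git_object_hash("blob", data)


def verify_content(data: bytes, swhid: str) -> bool:
    """Integrity check: recompute the identifier from the retrieved bytes and compare."""
    return content_swhid(data) == swhid


def directory_swhid(files: Dict[str, bytes]) -> str:
    """Merkle step for a flat directory of regular files (hypothetical, simplified).

    Each entry is '100644 <name>\\0<20-byte binary hash of the file content>';
    entries are sorted by name bytes and the concatenation is hashed as a 'tree'.
    Subdirectories, executable files and symlinks are not handled in this sketch.
    """
    entries = b""
    for name in sorted(files, key=lambda n: n.encode("utf-8")):
        child_hash = bytes.fromhex(git_object_hash("blob", files[name]))
        entries += b"100644 " + name.encode("utf-8") + b"\0" + child_hash
    return "swh:1:dir:" + git_object_hash("tree", entries)


if __name__ == "__main__":
    source = b"print('hello')\n"
    pid = content_swhid(source)
    print(pid)
    print(verify_content(source, pid))                   # True: same bytes, same identifier
    print(verify_content(source + b"# edited\n", pid))   # False: any change breaks the match
    print(directory_swhid({"main.py": source, "README": b"demo\n"}))
```

A directory identifier is a hash over its children's identifiers. Changing a single byte in any file therefore also changes the identifier of every directory above it. This is the Merkle-tree property the slides refer to. It lets anyone re-verify an object without trusting a resolver.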