diff --git a/common/images/cloud_metadata_credit.png b/common/images/cloud_metadata_credit.png
new file mode 100644
index 0000000..61aa409
Binary files /dev/null and b/common/images/cloud_metadata_credit.png differ
diff --git a/common/images/metadata_landscape_final.png b/common/images/metadata_landscape_final.png
new file mode 100644
index 0000000..bc99b20
Binary files /dev/null and b/common/images/metadata_landscape_final.png differ
diff --git a/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org b/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org
new file mode 100644
index 0000000..7a30672
--- /dev/null
+++ b/talks-public/2018-03-22-RDA-IG/2018-03-22-RDA-IG.org
@@ -0,0 +1,223 @@
+#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
+#+TITLE: Software Source Code Interest Group
+#+SUBTITLE: Metadata, identifiers and reproducibility
+# does not allow short title, so we override it for beamer as follows:
+# +BEAMER_HEADER: \title[Availability and traceability]{Preserving Software and Data}
+#+BEAMER_HEADER: \author[Roberto Di Cosmo, Morane Gruenpeter]{Roberto Di Cosmo (Software Heritage, INRIA)\\ Morane Gruenpeter (Software Heritage, CrossMiner)}
+
+
+#+AUTHOR: Roberto Di Cosmo (Software Heritage, Inria), Morane Gruenpeter (Software Heritage, Crossminer)
+#+DATE: March 22nd, 2018
+#+EMAIL: roberto@dicosmo.org
+#+DESCRIPTION: Intrinsic identifiers for digital objects
+#+KEYWORDS: software heritage legacy preservation knowledge mankind technology
+#
+
+#
+# Prelude contains all the information needed to export the main beamer latex source
+#
+
+#+INCLUDE: "../../common/modules/prelude.org" :minlevel 1
+#
+#+INCLUDE: "../../common/modules/169.org"
+
+** The Software Heritage Project \hfill www.softwareheritage.org
+ :PROPERTIES:
+ :CUSTOM_ID: mission
+ :END:
+#+latex: \begin{center}
+#+ATTR_LATEX: :width \linewidth
+# file:SWH-logo+motto.pdf
+file:SWH-logo.pdf
+#+latex: 
\end{center}
+*** Our mission
+ *Collect*, *preserve* and *share* the /source code/ of /all
+ the software/ that is publicly available
+*** Past, present and future
+ \hfill /Preserving/ the past, /enhancing/ the present, /preparing/ the future \hfill
+# Better society, better education, better science, better industry
+
+* 10th RDA plenary Software Source Code IG results
+** 10th RDA plenary Software Source Code IG results
+*** Identified
+ - interest in /Software Source Code/
+ - use cases
+ - ontology/vocabularies used
+ - properties needed for Software Source Code
+ - advantages for structured data
+
+** Interest in IG
+*** Author point of view :B_block:BMCOL:
+ :PROPERTIES:
+ :BEAMER_col: 0.5
+ :BEAMER_env: block
+ :END:
+ - software accompanies data
+ - software citation: get credit
+ - register and describe software
+ - promote software as a first-class research product
+ - PID for software
+ - managing code: incorporate better practices for software
+
+*** User point of view :B_block:BMCOL:
+ :PROPERTIES:
+ :BEAMER_col: 0.5
+ :BEAMER_env: block
+ :END:
+ - discover and recover software
+ - software citation: how to cite
+ - software discovery and research
+ - improve publication
+ - reuse
+ - preserving software source code
+
+** Identified use cases
+*** Author point of view :B_block:BMCOL:
+ :PROPERTIES:
+ :BEAMER_col: 0.5
+ :BEAMER_env: block
+ :END:
+ - publish / deposit source code with metadata
+ - archive software
+ - expose metadata to indexes
+ - credit attribution and authorship
+ - conditions/restrictions for use
+ - link to people, data, funding
+
+*** User point of view :B_block:BMCOL:
+ :PROPERTIES:
+ :BEAMER_col: 0.5
+ :BEAMER_env: block
+ :END:
+ - discovery (semantic search)
+ - lookup software source code
+ - reproducibility
+ - what compiler is required
+ - what test data are available
+ - build software
+ - integrate into a workflow
+
+** Metadata terms
+*** identify :B_block:BMCOL:
+ :PROPERTIES:
+ :BEAMER_col: 0.2
+ :BEAMER_env: block
+
:END:
+ - identifier
+ - title
+ - authors
+ - version
+ - type
+ - origin source
+*** execute :B_block:BMCOL:
+ :PROPERTIES:
+ :BEAMER_opt:
+ :BEAMER_env: block
+ :BEAMER_col: 0.2
+ :END:
+ - link to compiled version
+ - repository
+ - compiler
+ - environment
+ - examples
+*** classify :B_block:BMCOL:
+ :PROPERTIES:
+ :BEAMER_col: 0.2
+ :BEAMER_env: block
+ :END:
+ - description
+ - keywords
+ - in/out data
+ - references
+ - algorithms
+ - docs url
+ - status
+
+*** administrative :B_block:BMCOL:
+ :PROPERTIES:
+ :BEAMER_col: 0.2
+ :BEAMER_env: block
+ :END:
+ - contact
+ - authorship
+ - funders
+ - license
+ - publisher
+ - dates
+
+
+* Explore the metadata landscape
+** With what terms should we describe a \emph{software artifact}?
+#+latex: \begin{center}
+#+ATTR_LATEX: :width \linewidth
+# file:file:metadata_landscape_final.png
+file:cloud_metadata_credit.png
+#+latex: \end{center}
+
+** Explore the metadata landscape
+#+latex: \begin{center}
+#+ATTR_LATEX: :width \linewidth
+# file:file:metadata_landscape_final.png
+file:metadata_landscape_final.png
+#+latex: \end{center}
+
+* The quest for a Software PID
+** Back to basics: DIOs vs. IDOs
+*** DIO (digital identifier of an object)
+ - digital identifiers for traditional (non digital) objects
+ - epistemic complications and significant governance issues, ...
+#+BEAMER: \pause
+ \hfill The \alert{software concept/project} needs a DIO
+#+BEAMER: \pause
+*** IDO (identifier of a digital object)
+ - (digital) identifier for digital objects
+ - simpler to build/handle and can be intrinsic
+#+BEAMER: \pause
+ \hfill The \alert{software source code} needs an IDO for each version or state
+#+BEAMER: \pause
+*** Separation of concerns
+ - yes, we \alert{need both} DIOs and IDOs
+ - no, we \alert{must not mistake} DIOs for IDOs (and vice versa)
+
+** Our challenge in the PID arena
+*** Long term
+ Identifiers must be there for the long term
+*** No middle man
+ Identifiers must be meaningful even if resolvers go away
+*** Integrity, not just naming
+ Identifiers must ensure that the retrieved object is the intended one
+*** Uniqueness by design
+ only one name for each object, each object has only one name
+
+** Intrinsic identifiers in Software Heritage
+ # R. C. Merkle, A digital signature based on a conventional encryption
+ # function, Crypto '87
+ #+BEAMER: \vspace{-3mm}
+***** Merkle tree (R. C. Merkle, Crypto 1979) :B_picblock:
+ :PROPERTIES:
+ :BEAMER_opt: pic=merkle, leftpic=true, width=.5\linewidth
+ :BEAMER_env: picblock
+ :BEAMER_act:
+ :END:
+ Combination of
+ - tree
+ - hash function
+***** Classical cryptographic construction
+ fast, parallel signature of large data structures, built-in deduplication
+#+BEAMER: \pause
+ - satisfies all three criteria
+ - widely used in industry (e.g., Git, Nix, blockchains, IPFS, ...)
+
+#+BEAMER: \pause
+** Working together
+*** Example: links to /software source code/ in an article
+ Leveraging the Software Heritage universal archive:
+ - set of files :: \small\url{swh:1:tree:06741c8c37c5a384083082b99f4c5ad94cd0cd1f}\\
+ id of the tree object listing all the files in a project (at a given time)
+ - revision :: \url{swh:1:rev:7598fb94d59178d65bd8d2892c19356290f5d4e3}\\
+ id of the commit object, which references a tree and (a pointer to) the history
+#+BEAMER: \pause
+ - metadata :: this /will/ involve some form of DIO
+ - and we get all the complications back
+
+* Reproducibility
diff --git a/talks-public/2018-03-22-RDA-IG/Makefile b/talks-public/2018-03-22-RDA-IG/Makefile
new file mode 100644
index 0000000..68fbee7
--- /dev/null
+++ b/talks-public/2018-03-22-RDA-IG/Makefile
@@ -0,0 +1 @@
+include ../Makefile.slides
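
Note on the intrinsic identifiers used in the slides: the `swh:1:tree:...` and `swh:1:rev:...` strings are derived from the objects themselves through a Merkle-style hash, so they stay meaningful without a resolver. The Python sketch below is illustrative only. It assumes the Git blob convention (SHA-1 over a `blob <size>\0` header plus the bytes) for file contents, and it uses a deliberately simplified directory hash that is not the actual Git or Software Heritage tree serialization. The function names `blob_id`, `tree_id` and `parse_swhid` are hypothetical.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Git-style blob hash: SHA-1 over a 'blob <size>' header, a NUL byte, then the content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def tree_id(entries: dict) -> str:
    """Simplified Merkle hash of a directory (assumption: NOT Git's real tree format).

    `entries` maps a name to bytes (a file) or to a nested dict (a subdirectory).
    The id of a directory depends only on the names and ids of its children.
    """
    h = hashlib.sha1()
    for name in sorted(entries):
        child = entries[name]
        if isinstance(child, dict):
            kind, child_id = "tree", tree_id(child)
        else:
            kind, child_id = "blob", blob_id(child)
        h.update(f"{kind} {name}\x00{child_id}\n".encode())
    return h.hexdigest()


def parse_swhid(swhid: str) -> tuple:
    """Split an identifier of the form swh:<version>:<type>:<40 hex digits>."""
    scheme, version, kind, digest = swhid.split(":")
    if scheme != "swh" or len(digest) != 40:
        raise ValueError(f"not a well-formed identifier: {swhid!r}")
    int(digest, 16)  # raises ValueError if the digest is not hexadecimal
    return int(version), kind, digest
```

Because each directory id is computed from its children's ids, changing one file changes the id of every enclosing directory, while identical files or subtrees hash to the same id. That is where the integrity check and the built-in deduplication mentioned on the Merkle tree slide come from.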