diff --git a/common/modules/swh-pids.org b/common/modules/swh-pids.org index a01abd8..86be5a6 100644 --- a/common/modules/swh-pids.org +++ b/common/modules/swh-pids.org @@ -1,147 +1,156 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) # # Software Heritage PIDs: here we present our rationale for introducing a new identifier schema, and the identifier schema itself # #+INCLUDE: "prelude.org" :minlevel 1 # # We need tcolorbox here: add the following lines to your main .org document! # #+LATEX_HEADER: \usepackage{tcolorbox} #+BEAMER_HEADER: \usepackage{tcolorbox} * The quest for a PID :PROPERTIES: :CUSTOM_ID: main :END: ** Systems of identifiers :PROPERTIES: :CUSTOM_ID: definition :END: *** A /system of identifiers/ is - a set of labels (the identifiers) - mechanisms to perform : |------------------------+---------------------------| | /Generation (minting)/ | create a new label | | /Assignment/ | associate label to object | | /Retrieval/ | get object from a label | |------------------------+---------------------------| - optionally, mechanisms to perform: |------------------+---------------------------| | /Verification/ | check label and object | | /Reverse Lookup/ | get label from an object | | /Description/ | get metadata of an object | |------------------+---------------------------| ** Mechanisms offered in some systems of identifiers :PROPERTIES: :CUSTOM_ID: survey :END: |--------------------+----------+-------+-------+--------| | *Mech.* / *System* | *Handle* | *DOI* | *Ark* | *PURL* | |--------------------+----------+-------+-------+--------| | Generation | Yes | Yes | Yes | Yes | | Assignment | Yes | Yes | Yes | Yes | | Retrieval | Yes | Yes | Yes | Yes | | Verification | N.A. | N.A. | N.A. | N.A. | | Reverse Lookup | N.A. | N.A. | N.A. | N.A. | | Description | Yes | Yes | Yes | N.A. 
| |--------------------+----------+-------+-------+--------| ** Our challenges in the PID landscape :PROPERTIES: :CUSTOM_ID: challenges :END: *** Typical properties of systems of identifiers \hfill uniqueness, non ambiguity, persistence, abstraction (opacity) #+BEAMER: \pause *** Key needed properties from our use cases - gratis :: identifiers are free (billions of objects) - integrity :: the associated object cannot be changed (sw dev, /reproducibility/) - no middle man :: no central authority is needed (sw dev, /reproducibility/) #+BEAMER: \pause *** \hfill we could not find systems with both *integrity* and *no middle man* ! ** An important distinction: DIOs vs. IDOs :PROPERTIES: :CUSTOM_ID: diovsido :END: #+BEGIN_EXPORT latex \begin{quote} The term “Digital Object Identifier” is construed as “digital identifier of an object," rather than “identifier of a digital object” \hfill Norman Paskin. 2010 \end{quote} #+END_EXPORT #+BEAMER: \pause *** DIO (Digital Identifier of an Object) digital identifiers for (potentially) *non digital objects* - epistemic complexity (manifestations, versions, locations, etc.) - need an authority to ensure persistence and uniqueness #+BEAMER: \pause *** IDO (Identifier of a Digital Object) digital identifiers (only) for *digital objects* - can provide both *integrity* and *no middle man* - broadly used in modern software development (git, etc.) #+BEAMER: \pause *** \hfill for the core Software Heritage archive, *IDOs are enough* ** IDOs in Software Development: the origins + :PROPERTIES: + :CUSTOM_ID: swhorigins + :END: # R. C. Merkle, A digital signature based on a conventional encryption # function, Crypto '87 #+BEAMER: \vspace{-3mm} ***** Merkle tree (R. C. 
Merkle, Crypto 1979) :B_picblock: :PROPERTIES: :BEAMER_opt: pic=merkle, leftpic=true, width=.5\linewidth :BEAMER_env: picblock :BEAMER_act: :END: Combination of - tree - hash function ***** Classical cryptographic construction fast, parallel signature of large data structures, built-in deduplication #+BEAMER: \pause - satisfies all three criteria: *gratis, integrity, no middle man*! - widely used in industry (e.g., Git, nix, blockchains, IPFS, ...) ** IDOs in Software Heritage: a worked example + :PROPERTIES: + :CUSTOM_ID: swhworked + :END: #+LATEX: \centering\forcebeamerstart #+LATEX: \only<1>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_1}}} #+LATEX: \only<2>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/contents}}} #+LATEX: \only<3>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_2_contents}}} #+LATEX: \only<4>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/directories}}} #+LATEX: \only<5>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_3_directories}}} #+LATEX: \only<6>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/revisions}}} #+LATEX: \only<7>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_4_revisions}}} #+LATEX: \only<8>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/releases}}} #+LATEX: \only<9>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_5_releases}}} #+LATEX: \only<10>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/snapshots.pdf}}} #+LATEX: \forcebeamerend ** The Software Heritage IDO schema \hfill (see *\url{http://bit.ly/swhpids}*) + :PROPERTIES: + :CUSTOM_ID: swhexamples + :END: #+BEGIN_EXPORT latex \small \begin{tcolorbox} 
\href{https://archive.softwareheritage.org/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2} {swh:1:{\bf cnt}:94a9ed024d3859793618152ea559a168bbcbb5e2} \hfill full text of the GPL3 license \end{tcolorbox} \pause \begin{tcolorbox} \href{https://archive.softwareheritage.org/swh:1:dir:d198bc9d7a6bcf6db04f476d29314f157507d505} {swh:1:{\bf dir}:d198bc9d7a6bcf6db04f476d29314f157507d505} \hfill Darktable source code \end{tcolorbox} \pause \begin{tcolorbox} \href{https://archive.softwareheritage.org/swh:1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d} {swh:1:{\bf rev}:309cf2674ee7a0749978cf8265ab91a60aea0f7d} \end{tcolorbox} \hfill a {\bf revision} in the development history of Darktable\\\pause \begin{tcolorbox} \href{https://archive.softwareheritage.org/swh:1:rel:22ece559cc7cc2364edc5e5593d63ae8bd229f9f} {swh:1:{\bf rel}:22ece559cc7cc2364edc5e5593d63ae8bd229f9f} \end{tcolorbox} \hfill {\bf release} 2.3.0 of Darktable, dated 24 December 2016\\\pause \begin{tcolorbox} \href{https://archive.softwareheritage.org/swh:1:snp:c7c108084bc0bf3d81436bf980b46e98bd338453} {swh:1:{\bf snp}:c7c108084bc0bf3d81436bf980b46e98bd338453} \end{tcolorbox} \hfill a {\bf snapshot} of the entire Darktable repository (4 May 2017, GitHub) #+END_EXPORT #+LATEX: \pause *** *Current resolvers:* \url{archive.softwareheritage.org} and \url{n2t.org} diff --git a/talks-public/2019-04-03-RDA-WG/2019-04-03_RDA-WG.org b/talks-public/2019-04-03-RDA-WG/2019-04-03_RDA-WG.org index c6d9383..037d99f 100644 --- a/talks-public/2019-04-03-RDA-WG/2019-04-03_RDA-WG.org +++ b/talks-public/2019-04-03-RDA-WG/2019-04-03_RDA-WG.org @@ -1,334 +1,357 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) #+TITLE: Software Source Code Identification #+SUBTITLE: Working Group #+AUTHOR: Roberto Di Cosmo #+EMAIL: roberto@dicosmo.org @rdicosmo @swheritage #+BEAMER_HEADER: \date{April 25nd, 2019} -#+BEAMER_HEADER: 
\title[www.softwareheritage.org]{Identifiers for Digital Objects} -#+BEAMER_HEADER: \author[Roberto Di Cosmo \hspace{5em} www.dicosmo.org]{{\bf Roberto Di Cosmo}, Daniel Katz, Martin Fenner} +#+BEAMER_HEADER: \title[Source Code Identification]{Software Source Code Identification} +#+BEAMER_HEADER: \author[Roberto Di Cosmo \hspace{5em} www.dicosmo.org]{Roberto Di Cosmo} # #+BEAMER_HEADER: \setbeameroption{show notes on second screen} #+BEAMER_HEADER: \setbeameroption{hide notes} #+KEYWORDS: software heritage legacy preservation knowledge mankind technology #+LATEX_HEADER: \usepackage{tcolorbox} # # prelude.org contains all the information needed to export the main beamer latex source # use prelude-toc.org to get the table of contents # #+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 #+INCLUDE: "../../common/modules/169.org" # +LaTeX_CLASS_OPTIONS: [aspectratio=169,handout,xcolor=table] #+LATEX_HEADER: \usepackage{bbding} #+LATEX_HEADER: \DeclareUnicodeCharacter{66D}{\FiveStar} # # If you want to change the title logo it's here # # +BEAMER_HEADER: \titlegraphic{\includegraphics[width=0.7\textwidth]{SWH-logo}} # aspect ratio can be changed, but the slides need to be adapted # - compute a "resizing factor" for the images (macro for picblocks?) 
# # set the background image # # https://pacoup.com/2011/06/12/list-of-true-169-resolutions/ # #+BEAMER_HEADER: \pgfdeclareimage[height=90mm,width=160mm]{bgd}{swh-world-169.png} #+BEAMER_HEADER: \setbeamertemplate{background}{\pgfuseimage{bgd}} + +* Agenda :noexport: + - Introduction and motivation (15m, done) + - Discussion on motivations and difficulties (20m) + - Conceptual framework for source code identification (15m)- DIO & IDO + - A few proven approaches for software source code identification (30m): +# - swh-ids second part of the presentation +# - ascl ids + - Wrap up: summary of results and next steps (10m) * Introduction ** Working group key facts *** Joint RDA & FORCE11 WG which spawned from RDA's Software Source Code IG & FORCE11's SCIWG *** Co-chairs - Roberto Di Cosmo - Daniel Katz - Martin Fenner *** Objectives - bring together people involved/interested in software identification - produce concrete recommendations for the academic community -*** - Online document: http://bit.ly/rda13scidwg please register there +*** Please register + \hfill online document: http://bit.ly/rda13scidwg #+INCLUDE: "../../common/modules/rdc-bio.org::#main" :only-contents t :minlevel 2 * Setting the stage ** Software is Knowledge :PROPERTIES: :CUSTOM_ID: softwareknowledge :END: *** Software is /an essential component/ of modern scientific research :B_picblock: :PROPERTIES: :BEAMER_opt: pic=papermountain,width=.25\linewidth :BEAMER_env: picblock :BEAMER_act: +- :END: Top 100 papers (Nature, October 2014)\\ #+BEGIN_QUOTE [...] the vast majority describe experimental methods or software that have become essential in their fields.\\ #+END_QUOTE http://www.nature.com/news/the-top-100-papers-1.16224 -** The source code is essential! +** The source code is essential! 
:noexport: :PROPERTIES: :CUSTOM_ID: thesourcecode :END: - #+LATEX: \includegraphics[width=.10\linewidth]{software.png} + #+LATEX: \includegraphics[width=.08\linewidth]{software.png} #+BEGIN_QUOTE “The source code for a work means the preferred form of the work for making modifications to it." \hfill GPL Licence #+END_QUOTE #+Beamer: \pause -*** +*** :PROPERTIES: :BEAMER_env: block :BEAMER_act: +- :END: #+latex: \begin{center} Hello World \end{center} *** Program (excerpt of binary) :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :BEAMER_act: +- :END: #+begin_src hex :exports code 4004e6: 55 4004e7: 48 89 e5 4004ea: bf 84 05 40 00 4004ef: b8 00 00 00 00 4004f4: e8 c7 fe ff ff 4004f9: 90 4004fa: 5d 4004fb: c3 #+end_src *** Program (source code) :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.55 :BEAMER_env: block :BEAMER_act: +- :END: #+begin_src c :exports code /* Hello World program */ #include <stdio.h> void main() { printf("Hello World"); } #+end_src ** Software Source Code is /special/ :PROPERTIES: :CUSTOM_ID: softwareisdifferent :END: *** Harold Abelson, Structure and Interpretation of Computer Programs /“Programs must be written for people to read, and only incidentally for machines to execute.”/ -*** Quake 2 source code (excerpt) :B_block:BMCOL: +*** Quake III Arena source code (excerpt) :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.45 :BEAMER_env: block :END: #+LATEX: \includegraphics[width=\linewidth]{quake-carmack-sqrt-1.png} # smart efficient implementation of 1/sqrt(x) on a CPU without special support *** Net. 
queue in Linux (excerpt) :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.45 :BEAMER_env: block :END: #+LATEX: \includegraphics[width=\linewidth]{juliusz-sfb-short.png} # Juliusz implementation of stochastic fair blue in the Linux Kernel linux/net/sched/sch_sfb.c *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: *** Len Shustek, Computer History Museum \hfill /“Source code provides a view into the mind of the designer.”/ +** An example from my research field, Computer Science +*** + \hfill /Repeatability in computer systems research/, Christian Collberg, 2016 +#+INCLUDE: "../../common/modules/reprod-bad-sota.org::#collbergmethod" :only-contents t :minlevel 3 +** ... cont'd +#+INCLUDE: "../../common/modules/reprod-bad-sota.org::#collbergfindings" :only-contents t :minlevel 3 +*** The main reasons + \hfill source code (/or the right version of it/) cannot be found +** Software Source code: important pillar of Open Science +*** The Magic Triangle of Scientific Knowledge +#+latex: \begin{center} +#+ATTR_LATEX: :width \extblockscale{.7\linewidth} +file:PreservationTriangle.png +#+latex: \end{center} +#+BEAMER: \pause +*** Nota bene + \hfill The links in the picture are *essential* ** Forgotten pillar of (Open) Science *** Lack of recognition :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .5 :END: not (yet) a first class citizen - in the EOSC plan - in the EU copyright reform - in the scholarly works #+BEAMER: \pause *** Lack of guidance/consensus on how to :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .5 :END: - choose a license - cite a software project - relate to industry best practices - make source code FAIR(*) #+BEAMER: \pause *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: *** Lack of basic prerequisites to reproducibility See a discussion in \url{http://annex.softwareheritage.org/public/talks/2018/2018-09-17-STScI_public.pdf} ** Interest in (research) software is rising *** A wealth of activities in academia - artifact evaluation 
:: \mbox{}\\ now commonplace in CS conferences - reproducible research :: \mbox{}\\ hot area of interest (jury still out on how to really do this) - software archival :: \mbox{}\\ publishers, open access portals, propose their services - academic credit :: \mbox{}\\ research software authors want recognition #+BEAMER: \pause *** Identifiers \hfill for all the above, proper *identifiers* are needed ** Challenges for academia *** Accept the complexity: software is /special/ - made by humans for humans: copyright law applies! - not (just) data: we may have a nice hammer, but software is not a nail #+BEAMER: \pause *** Industry, developers, communities have been there we must - avoid reinventing the wheel - connect with existing communities of practice -#+BEAMER: \pause -*** Let's start from ... - \hfill identifiers -* Identifying software source code ** Fragmented landscape *** Academic initiatives :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .6 :END: - - Force 11 Software Citation Principles WG + - Force 11 Software Citation Implementation WG - Freya EU project - OpenAire EU project - - Publisher offerings + - [[https://www.ouvrirlascience.fr/wp-content/uploads/2018/11/2018.09.19-About-the-proposal-for-software-indicators-in-OSM.pdf][EU Open Science Monitor]] #+BEAMER: \pause *** Industry initiatives :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .4 :END: - NSRL (NIST) - SPDX (Linux Foundation) - SWID (ISO Standard) - ... 
#+BEAMER: \pause *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: *** Transversal initiatives - - Software Heritage \hfill (disclosure: I'm leading it) - + Software Heritage identifiers (SWH-ID)\hfill (disclosure: I'm leading it) +* Identifying software source code: motivations and difficulties +** Discussion +*** Software source code identification + - motivations + - difficulties +*** + The floor is yours ** Different motivations\hfill here are a few -*** Give credit - - citations :: that count for software authors - #+BEAMER: \pause *** Support reproducible research and reuse - references :: to retrieve the exact version of a software artefact used in a research #+BEAMER: \pause +*** Give credit + - citations :: that count for software authors + #+BEAMER: \pause *** Transparency - software bill of materials :: enable traceability of software artefacts ** It is way more complex than it seems *** All software projects are not born equal **** :B_column: :PROPERTIES: :BEAMER_env: column :BEAMER_COL: .45 :END: - structure :: \mbox{} + monolithic + composite - lifetime :: \mbox{} + one shot + long running - community :: \mbox{} + single developer + large community **** :B_column: :PROPERTIES: :BEAMER_env: column :BEAMER_COL: .45 :END: - authorship :: \mbox{} + plurality of roles + difficulty of evaluating contributions - authority :: \mbox{} + just the commit log + top down + institution #+BEAMER: \pause -*** Bottomline +*** And /attribution/ adds to the complexity /software citation/ is much more than ... \hfill /software identification/! 
-* VALID UP TO HERE: WHAT FOLLOWS NEED TO BE REWORKED -* Agenda - - Introduction and motivation (15m, done) - - Group work and discussion on the objectives of source code identification (20m) - - Conceptual framework for source code identification (15m, done) - - Presentation of an initial state-of-the-art in the area of software source code identification (15m) - - Group work on a document describing the state-of-the-art (20m) - - Wrap up: summary of results and next steps (10m) - -* The Software Heritage initiative -#+INCLUDE: "../../common/modules/swh-goals-oneslide-vertical.org::#goals" :minlevel 2 -** A principled infrastructure \hfill \url{http://bit.ly/swhpaper} - #+latex: \begin{center} - #+ATTR_LATEX: :width 0.5\linewidth - file:SWH-as-foundation-slim.png - #+latex: \end{center} - #+BEAMER: \pause - #+latex: \centering - #+ATTR_LATEX: :width \extblockscale{.7\linewidth} - file:growth.png - #+BEAMER: \pause -*** Technology - :PROPERTIES: - :BEAMER_col: 0.34 - :BEAMER_env: block - :END: - - transparency and FOSS - - replicas all the way down -*** Content (billions!) 
- :PROPERTIES: - :BEAMER_col: 0.32 - :BEAMER_env: block - :END: - - *intrinsic identifiers* - - facts and provenance -*** Organization - :PROPERTIES: - :BEAMER_col: 0.33 - :BEAMER_env: block - :END: - - non-profit - - multi-stakeholder -* Looking for the right PIDs - #+INCLUDE: "../../common/modules/swh-pids.org::#main" :only-contents t -* Demo time -** A "wayback machine" for software source code -*** Identifiers in action - - *\url{http://archive.softwareheritage.org/browse}* -* Conclusion -** Conclusion \hfill @swheritage - #+BEAMER: \vspace{-1mm} -*** - - there are many systems of identifiers - - DIOs and IDOs cater to different needs - - IDOs enable *integrity* and *no middle man* properties *together* - - Software Heritage is using IDOs for billions of objects, *today* - - we believe IDOs are appropriate for most *digital born* content that has a *canonical* representation - #+BEAMER: \vspace{-1mm} -*** Come in, we're open! - \url{www.softwareheritage.org} --- learn more \\ - \url{www.softwareheritage.org/support/sponsors/} --- sponsoring info \\ - \url{www.softwareheritage.org/support/partners} --- partners \\ - \url{forge.softwareheritage.org} --- our own code - #+BEAMER: \vspace{-1mm} \flushright {\Huge Questions?} \vfill +* Conceptual framework for identifiers + #+INCLUDE: "../../common/modules/swh-pids.org::#definition" :minlevel 2 + #+INCLUDE: "../../common/modules/swh-pids.org::#survey" :minlevel 2 + #+INCLUDE: "../../common/modules/swh-pids.org::#challenges" :minlevel 2 + # #+INCLUDE: "../../common/modules/swh-pids.org::#diovsido" :minlevel 2 +** An important distinction: DIOs vs. IDOs + :PROPERTIES: + :CUSTOM_ID: diovsido + :END: +#+BEGIN_EXPORT latex + \begin{quote} + The term “Digital Object Identifier” is construed as “digital identifier of an object," rather than “identifier of a digital object” \hfill Norman Paskin. 
2010 + \end{quote} +#+END_EXPORT +#+BEAMER: \pause +*** DIO (Digital Identifier of an Object) + digital identifiers for (potentially) *non digital objects* + - epistemic complexity (manifestations, versions, locations, etc.) + - need an *authority* to ensure persistence and uniqueness +#+BEAMER: \pause +*** IDO (Identifier of a Digital Object) + digital identifiers (only) for *digital objects* + - can provide both *integrity* and *no middle man* + - broadly used in modern software development (git, etc.) +#+BEAMER: \pause +*** + *IDOs are enough* for reproducibility \hfill *DIOs* are needed for attribution +* IDOs : SWH-IDs + #+INCLUDE: "../../common/modules/swh-pids.org::#swhorigins" :minlevel 2 + #+INCLUDE: "../../common/modules/swh-pids.org::#swhworked" :minlevel 2 + #+INCLUDE: "../../common/modules/swh-pids.org::#swhexamples" :minlevel 2 +* DIOs and curation process: ASCL +* DIOs and curation via publications +*** The swMath project + - see \url{https://swmath.org} + - nice example: \url{http://swmath.org/software/7116} + - source code is archived in Software Heritage +* DIOs and IDOs: the HAL / SWH use case +** Deposit Scientific Software + #+INCLUDE: "../../common/modules/swh-scientific-publishing.org::#hal" :minlevel 3 :only-contents t +* Discussion and wrap up +** Roadmap +*** Duration: 18 months + - collect state of the art + - extract minimum viable recommendations + - propose actionable plans +** Building a state of the art +*** Curation process + talk to + - tech transfer departments + - promotion committees + - department reports +*** Identifiers + - thematic communities + - +# https://www.gnu.org/software/guix/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/ +# add swmath,org +# issues with automation (see Zenodo?) +#
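Reviewer note on the IDO slides in swh-pids.org: the claim that intrinsic identifiers give both *integrity* and *no middle man* can be illustrated with a short sketch. It assumes that a =swh:1:cnt:= identifier carries the same salted SHA-1 that Git computes for a blob (header =blob <length>\0= followed by the raw bytes); the function names =swh_content_id= and =verify= are hypothetical and are not part of any Software Heritage API.

```python
import hashlib


def swh_content_id(data: bytes) -> str:
    """Compute an intrinsic identifier for a file content.

    Assumption: the digest is the Git-compatible blob hash, i.e.
    SHA-1 over b"blob <length>\\0" followed by the raw bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return "swh:1:cnt:" + digest


def verify(identifier: str, data: bytes) -> bool:
    """Check label and object locally, without contacting any registry."""
    return identifier == swh_content_id(data)


if __name__ == "__main__":
    content = b"hello world\n"
    label = swh_content_id(content)
    print(label)
    print(verify(label, content))            # unchanged bytes
    print(verify(label, content + b"!"))     # tampered bytes
```

Generation, verification and reverse lookup all reduce to recomputing the hash, which is why no central authority is needed to mint or check such labels. Directories, revisions, releases and snapshots follow the same idea one level up the Merkle tree, hashing a manifest that lists the identifiers of their children.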