diff --git a/common/images/DIOvsIDO.png b/common/images/DIOvsIDO.png new file mode 100644 index 0000000..bb455b2 Binary files /dev/null and b/common/images/DIOvsIDO.png differ diff --git a/talks-public/2020-01-29-Pidapalooza/2020-01-29-Pidapalooza.org b/talks-public/2020-01-29-Pidapalooza/2020-01-29-Pidapalooza.org new file mode 100644 index 0000000..e1d31be --- /dev/null +++ b/talks-public/2020-01-29-Pidapalooza/2020-01-29-Pidapalooza.org @@ -0,0 +1,188 @@ +#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) +#+TITLE: The swh-id: a digital fingerprint identifying software source code +#+SUBTITLE: +#+AUTHOR: Roberto Di Cosmo +#+EMAIL: roberto@dicosmo.org @rdicosmo @swheritage +#+BEAMER_HEADER: \date{January 29th, 2020} +#+BEAMER_HEADER: \title[The swh-id]{The swh-id: a digital fingerprint identifying software source code} +#+BEAMER_HEADER: \author[{\bf Roberto Di Cosmo}, Morane Gruenpeter]{{\bf Roberto Di Cosmo}, Morane Gruenpeter\\[1em]% +#+BEAMER_HEADER: Director, Software Heritage\\Computer Science full professor, Inria and IRIF\\[-1em]} +# #+BEAMER_HEADER: \setbeameroption{show notes on second screen} +#+BEAMER_HEADER: \setbeameroption{hide notes} +#+KEYWORDS: software heritage legacy preservation knowledge mankind technology +#+LATEX_HEADER: \usepackage{tcolorbox} +#+LATEX_HEADER: \definecolor{links}{HTML}{2A1B81} +#+LATEX_HEADER: \hypersetup{colorlinks,linkcolor=,urlcolor=links} +# +# prelude.org contains all the information needed to export the main beamer latex source +# use prelude-toc.org to get the table of contents +# + +#+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 + + +#+INCLUDE: "../../common/modules/169.org" + +# +LaTeX_CLASS_OPTIONS: [aspectratio=169,handout,xcolor=table] + +#+LATEX_HEADER: \usepackage{bbding} +#+LATEX_HEADER: \DeclareUnicodeCharacter{66D}{\FiveStar} + +# +# If you want to change the title logo it's here +# +# +BEAMER_HEADER: 
\titlegraphic{\includegraphics[width=0.7\textwidth]{SWH-logo}} + +# aspect ratio can be changed, but the slides need to be adapted +# - compute a "resizing factor" for the images (macro for picblocks?) +# +# set the background image +# +# https://pacoup.com/2011/06/12/list-of-true-169-resolutions/ +# +#+BEAMER_HEADER: \pgfdeclareimage[height=90mm,width=160mm]{bgd}{swh-world-169.png} +#+BEAMER_HEADER: \setbeamertemplate{background}{\pgfuseimage{bgd}} +#+LATEX: \addtocounter{framenumber}{-1} + +* The Software Heritage initiative +#+INCLUDE: "../../common/modules/swh-goals-oneslide-vertical.org::#goals" :minlevel 2 +** A principled infrastructure \hfill \url{http://bit.ly/swhpaper} + #+latex: \begin{center} + #+ATTR_LATEX: :width 0.5\linewidth + file:SWH-as-foundation-slim.png + #+latex: \end{center} + #+BEAMER: \pause + #+latex: \centering + #+ATTR_LATEX: :width \extblockscale{.7\linewidth} + file:growth.png + #+BEAMER: \pause +*** Technology + :PROPERTIES: + :BEAMER_col: 0.34 + :BEAMER_env: block + :END: + - transparency and FOSS + - replicas all the way down +*** Content (billions!) + :PROPERTIES: + :BEAMER_col: 0.32 + :BEAMER_env: block + :END: + - *intrinsic identifiers* + - facts and provenance +*** Organization + :PROPERTIES: + :BEAMER_col: 0.33 + :BEAMER_env: block + :END: + - non-profit + - multi-stakeholder + +* The Knowledge is in the Source Code +** The knowledge is in the source code! 
+#+INCLUDE: "../../common/modules/source-code-different-short.org::#thesourcecode" :only-contents t :minlevel 3 +** Source code is /special/ +*** /Executable/ and /human readable/ knowledge \hfill copyright law + /“Programs must be written for people to read, and only incidentally for machines to execute.”/\\ + \hfill Harold Abelson +#+BEAMER: \pause +*** Software /evolves/ over time + - projects may last decades + - the /development history/ is key to its /understanding/ +#+BEAMER: \pause +*** Complexity :B_picblock: + :PROPERTIES: + :BEAMER_env: picblock + :BEAMER_OPT: pic=python3-matplotlib.pdf, width=.6\linewidth + :END: + - /millions/ of lines of code + - large /web of dependencies/ + + easy to break, difficult to maintain + - sophisticated /developer communities/ + +# ** How we built our scientific knowledge +# reproducibility and scientific knowledge pillars (one slide) +#+INCLUDE: "../../common/modules/swh-scientific-reproducibility.org::#main" :only-contents t :minlevel 2 +# + + +* Challenges +** Much more complex than it seems +*** Software is complex + - Structure :: monolithic/composite; self-contained/external dependencies + - Lifetime :: one-shot/long term + - Community :: one man/one team/distributed community + - Authorship :: complex set of roles + - Authority :: institutions/organizations/communities/single person +#+BEAMER: \pause +*** Various granularities + - Exact status of the source code :: for reproducibility, e.g. +#+latex: \emph{``you can find at \href{https://archive.softwareheritage.org/swh:1:cnt:cdf19c4487c43c76f3612557d4dc61f9131790a4;lines=146-187/}{swh:1:cnt:cdf19c4487c43c76f3612557d4dc61f9131790a4;lines=146-187} the core algorithm used in this article''} + + - (Major) release :: \emph{``This functionality is available in OCaml version 4''} + + - Project :: \emph{``Inria has created OCaml and Scikit-Learn''}. 
+** We are not alone +*** Research Software does not exist in isolation :B_picblock: + :PROPERTIES: + :BEAMER_env: picblock + :BEAMER_OPT: pic=python3-matplotlib.pdf, width=.6\linewidth, leftpic=true + :END: + large /web of dependencies/ on non-research software +#+BEAMER: \pause +*** Industry and developers have been here before :B_block: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .5 + :END: + - NSRL (NIST) + - SPDX (Linux Foundation) + - SWH-ID (Software Heritage) + - SWID (ISO Standard) + - Wikidata Software Properties +#+BEAMER: \pause +*** We must :B_block: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .5 + :END: + - accept the complexity + - avoid reinventing the wheel + - connect with existing communities of practice + +* Extrinsic vs Intrinsic identifiers +** An important distinction: DIOs vs. IDOs + :PROPERTIES: + :CUSTOM_ID: diovsido + :END: +#+BEGIN_EXPORT latex + \begin{quote} + The term “Digital Object Identifier” is construed as “digital identifier of an object,” rather than “identifier of a digital object” \hfill Norman Paskin, 2010 + \end{quote} +#+END_EXPORT +#+BEAMER: \pause +*** DIO (Digital Identifier of an Object) + digital identifiers for (potentially) *non-digital objects* + - epistemic complexity (manifestations, versions, locations, etc.) + - need an authority to ensure persistence and uniqueness +#+BEAMER: \pause +*** IDO (Identifier of a Digital Object) + digital identifiers (only) for *digital objects* + - can provide both *integrity* and *no middle man* + - broadly used in modern software development (git, etc.) +** An important distinction: DIOs vs. IDOs + #+latex: \begin{center} + #+ATTR_LATEX: :width 0.859\linewidth + file:DIOvsIDO.png + #+latex: \end{center} +#+BEAMER: \pause + \hfill for the core Software Heritage archive, *IDOs are enough* + +** Intrinsic: what does it really mean? +Examples of intrinsic identifiers (DNA, music notes, etc.) 
+ +* The SWH-ID: the source code fingerprint +** The origins +** An overview of the archive data model +** parmap showcase +** swh-identify: how to find a digital object's intrinsic identifier diff --git a/talks-public/2020-01-29-Pidapalooza/METADATA b/talks-public/2020-01-29-Pidapalooza/METADATA new file mode 100644 index 0000000..600ef0f --- /dev/null +++ b/talks-public/2020-01-29-Pidapalooza/METADATA @@ -0,0 +1,33 @@ +Title: The swh-id: a digital fingerprint identifying software source code + + + Abstract: + + The Software Heritage universal archive of software source code relies on + well-established techniques used in software development communities to + identify the more than 20 billion code artefacts it preserves: + cryptographic hashes in a Merkle DAG data structure. + + In this session we will first explain the motivations for this choice, + recalling Paskin's essential distinction between digital identifiers of + an object (DIOs) and identifiers of digital objects (IDOs). + + Then we will focus on the properties of the Software Heritage Identifiers + (SWH-IDs) that matter most in a reproducibility and long-term archival framework: + intrinsic integrity and independent verifiability. + + Finally, we will show practically how they can be used to improve current + research publication practices. + + How would you run the session to support the spirit of PIDapalooza as a laid-back, + welcoming, energetic and exciting meeting, and ensure at least 10 minutes of + your session are used to interact with the audience? + + We will do a live demonstration of the swh-identify module, which computes + the PID directly from the digital artefact. + We will also show how to resolve an swh-id on the online archive and how + to find the swh-id of a preserved artefact. + + Finally, we will invite participants who want to preserve their own repositories, + or other important ones, to submit them with Software Heritage's + "save code now" feature. 
diff --git a/talks-public/2020-01-29-Pidapalooza/Makefile b/talks-public/2020-01-29-Pidapalooza/Makefile new file mode 100644 index 0000000..68fbee7 --- /dev/null +++ b/talks-public/2020-01-29-Pidapalooza/Makefile @@ -0,0 +1 @@ +include ../Makefile.slides
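The abstract says the archive identifies artefacts with cryptographic hashes in a Merkle DAG, and that swh-identify computes the PID from the artefact itself. Below is a minimal sketch of that idea. It assumes that version-1 identifiers for file contents (`swh:1:cnt:`) and directories (`swh:1:dir:`) are computed like git blob and tree hashes: SHA-1 over a `blob <length>\0` or `tree <length>\0` header followed by the payload. The helper names `swhid_for_content` and `swhid_for_directory`, and the entry kinds `"file"`, `"exec"`, `"dir"` and `"link"`, are hypothetical illustrations. They are not the API of swh-identify.

```python
import hashlib

# Hypothetical mapping from entry kind to the git-style mode used in tree objects.
MODES = {"file": b"100644", "exec": b"100755", "dir": b"40000", "link": b"120000"}


def _git_hash(kind: bytes, payload: bytes) -> str:
    """SHA-1 over b"<kind> <length>\\0" followed by the payload, as git does."""
    header = kind + b" " + str(len(payload)).encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def swhid_for_content(data: bytes) -> str:
    """Hypothetical helper: intrinsic identifier of a file's raw bytes."""
    return "swh:1:cnt:" + _git_hash(b"blob", data)


def swhid_for_directory(entries: list[tuple[bytes, str, str]]) -> str:
    """Hypothetical helper: identifier of a directory from its children.

    Each entry is (name, kind, child_hex_digest). The directory hash depends on
    the hashes of its children, which is what makes the structure a Merkle DAG.
    """

    def sort_key(entry: tuple[bytes, str, str]) -> bytes:
        name, kind, _ = entry
        # git orders subdirectories as if their names ended with "/".
        return name + b"/" if kind == "dir" else name

    payload = b"".join(
        MODES[kind] + b" " + name + b"\0" + bytes.fromhex(digest)
        for name, kind, digest in sorted(entries, key=sort_key)
    )
    return "swh:1:dir:" + _git_hash(b"tree", payload)


if __name__ == "__main__":
    readme = swhid_for_content(b"hello world\n")
    print(readme)
    # Reuse the 40-hex-digit digest of the child to build the parent directory.
    print(swhid_for_directory([(b"README", "file", readme.split(":")[-1])]))
```

Because only the bytes are hashed, anyone can recompute and check such an identifier without asking a registry. For example, `git hash-object` on the same file should print the same hexadecimal digest. This is the "integrity and no middle man" property of IDOs mentioned on the DIO vs. IDO slide.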