diff --git a/talks-public/2021-04-21-RDA-Data-Granularity/2021-04-21-RDA-Data-Granularity.org b/talks-public/2021-04-21-RDA-Data-Granularity/2021-04-21-RDA-Data-Granularity.org index 71ef5c0..2768e90 100644 --- a/talks-public/2021-04-21-RDA-Data-Granularity/2021-04-21-RDA-Data-Granularity.org +++ b/talks-public/2021-04-21-RDA-Data-Granularity/2021-04-21-RDA-Data-Granularity.org @@ -1,300 +1,185 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) -#+TITLE: Software Granularity -#+SUBTITLE:Intrinsic vs. extrinsic identifiers for different granularity levels +#+TITLE: On Software Granularity +#+SUBTITLE: Intrinsic and extrinsic identifiers to the rescue #+AUTHOR: Roberto Di Cosmo #+EMAIL: roberto@dicosmo.org @rdicosmo @swheritage #+BEAMER_HEADER: \date[April 21st, 2021]{April 21st, 2021\\[-1em]} -#+BEAMER_HEADER: \title[Software Granularity]{Software Granularity} -#+BEAMER_HEADER: \author[Roberto Di Cosmo]{Roberto Di Cosmo\\[1em]} +#+BEAMER_HEADER: \title[Software Granularity~~~~~ CC-BY 4.0]{Software Granularity} +#+BEAMER_HEADER: \author[Roberto Di Cosmo~~~~~@rdicosmo]{Roberto Di Cosmo\\[1em]} #+KEYWORDS: software heritage legacy preservation knowledge mankind technology SWHID granularity #+LATEX_HEADER: \usepackage{tcolorbox} #+LATEX_HEADER: \definecolor{links}{HTML}{2A1B81} #+LATEX_HEADER: \hypersetup{colorlinks,linkcolor=,urlcolor=links} # # prelude.org contains all the information needed to export the main beamer latex source # use prelude-toc.org to get the table of contents # #+INCLUDE: "../../common/modules/prelude.org" :minlevel 1 #+INCLUDE: "../../common/modules/169.org" # +LaTeX_CLASS_OPTIONS: [aspectratio=169,handout,xcolor=table] #+LATEX_HEADER: \usepackage{bbding} #+LATEX_HEADER: \usepackage{tcolorbox} #+LATEX_HEADER: \DeclareUnicodeCharacter{66D}{\FiveStar} # # If you want to change the title logo it's here # # +BEAMER_HEADER: 
\titlegraphic{\includegraphics[width=0.7\textwidth]{SWH-logo}} # aspect ratio can be changed, but the slides need to be adapted # - compute a "resizing factor" for the images (macro for picblocks?) # # set the background image # # https://pacoup.com/2011/06/12/list-of-true-169-resolutions/ # #+BEAMER_HEADER: \pgfdeclareimage[height=90mm,width=160mm]{bgd}{swh-world-169.png} #+BEAMER_HEADER: \setbeamertemplate{background}{\pgfuseimage{bgd}} #+LATEX: \addtocounter{framenumber}{-1} - - - +* Introduction + #+INCLUDE: "../../common/modules/rdc-bio.org::#main" :only-contents t :minlevel 2 +*** + \hfill In RDA: /Software Source code IG (co-chair), Source Code Identification WG (co-chair)/ * Software Source code: A pillar of Open Science ** Software Source code: pillar of Open Science *** Three pillars of Open Science :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .4 :END: #+latex: \begin{center} #+ATTR_LATEX: :width \extblockscale{1.4\linewidth} file:preservation_triangle_color.png #+latex: \end{center} #+BEAMER: \pause *** A plurality of needs :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .6 :END: - Researcher :: - *archive* and *reference* software used in articles - *find* useful software - get *credit* for developed software - verify/reproduce/improve results #+BEAMER: \pause - Laboratory/team :: track software contributions - produce reports / web page #+BEAMER: \pause - Research Organization :: know its *software assets* - technology *transfer* - impact *metrics* -** A principled infrastructure \hfill \url{http://bit.ly/swhpaper} :noexport: - #+latex: \begin{center} - #+ATTR_LATEX: :width 0.5\linewidth - file:SWH-as-foundation-slim.png - #+latex: \end{center} - #+BEAMER: \pause - #+latex: \centering - #+ATTR_LATEX: :width \extblockscale{.7\linewidth} - file:growth.png - #+BEAMER: \pause -*** Technology - :PROPERTIES: - :BEAMER_col: 0.34 - :BEAMER_env: block - :END: - - transparency and FOSS - - replicas all the way down -*** Content (billions!) 
- :PROPERTIES: - :BEAMER_col: 0.32 - :BEAMER_env: block - :END: - - *intrinsic identifiers* - - facts and provenance -*** Organization - :PROPERTIES: - :BEAMER_col: 0.33 - :BEAMER_env: block - :END: - - non-profit - - multi-stakeholder - -** Software is not /just/ data -*** Software has multiple facets in research +** Software is not (just) data +*** Software has multiple facets in research - a *tool* - a *research outcome* or result - the object of *study* #+BEAMER: \pause *** Source code is /special/ :PROPERTIES: :BEAMER_env: picblock :BEAMER_OPT: pic=python3-matplotlib.pdf, width=.51\linewidth :END: Software **evolves** over time - projects may last decades - the /development history/ is key to its /understanding/ #+BEAMER: \pause Layers of **complexity** - /millions/ of lines of code - large /web of dependencies/ - sophisticated /developer communities/ ** What is software ? - -*** Software as a concept :B_block:BMCOL: +*** Software as a concept :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :END: - software project / entity - the creators and the community around the project - the software solution / functionality #+BEAMER: \pause -*** Software artifact :B_block:BMCOL: +*** Software artifact :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :END: - the binaries for different environments - the *software source code* for each version - the multiple files or code fragments #+BEAMER: \pause *** Versioning, granularity - Project :: “Inria created OCaml and Scikit-learn”\pause - Release :: “2D Voronoi Diagrams were introduced in CGAL 3.1.0”\pause - Precise state of a project :: “This result was produced using commit 0064fbd...”\pause - Code fragment :: “The core algorithm is in lines 101 to 143 of the file parmap.ml contained in the precise state of the project corresponding to commit 0064fbd....” -* Intrinsic identifiers vs.extrinsic identifiers -** What do we want to identify? 
\hfill [[http://doi.org/10.15497/RDA00053][10.15497/RDA00053]] +** Granularity and identifiers \hfill [[http://doi.org/10.15497/RDA00053][10.15497/RDA00053]] #+LATEX: \centering\forcebeamerstart #+LATEX: \only<1>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-0.png}} #+LATEX: \only<2>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-1.png}} #+LATEX: \only<3>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-2.png}} #+LATEX: \only<4>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-3.png}} #+LATEX: \forcebeamerend - - -** Evolution of software development -*** Version control system (VCS) - - records changes made to a (set of) /source code file/ (s) - - allows to operate on versions: diff/merge/fork/recover etc. - - *essential* tool for software development - #+BEAMER: \pause -*** Three decades of evolution -#+LATEX: \centering -#+LATEX: \includegraphics[width=.8\linewidth]{VCS_history_timeline.png} - -** In a picture \hfill (from https://github.com/progit/progit2) :noexport: - #+BEGIN_EXPORT latex - \centering\forcebeamerstart - \only<1>{\colorbox{white}{\includegraphics[width=\extblockscale{.5\linewidth}]{localvcs}}\mbox{}\\[2em] - \texttt{co -r1.2 file.c} - } - \only<2>{\colorbox{white}{\includegraphics[width=\extblockscale{.5\linewidth}]{centralisedvcs}}\mbox{}\\[2em] - \texttt{cvs co -r Rel-1A ProgABC} - } - \only<3>{\colorbox{white}{\includegraphics[width=\extblockscale{.5\linewidth}]{distvcs}}\mbox{}\\[2em] - \texttt{git checkout df3b1b08f756569eff0919e37d8af1f403515b31} - } - \forcebeamerend - #+END_EXPORT -** Foundations of modern DVCS -**** Requirements for the D in DVCS - - *intrinsic* unique identifiers... \hfill (here: /cryptographic signature/, aka "hash") - - ... that work for *tree structures* (software directories) - #+BEAMER: \pause - # R. C. Merkle, A digital signature based on a conventional encryption - # function, Crypto '87 -**** Merkle tree to the rescue (R. C. 
Merkle, Crypto 1979) :B_picblock: - :PROPERTIES: - :BEAMER_opt: pic=merkle, leftpic=true, width=.7\linewidth - :BEAMER_env: picblock - :BEAMER_act: - :END: - Combination of - - tree - - hash function - -* The SWH-ID: the source code fingerprint -** The SWH-ID schema - # TODO: drawing with swh:1:cnt:xxxxxxx "exploded" and explained - #+LATEX: \centering\forcebeamerstart - #+LATEX: \only<1>{\includegraphics[width=\linewidth]{SWHID-v1.4_1.png}} - #+LATEX: \only<2>{\includegraphics[width=\linewidth]{SWHID-v1.4_2.png}} - #+LATEX: \only<3>{\includegraphics[width=\linewidth]{SWHID-v1.4_3.png}} - #+LATEX: \forcebeamerend -** A worked example - #+LATEX: \centering\forcebeamerstart - #+LATEX: \only<1>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_1.pdf}}} - #+LATEX: \only<2>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/contents.pdf}}} - #+LATEX: \only<3>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_2_contents.pdf}}} - #+LATEX: \only<4>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/directories.pdf}}} - #+LATEX: \only<5>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_3_directories.pdf}}} - #+LATEX: \only<6>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/revisions.pdf}}} - #+LATEX: \only<7>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_4_revisions.pdf}}} - #+LATEX: \only<8>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/releases.pdf}}} - #+LATEX: \only<9>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_5_releases.pdf}}} - #+LATEX: \only<10>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/snapshots.pdf}}} - #+LATEX: \forcebeamerend -** Demo time + #+LATEX: \only<1>{\begin{block}{}\centering Top concept layers vs. 
bottom artifact layers\end{block}} + #+LATEX: \only<2>{\begin{block}{}\centering Extrinsic identifiers are key for the concept layers\end{block}} + #+LATEX: \only<3>{\begin{block}{}\centering Intrinsic identifiers are key for the artifact layers\end{block}} + #+LATEX: \only<4>{\begin{block}{}\centering In some cases, extrinsic identifiers can be added too\end{block}} +** Extrinsic and Intrinsic identifiers in a nutshell +*** Extrinsic identifiers: no /per se/ relation with the designated Object + A /register/ keeps the correspondence between the identifier and the object + - pre-internet era :: passport number, social security number, ISBN, ISSN, etc. + - internet era :: DOI, Handle, ARK, PURLs, RRID, etc.\pause +*** Intrinsic identifiers: derived from the designated Object + /No register/ needed to keep the correspondence between the identifier and the object + - pre-internet era :: musical notation, chemical notation (/NaCl/ is table salt)\pause + - internet era :: cryptographic hashes for distributed software development, Bitcoin\pause *** - Let's look at some famous exceprts of source code -#+BEAMER: \pause -*** Apollo 11 source code ([[https://archive.softwareheritage.org/swh:1:cnt:64582b78792cd6c2d67d35da5a11bb80886a6409;origin=https://github.com/virtualagc/virtualagc;lines=245-261/][excerpt]]) :B_block:BMCOL: - :PROPERTIES: - :BEAMER_col: 0.48 - :BEAMER_env: block - :END: - #+LATEX: \includegraphics[width=\linewidth]{apollo-11-cranksilly.png} - # excerpt of routine that asks astronaut to turn around the LEM -#+BEAMER: \pause -*** Quake III source code ([[https://archive.softwareheritage.org/swh:1:cnt:bb0faf6919fc60636b2696f32ec9b3c2adb247fe;origin=https://github.com/id-Software/Quake-III-Arena;lines=549-572/][excerpt]]) :B_block:BMCOL: - :PROPERTIES: - :BEAMER_col: 0.45 - :BEAMER_env: block - :END: - #+LATEX: \includegraphics[width=\linewidth]{quake-carmack-sqrt-1.png} - # smart efficient implementation of 1/sqrt(x) on a CPU without special support 
-#+BEAMER: \pause -*** :B_ignoreheading: - :PROPERTIES: - :BEAMER_env: ignoreheading - :END: -*** It works! - we have /intrinsic/ identifiers for all 20+ billion objects in the archive - - -* Software is our heritage -#+INCLUDE: "../../common/modules/swh-goals-oneslide-vertical.org::#goals" :minlevel 2 -* Conclusion -** Food for thought -*** Intrinsic identifiers... - - can be extracted from the *object itself*, hence: - - no need for a /central authority/, nor maintenance - - any modification to the object changes the identifier - - identifies the /object/, not the /metadata/ ! -#+BEAMER: \pause -*** ... /for source code/ - - Distributed Version Control Systems made them popular - - massively used every day by millions of software developers - - Software Heritage provides *SWH-IDs* for billions of software artifacts - + \hfill more in [[https://www.softwareheritage.org/2020/07/09/intrinsic-vs-extrinsic-identifiers/][this dedicated blog post]] (with pointers to literature) +** Meet the Software Heritage Identifiers (SWHIDs) \hfill [[https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html][(full spec)]] + #+INCLUDE: "../../common/modules/swhid.org::#oneslide" :only-contents t + for *20+ billion* software artifacts! ** Come in, we're open! -*** - \url{www.softwareheritage.org} --- learn more \\ - \url{save.softwareheritage.org} --- save code now \\ - \url{www.softwareheritage.org/swhap} --- legacy software acquisition process \\ - \url{forge.softwareheritage.org} --- our own code + \vspace{-0.4em} +*** Thank you! 
+ #+BEAMER: \vspace{-1mm} \flushright {\Huge Questions?} \vfill - -*** References :B_block: +*** :B_ignoreheading: :PROPERTIES: - :BEAMER_env: block + :BEAMER_env: ignoreheading :END: - #+BEGIN_EXPORT latex - \begin{thebibliography}{Foo Bar, 1969} - \footnotesize - \bibitem{Abramatic2018} Jean-François Abramatic, Roberto Di Cosmo, Stefano Zacchiroli\newblock - \emph{Building the Universal Archive of Source Code},\\ - Communications of the ACM, October 2018 - \href{https://doi.org/10.1145/3183558}{(10.1145/3183558)} - \bibitem{DiCosmo2019} Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli\newblock - \emph{Referencing Source Code Artifacts: a Separate Concern in Software Citation},\\ - Computing in Science and Engineering, IEEE, pp.1-9. \href{https://dx.doi.org/10.1109/MCSE.2019.2963148}{(10.1109/MCSE.2019.2963148)} - \href{https://hal.archives-ouvertes.fr/hal-02446202}{(hal-02446202)} - \end{thebibliography} - #+END_EXPORT - - - + #+BEGIN_EXPORT latex + \begin{thebibliography}{Foo Bar, 1969} + \scriptsize + + \bibitem{EOSCSirs2020} EOSC SIRS Task Force + \newblock Scholarly Infrastructures for Research Software + \newblock 2020, European Commission, https://doi.org/10.2777/28598 + + \bibitem{DiCosmo2020d} Roberto Di Cosmo + \newblock Archiving and Referencing Source Code with Software Heritage + \newblock International Congress on Mathematical Software (ICMS), 2020 + + \bibitem{DiCosmo2020c} P. Alliez, R. Di Cosmo, B. Guedj, A. Girault, M. Hacid, A. Legrand, N. Rougier + \newblock Attributing and Referencing (Research) Software: Best Practices and Outlook From Inria + \newblock Computing in Science \& Engineering, 22 (1), pp. 
39-52, 2020, ISSN: 1558-366X + + \bibitem{DiCosmo2020a} Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli + \newblock Referencing Source Code Artifacts: a Separate Concern in Software Citation + \newblock Computing in Science \& Engineering, 2020, ISSN: 1521-9615 + + \bibitem{Abramatic2018} Jean-François Abramatic, Roberto Di Cosmo, Stefano Zacchiroli + \newblock Building the Universal Archive of Source Code + \newblock Communications of the ACM, October 2018 + + \end{thebibliography} + #+END_EXPORT