diff --git a/talks-public/2020-12-01-Master-STL/2020-12-01-Master-STL.org b/talks-public/2020-12-01-Master-STL/2020-12-01-Master-STL.org index 165b4ad..7e81ddf 100644 --- a/talks-public/2020-12-01-Master-STL/2020-12-01-Master-STL.org +++ b/talks-public/2020-12-01-Master-STL/2020-12-01-Master-STL.org @@ -1,297 +1,406 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) #+TITLE: Software Heritage #+SUBTITLE: The universal source code archive #+BEAMER_HEADER: \title{Software Heritage} #+AUTHOR: Morane Gruenpeter #+EMAIL: morane@softwareheritage.org #+BEAMER_HEADER: \date[December 1st, 2020]{December 1st, 2020\\[-1em]} #+BEAMER_HEADER: \title[www.softwareheritage.org]{Software Heritage} #+BEAMER_HEADER: \institute[]{\\\href{mailto:morane@softwareheritage.org}{\tt morane@softwareheritage.org}} #+BEAMER_HEADER: \author[Morane Gruenpeter]{ Morane Gruenpeter\\[1em]% #+BEAMER_HEADER: Software engineer and metadata specialist\\Inria, Software Heritage\\[-1em]} # #+BEAMER_HEADER: \setbeameroption{show notes on second screen} #+BEAMER_HEADER: \setbeameroption{hide notes} #+KEYWORDS: software heritage legacy preservation knowledge mankind technology deposit # # prelude.org contains all the information needed to export the main beamer latex source # use prelude-toc.org to get the table of contents # #+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 #+INCLUDE: "../../common/modules/169.org" # +LaTeX_CLASS_OPTIONS: [aspectratio=169,handout,xcolor=table] #+LATEX_HEADER: \usepackage{bbding} #+LATEX_HEADER: \usepackage{tcolorbox} #+LATEX_HEADER: \DeclareUnicodeCharacter{66D}{\FiveStar} # # If you want to change the title logo it's here # # +BEAMER_HEADER: \titlegraphic{\includegraphics[width=0.7\textwidth]{SWH-logo}} # aspect ratio can be changed, but the slides need to be adapted # - compute a "resizing factor" for the images (macro for picblocks?) 
# # set the background image # # https://pacoup.com/2011/06/12/list-of-true-169-resolutions/ # #+BEAMER_HEADER: \pgfdeclareimage[height=90mm,width=160mm]{bgd}{swh-world-169.png} #+BEAMER_HEADER: \setbeamertemplate{background}{\pgfuseimage{bgd}} #+LATEX_HEADER: \usepackage{supertabular} #+LATEX_HEADER: \newcommand{\sponsor}[2]{{\bf #1}, #2} #+LATEX_HEADER: \newcommand{\teamster}[2]{{\textcolor{red}{#1}}, #2} * Introduction -# + # BIO ** Short Bio: Morane Gruenpeter #+INCLUDE: "../../common/modules/mg-bio.org::#bio" :only-contents t :minlevel 3 # # One slide motivation + goals #+INCLUDE: "../../common/modules/swh-goals-oneslide-vertical.org::#goals" :minlevel 2 # # Where we are today: endorsement # ** Our principles \hfill iPres 2017 - \url{http://bit.ly/swhpaper} # #+INCLUDE: "../../common/modules/principles-compact.org::#principlesstatus" :only-contents t :minlevel 3 ** Our principles \hfill iPres 2017 - \url{http://bit.ly/swhpaper} :PROPERTIES: :CUSTOM_ID: principlesstatus :END: #+latex: \begin{center} #+ATTR_LATEX: :width .8\linewidth file:SWH-as-foundation-slim.png #+latex: \end{center} #+latex: \footnotesize\vspace{-3mm} # # #+BEAMER: \pause #+BEAMER: \pause #+latex: \centering #+ATTR_LATEX: :width \extblockscale{.8\linewidth} - file:2019-09-archive-growth.png + file:2020-09-08-growth.png ** Growing Support #+INCLUDE: "../../common/modules/support-compact.org::#support" :only-contents t :minlevel 3 * The knowledge is in the source code ! +** Software is all around us +# TODO + ** The knowledge is in the source code! 
#+INCLUDE: "../../common/modules/source-code-different-short.org::#thesourcecode" :only-contents t :minlevel 3 ** Source code is /special/ *** /Executable/ and /human readable/ knowledge \hfill copyright law /“Programs must be written for people to read, and only incidentally for machines to execute.”/\\ \hfill Harold Abelson #+BEAMER: \pause *** Software /evolves/ over time - projects may last decades - the /development history/ is key to its /understanding/ #+BEAMER: \pause *** Complexity :B_picblock: :PROPERTIES: :BEAMER_env: picblock :BEAMER_OPT: pic=python3-matplotlib.pdf, width=.6\linewidth :END: - /millions/ of lines of code - large /web of dependencies/ + easy to break, difficult to maintain - sophisticated /developer communities/ ** modules/vcs-history.org::#timeline #+INCLUDE: "../../common/modules/vcs-history.org::#timeline" :only-contents t :minlevel 3 ** modules/vcs-history.org::#dvcs-to-merkle #+INCLUDE: "../../common/modules/vcs-history.org::#dvcs-to-merkle" :only-contents t :minlevel 3 ** modules/vcs-history.org::#vcs-explained #+INCLUDE: "../../common/modules/vcs-history.org::#vcs-explained" :only-contents t :minlevel 3 ** modules/vcs-history.org::#adoption #+INCLUDE: "../../common/modules/vcs-history.org::#adoption" :only-contents t :minlevel 3 -* SWH data model and persistent identifiers +* Data model and SWHID: the source code fingerprint # under the hood: automation and storage, the archive in pictures #+INCLUDE: "../../common/modules/under-the-hood-pictures.org::#main" :only-contents t :minlevel 2 ** Under the hood: identifying billions of objects \hfill \url{https://bit.ly/2wOOmyV} #+latex: \begin{center} #+ATTR_LATEX: :width .85\linewidth file:swh-merkle-dag-wide.pdf #+latex: \end{center} #+latex: \footnotesize\vspace{-3mm} ** Our challenges in the PID landscape :PROPERTIES: :CUSTOM_ID: challenges :END: *** Typical properties of systems of identifiers \hfill uniqueness, non ambiguity, persistence, abstraction (opacity) #+BEAMER: \pause *** 
Key needed properties from our use cases - gratis :: identifiers are free (billions of objects) - integrity :: the associated object cannot be changed (sw dev, /reproducibility/) - no middle man :: no central authority is needed (sw dev, /reproducibility/) #+BEAMER: \pause *** \hfill we could not find systems with both *integrity* and *no middle man* ! +** The SWH-ID schema + # TODO: drawing with swh:1:cnt:xxxxxxx "exploded" and explained + #+LATEX: \centering\forcebeamerstart + #+LATEX: \only<1>{\includegraphics[width=\linewidth]{SWH-ID-1.png}} + #+LATEX: \only<2>{\includegraphics[width=\linewidth]{SWH-ID-2.png}} + #+LATEX: \only<3>{\includegraphics[width=\linewidth]{SWH-ID-3.png}} + #+LATEX: \forcebeamerend + +** Demo time +*** + Let's look at some famous excerpts of source code +#+BEAMER: \pause +*** Apollo 11 source code ([[https://archive.softwareheritage.org/swh:1:cnt:64582b78792cd6c2d67d35da5a11bb80886a6409;origin=https://github.com/virtualagc/virtualagc;lines=245-261/][excerpt]]) :B_block:BMCOL: + :PROPERTIES: + :BEAMER_col: 0.48 + :BEAMER_env: block + :END: + #+LATEX: \includegraphics[width=\linewidth]{apollo-11-cranksilly.png} + # excerpt of routine that asks astronaut to turn around the LEM +#+BEAMER: \pause +*** Quake III source code ([[https://archive.softwareheritage.org/swh:1:cnt:bb0faf6919fc60636b2696f32ec9b3c2adb247fe;origin=https://github.com/id-Software/Quake-III-Arena;lines=549-572/][excerpt]]) :B_block:BMCOL: + :PROPERTIES: + :BEAMER_col: 0.45 + :BEAMER_env: block + :END: + #+LATEX: \includegraphics[width=\linewidth]{quake-carmack-sqrt-1.png} + # smart efficient implementation of 1/sqrt(x) on a CPU without special support +#+BEAMER: \pause +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +*** It works!
+ we have /intrinsic/ identifiers for all 20+ billion objects in the archive # metadata challenge- questions about a software entity and where to find metadata (one slide) #+INCLUDE: "../../common/modules/identifiers-arena.org::#main" :only-contents t :minlevel 2 * The missing piece- the Metadata # metadata challenge- questions about a software entity and where to find metadata (one slide) #+INCLUDE: "../../common/modules/metadata-challenge.org::#main" :only-contents t :minlevel 2 ** The Software Ontology /Touchstone/ *** Software Citation Principles \tiny ( FORCE11's 2015 conference and WG) :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_opt: :END: - *Importance* : first class citizen in the scholarly ecosystem - *Credit and attribution* : authors, maintainer - *Unique identification*: points to a unique, specific software version (DOI, Git SHA1 hash, etc..) - *Persistence* : identification beyond the lifespan of the software (swh-id) - *Accessibility*: url, publisher - *Specificity* : version, environment # metadata landscape (one decomposed slide) #+INCLUDE: "../../common/modules/metadata-landscape.org::#main" :only-contents t :minlevel 2 ** Software Metadata Terms *** identify :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.2 :BEAMER_env: block :END: - identifier - title - authors - version - type - origin source #+BEAMER: \pause *** execute :B_block:BMCOL: :PROPERTIES: :BEAMER_opt: :BEAMER_env: block :BEAMER_col: 0.2 :END: - link to a compiled version - repository - compiler - environment - examples #+BEAMER: \pause *** classify :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.2 :BEAMER_env: block :END: - description - keywords - in/out data - references - algorithms - docs url #+BEAMER: \pause *** administrate :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.2 :BEAMER_env: block :END: - contact - authorship - funders - license - editor (publisher) - dates - status ** Much more complex than it seems *** Software is complex - Structure :: monolithic/composite; 
self-contained/external dependencies - Lifetime :: one-shot/long term - Community :: one man/one team/distributed community - Authorship :: complex set of roles - Authority :: institutions/organizations/communities/single person #+BEAMER: \pause *** Various granularities - Exact status of the source code :: for reproducibility, e.g. #+latex: \emph{``you can find at \href{https://archive.softwareheritage.org/swh:1:cnt:cdf19c4487c43c76f3612557d4dc61f9131790a4;lines=146-187/}{swh:1:cnt:cdf19c4487c43c76f3612557d4dc61f9131790a4;lines=146-187} the core algorithm used in this article''} - (Major) release :: \emph{``This functionality is available in OCaml version 4''} - Project :: \emph{``Inria has created OCaml and Scikit-Learn''}. * Software Source code as a research output +** Software is a /forgotten/ pillar of Open Science +*** Lack of recognition + :PROPERTIES: + :BEAMER_env: block + :BEAMER_col: 0.48 + :END: + not (yet) a first class citizen + - in the EOSC plan + - in the scholarly world + + + #+BEGIN_QUOTE + Sometimes, if you don't have the software, you don't have the data + + \mbox{}\hfill Christine Borgman, Paris, 2018 + #+END_QUOTE + + +*** + :PROPERTIES: + :BEAMER_COL: .5 + :END: + #+latex: \begin{center} + #+ATTR_LATEX: :width 0.9\linewidth + file:preservation_triangle_color.png + #+latex: \end{center} +#+BEAMER: \pause +*** Reproducibility is the key :B_picblock: + :PROPERTIES: + :BEAMER_opt: pic=Karl_Popper, leftpic=true, width=.16\linewidth + :BEAMER_env: picblock + :END: +#+latex: \begin{quote} + non-reproducible single occurrences are of no significance to science\\ + \\ + \mbox{} \hfill \scriptsize Karl Popper, \emph{The Logic of Scientific Discovery}, 1934 +#+latex: \end{quote} + +** What is at stake \hfill in increasing order of difficulty +\vspace{-7pt} +*** Archival + Research software artifacts must be properly *archived*\\ + \hfill make sure we can /retrieve/ them (/reproducibility/) +#+BEAMER: \pause +*** Identification + Research software
artifacts must be properly *referenced*\\ + \hfill make sure we can /identify/ them (/reproducibility/) +#+BEAMER: \pause +*** Metadata + Research software artifacts must be properly *described*\\ + \hfill make it easy to /discover/ them (/visibility/) +#+BEAMER: \pause +*** Citation + Research software artifacts must be properly *cited* /(not the same as referenced!)/\\ + \hfill to give /credit/ to authors (/evaluation/!) + + +* The software deposit - a first class research output + # reproducibility and scientific knowledge pillars (one slide) #+INCLUDE: "../../common/modules/swh-scientific-reproducibility.org::#main" :only-contents t :minlevel 2 -# -# scientific software (deposit) use-case (one slide) -#+INCLUDE: "../../common/modules/swh-scientific-deposit.org::#main" :only-contents t :minlevel 2 + + + +** The research software (deposit) use case + :PROPERTIES: + :CUSTOM_ID: hal + :END: +*** the deposit workflow + :PROPERTIES: + :BEAMER_COL: .5 + :END: + #+latex: \begin{center} + #+ATTR_LATEX: :width \linewidth + file:deposit-communication-with-PID.png + #+latex: \end{center} +#+LATEX: \pause # scientific software (save code now) use-case (three slides) #+INCLUDE: "../../common/modules/swh-scientific-preservation.org::#main" :only-contents t :minlevel 2 * Conclusion ** Research Software Engineer tips *** Use a forge for your academic and personal projects \hfill Github, Gitlab or Bitbucket are the best way to create your *source code cv* #+BEAMER: \pause *** Put in your projects metadata files \hfill *README*, *LICENSE*, *AUTHORS* and *codemeta.json* to describe your project #+BEAMER: \pause *** Archive your projects on SWH \hfill Use the *Save Code Now* feature #+BEAMER: \pause *** Contribute to other projects \hfill When you contribute you learn how to *read code* #+BEAMER: \pause *** Ask \hfill Don't be afraid to ask on an *issue*, *mailing list* or *irc channel* (or your teachers) ** Come in, we're open!
#+BEGIN_EXPORT latex \begin{center} \includegraphics[width=.5\linewidth]{SWH-logo.pdf} \end{center} \begin{center} \vfill {\Large Thank you! Any questions?} \end{center} #+END_EXPORT *** Join us on https://forge.softwareheritage.org/ :B_block: :PROPERTIES: :BEAMER_env: block :END: #+BEGIN_EXPORT latex \begin{thebibliography}{Foo Bar, 1969} \footnotesize \bibitem{Abramatic2018} Jean-François Abramatic, Roberto Di Cosmo, Stefano Zacchiroli\newblock \emph{Building the Universal Archive of Source Code}, Communications of the ACM, October 2018 \bibitem{DiCosmo2018} Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli\newblock \emph{Identifiers for Digital Objects: the Case of Software Source Code Preservation}, iPRES 2018: Intl. Conf. on Digital Preservation \end{thebibliography} #+END_EXPORT *** contact: morane@softwareheritage.org