diff --git a/talks-public/2020-12-10-zbmed/2020-12-10-zbmed.org b/talks-public/2020-12-10-zbmed/2020-12-10-zbmed.org new file mode 100644 index 0000000..0078d1f --- /dev/null +++ b/talks-public/2020-12-10-zbmed/2020-12-10-zbmed.org @@ -0,0 +1,609 @@ +#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) +#+TITLE: Software Heritage +#+SUBTITLE: Archiving and Referencing all the source code towards recognizing software in academia +#+BEAMER_HEADER: \title{Software Heritage at ZB MED colloquium} +#+AUTHOR: Morane Gruenpeter +#+EMAIL: morane@softwareheritage.org +#+BEAMER_HEADER: \date[December 10th, 2020]{December 10th, 2020\\[-1em]} +#+BEAMER_HEADER: \title[www.softwareheritage.org]{Software Heritage at ZB MED colloquium} +#+BEAMER_HEADER: \institute[]{\\\href{mailto:morane@softwareheritage.org}{\tt morane@softwareheritage.org}} +#+BEAMER_HEADER: \author[Morane Gruenpeter]{ Morane Gruenpeter\\[1em]% +#+BEAMER_HEADER: Software engineer and metadata specialist\\Inria, Software Heritage\\[-1em]} +# #+BEAMER_HEADER: \setbeameroption{show notes on second screen} +#+BEAMER_HEADER: \setbeameroption{hide notes} +#+KEYWORDS: software heritage legacy preservation knowledge mankind technology deposit + +# +# prelude.org contains all the information needed to export the main beamer latex source +# use prelude-toc.org to get the table of contents +# + +#+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 + + +#+INCLUDE: "../../common/modules/169.org" + +# +LaTeX_CLASS_OPTIONS: [aspectratio=169,handout,xcolor=table] +#+LATEX_HEADER: \usepackage{bbding} +#+LATEX_HEADER: \usepackage{tcolorbox} +#+LATEX_HEADER: \DeclareUnicodeCharacter{66D}{\FiveStar} + + +# +# If you want to change the title logo it's here +# +# +BEAMER_HEADER: \titlegraphic{\includegraphics[width=0.7\textwidth]{SWH-logo}} + +# aspect ratio can be changed, but the slides need to be adapted +# - compute a "resizing 
factor" for the images (macro for picblocks?) +# +# set the background image +# +# https://pacoup.com/2011/06/12/list-of-true-169-resolutions/ +# +#+BEAMER_HEADER: \pgfdeclareimage[height=90mm,width=160mm]{bgd}{swh-world-169.png} +#+BEAMER_HEADER: \setbeamertemplate{background}{\pgfuseimage{bgd}} + +#+LATEX_HEADER: \usepackage{supertabular} +#+LATEX_HEADER: \newcommand{\sponsor}[2]{{\bf #1}, #2} +#+LATEX_HEADER: \newcommand{\teamster}[2]{{\textcolor{red}{#1}}, #2} + +* Introduction + +# BIO +** Short Bio: Morane Gruenpeter +#+INCLUDE: "../../common/modules/mg-bio.org::#bio" :only-contents t :minlevel 3 + +** Living in Paris +#+latex: \begin{center} +#+ATTR_LATEX: :width .7\linewidth +file:paris-seine.jpg +#+latex: \end{center} + +** Software is all around us +*** Apollo 11 Guidance Computer (~60.000 lines), 1969 + #+latex: \begin{minipage}{.25\linewidth} + #+latex: \begin{flushleft} + #+ATTR_LATEX: :width \extblockscale{.8\linewidth} + file:Margaret_Hamilton.jpg + #+latex: \end{flushleft} + #+latex: \end{minipage} + #+latex: \begin{minipage}{.7\linewidth} + #+latex: \begin{flushright} + #+latex: "When I first got into it, nobody knew what it was that we were doing. It was like the Wild West." + #+latex: \hfill Margaret Hamilton + #+latex: \end{flushright} + #+latex: \end{minipage} + + +*** The World Wide Web, 1989, at CERN on a NeXT machine + #+latex: \begin{minipage}{.65\linewidth} + #+latex: \begin{flushleft} + #+latex:“When somebody has learned how to program a computer … + #+latex: You're joining a group of people who can do incredible things. 
+ #+latex: They can make the computer do anything they can imagine.” + #+latex: \end{flushleft} + #+latex: \end{minipage} + #+latex: \begin{minipage}{.3\linewidth} + #+latex: \begin{flushright} + #+ATTR_LATEX: :width \extblockscale{.95\linewidth} + file:tim_berners_lee.jpg + #+latex: \end{flushright} + #+latex: \end{minipage} + + \hfill From An Insight, An Idea with Tim Berners-Lee (2013) + + +** Software terminology +#+latex: \begin{center} \huge{What is software?} \end{center} +#+BEAMER: \pause + #+ATTR_LATEX: :width \extblockscale{.85\linewidth} + file:software-magic.jpg + +Image taken from *[[https://www.reddit.com/r/ProgrammerHumor/comments/70fump/programming_is_magic/][reddit - ProgrammerHumor]]* +** Software definition +*** Encyclopædia Britannica + “Software, instructions that tell a computer what to do. + Software comprises the entire set of programs, procedures, and routines + associated with the operation of a computer system. + The term was coined to differentiate these instructions + from hardware—i.e., the physical components of a computer system.” + \hfill *[[https://www.britannica.com/technology/software][link]]* +#+BEAMER: \pause +*** Software as a concept :B_block:BMCOL: + :PROPERTIES: + :BEAMER_col: 0.5 + :BEAMER_env: block + :END: + - software project / entity +#+BEAMER: \pause + - the creators and the community around it +#+BEAMER: \pause + - the software idea / algorithms / solutions +#+BEAMER: \pause + +*** Software artifact :B_block:BMCOL: + :PROPERTIES: + :BEAMER_col: 0.45 + :BEAMER_env: block + :END: + - the executable (or binary) of each version for a specific environment +#+BEAMER: \pause + - the *software source code* for each revision + + +** Much more complex than it seems +*** Software is complex + - Structure :: monolithic/composite; self-contained/external dependencies + - Lifetime :: one-shot/long term + - Community :: one person/one team/distributed community + - Authorship :: complex set of roles + - Authority :: 
institutions/organizations/communities/single person +#+BEAMER: \pause +*** Various granularities + - Exact status of the source code :: for reproducibility, e.g. +#+latex: \emph{``you can find at \href{https://archive.softwareheritage.org/swh:1:cnt:cdf19c4487c43c76f3612557d4dc61f9131790a4;lines=146-187/}{swh:1:cnt:cdf19c4487c43c76f3612557d4dc61f9131790a4;lines=146-187} the core algorithm used in this article''} + + - (Major) release :: \emph{``This functionality is available in OCaml version 4''} + + - Project :: \emph{``Inria has created OCaml and Scikit-Learn''}. + +** Is this /software/? +#+latex: \begin{center} +#+ATTR_LATEX: :width .38\linewidth +file:this-is-not-a-pipe.png +#+latex: \end{center} + +\hfill What about **/software source code/** ? + +* The knowledge is in the source code ! + +** The knowledge is in the source code! +#+INCLUDE: "../../common/modules/source-code-different-short.org::#thesourcecode" :only-contents t :minlevel 3 +** Source code is /special/ +*** /Executable/ and /human readable/ knowledge \hfill copyright law + /“Programs must be written for people to read, and only incidentally for machines to execute.”/\\ + \hfill Harold Abelson +#+BEAMER: \pause +*** Software /evolves/ over time + - projects may last decades + - the /development history/ is key to its /understanding/ +#+BEAMER: \pause +*** Complexity :B_picblock: + :PROPERTIES: + :BEAMER_env: picblock + :BEAMER_OPT: pic=python3-matplotlib.pdf, width=.6\linewidth + :END: + - /millions/ of lines of code + - large /web of dependencies/ + + easy to break, difficult to maintain + - sophisticated /developer communities/ + +** Software Source Code human readable and executable knowledge +file:NOLI_SE_TANGERE.png + +** Version Control System timeline +#+INCLUDE: "../../common/modules/vcs-history.org::#timeline" :only-contents t :minlevel 3 + + +* Software Heritage: the universal source code archive +# +# One slide motivation + goals +#+INCLUDE: 
"../../common/modules/swh-goals-oneslide-vertical.org::#goals" :minlevel 2 +# + +# Where we are today: endorsement +# ** Our principles \hfill iPres 2017 - \url{http://bit.ly/swhpaper} +# #+INCLUDE: "../../common/modules/principles-compact.org::#principlesstatus" :only-contents t :minlevel 3 + +** Our principles \hfill iPres 2017 - \url{http://bit.ly/swhpaper} + :PROPERTIES: + :CUSTOM_ID: principlesstatus + :END: +#+latex: \begin{center} +#+ATTR_LATEX: :width .8\linewidth +file:SWH-as-foundation-slim.png +#+latex: \end{center} +#+latex: \footnotesize\vspace{-3mm} + # + # #+BEAMER: \pause + + #+BEAMER: \pause + #+latex: \centering + #+ATTR_LATEX: :width \extblockscale{.8\linewidth} + file:2020-09-08-growth.png + +** Growing Support +#+INCLUDE: "../../common/modules/support-compact.org::#support" :only-contents t :minlevel 3 + +* Data model and SWHID: the source code fingerprint +# under the hood: automation and storage, the archive in pictures +#+INCLUDE: "../../common/modules/under-the-hood-pictures.org::#main" :only-contents t :minlevel 2 + + +** Our challenges in the PID landscape + :PROPERTIES: + :CUSTOM_ID: challenges + :END: +*** Typical properties of systems of identifiers + \hfill uniqueness, non ambiguity, persistence, abstraction (opacity) +#+BEAMER: \pause +*** Key needed properties from our use cases + - gratis :: identifiers are free (billions of objects) + - integrity :: the associated object cannot be changed (sw dev, /reproducibility/) + - no middle man :: no central authority is needed (sw dev, /reproducibility/) +#+BEAMER: \pause +*** + \hfill we could not find systems with both *integrity* and *no middle man* ! 
+*** + \hfill Intrinsic, decentralised, cryptographically strong identifiers = SWHIDs +** The SWH-ID schema + # TODO: drawing with swh:1:cnt:xxxxxxx "exploded" and explained + #+LATEX: \centering\forcebeamerstart + #+LATEX: \only<1>{\includegraphics[width=\linewidth]{SWH-ID-1.png}} + #+LATEX: \only<2>{\includegraphics[width=\linewidth]{SWH-ID-2.png}} + #+LATEX: \only<3>{\includegraphics[width=\linewidth]{SWH-ID-3.png}} + #+LATEX: \forcebeamerend + +** Demo time +*** + Let's look at some famous excerpts of source code +#+BEAMER: \pause +*** Apollo 11 source code ([[https://archive.softwareheritage.org/swh:1:cnt:64582b78792cd6c2d67d35da5a11bb80886a6409;origin=https://github.com/virtualagc/virtualagc;lines=245-261/][excerpt]]) :B_block:BMCOL: + :PROPERTIES: + :BEAMER_col: 0.48 + :BEAMER_env: block + :END: + #+LATEX: \includegraphics[width=\linewidth]{apollo-11-cranksilly.png} + # excerpt of routine that asks astronaut to turn around the LEM +#+BEAMER: \pause +*** Quake III source code ([[https://archive.softwareheritage.org/swh:1:cnt:bb0faf6919fc60636b2696f32ec9b3c2adb247fe;origin=https://github.com/id-Software/Quake-III-Arena;lines=549-572/][excerpt]]) :B_block:BMCOL: + :PROPERTIES: + :BEAMER_col: 0.45 + :BEAMER_env: block + :END: + #+LATEX: \includegraphics[width=\linewidth]{quake-carmack-sqrt-1.png} + # smart efficient implementation of 1/sqrt(x) on a CPU without special support +#+BEAMER: \pause +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +*** It works! 
+ we have /intrinsic/ identifiers for all 20+ billion objects in the archive + +* Recognizing software as a research output + +** Software is a /forgotten/ pillar of Open Science +*** Lack of recognition + :PROPERTIES: + :BEAMER_env: block + :BEAMER_col: 0.48 + :END: + not (yet) a first class output + - in the EOSC plan + - in the scholarly world + + + #+BEGIN_QUOTE + Sometimes, if you don't have the software, you don't have the data + + \mbox{}\hfill Christine Borgman, Paris, 2018 + #+END_QUOTE + + +*** + :PROPERTIES: + :BEAMER_COL: .5 + :END: + #+latex: \begin{center} + #+ATTR_LATEX: :width 0.9\linewidth + file:preservation_triangle_color.png + #+latex: \end{center} +#+BEAMER: \pause +*** Reproducibility is the key :B_picblock: + :PROPERTIES: + :BEAMER_opt: pic=Karl_Popper, leftpic=true, width=.16\linewidth + :BEAMER_env: picblock + :END: +#+latex: \begin{quote} + non-reproducible single occurrences are of no significance to science\\ + \\ + \mbox{} \hfill \scriptsize Karl Popper, \emph{The Logic of Scientific Discovery}, 1934 +#+latex: \end{quote} + +** A plurality of needs +*** Researchers + - **archive and reference** software used and created in articles + - **find** useful software + - get **credit** for developed software + - verify/reproduce/improve **results** +#+Beamer: \pause +*** Laboratories/teams + - **track** software contributions + - **produce** reports + - **maintain** web page +#+Beamer: \pause +*** Research Organization + - know its software assets for **technology transfer**, **impact metrics** and **strategy**. 
+ +** Software in research has different roles +*** Multiple facets, it can be seen as: + - a tool + - a research outcome or result + - the object of research +#+Beamer: \pause + +*** By identifying the software role +\hfill we can decide how to /treat/ it + +** What is at stake \hfill in increasing order of difficulty +\vspace{-7pt} +*** Archival + Research software artifacts must be properly *archived*\\ + \hfill make sure we can /retrieve/ them (/reproducibility/) +#+BEAMER: \pause +*** Identification + Research software artifacts must be properly *referenced*\\ + \hfill make sure we can /identify/ them (/reproducibility/) +#+BEAMER: \pause +*** Metadata + Research software artifacts must be properly *described*\\ + \hfill make it easy to /discover/ them (/visibility/) +#+BEAMER: \pause +*** Citation + Research software artifacts must be properly *cited* /(not the same as referenced!)/\\ + \hfill to give /credit/ to authors (/evaluation/!) + + + +** The research software (deposit) use case + :PROPERTIES: + :CUSTOM_ID: hal + :END: +*** The deposit workflow + :PROPERTIES: + :BEAMER_COL: .5 + :END: + #+latex: \begin{center} + #+ATTR_LATEX: :width \linewidth + file:deposit-communication-with-PID.png + #+latex: \end{center} +#+LATEX: \pause + +*** Deposit software in HAL \hfill [[http://hal.inria.fr/hal-01738741][poster]] :B_picblock: + :PROPERTIES: + :BEAMER_COL: .5 + :BEAMER_env: block + :END: + *\hspace{1em}Generic mechanism:* + - SWORD based + - review process + - versioning + +#+BEAMER: \pause + *\hspace{1em} How to do it:* \hfill ([[http://bit.ly/swhdeposithalen][/guide/]]) + - deposit .zip or .tar.gz file with metadata + +#+BEAMER: \pause + *\hspace{1em} Timeline:* + - /March 2018/: test phase on *HAL-Inria* + - /September 2018/: open to all *HAL* + - /December 2019/: + - 80 complete source code deposits + - 98 software records + +** Submit your source code \hfill ([[http://bit.ly/swhdeposithalen][/guide/]]) +#+latex: \begin{center} +#+ATTR_LATEX: :width 
\linewidth +file:HAL-form-IDCC.png +#+latex: \end{center} + +** The deposit view +#+latex: \begin{center} +#+ATTR_LATEX: :width .8\linewidth +file:CGAL-3D.png +#+latex: \end{center} + +** Reference vs. citation +*** Credit & Attribution + :PROPERTIES: + :BEAMER_col: 0.33 + :BEAMER_env: block + :END: + - a metadata record + - all authors & contributors +#+BEAMER: \pause + +*** Reuse & Reproducibility + :PROPERTIES: + :BEAMER_col: 0.33 + :BEAMER_env: block + :END: + - a specific artifact + - with complementary information (docs) +#+BEAMER: \pause + +*** Archive & Index + :PROPERTIES: + :BEAMER_col: 0.33 + :BEAMER_env: block + :END: + - metadata record (HAL) + - artifact itself (SWH) + \hfill connect the dots... + +#+BEAMER: \pause +*** :B_ignoreheading: + :PROPERTIES: + :BEAMER_env: ignoreheading + :END: +#+latex: \begin{center} +#+ATTR_LATEX: :width 0.7\linewidth +file:citation-format-IDCC.png +#+latex: \end{center} + + + + +#+LATEX: \pause + +# scientific software (save code now) use-case (three slides) +#+INCLUDE: "../../common/modules/swh-scientific-preservation.org::#main" :only-contents t :minlevel 2 + + + + +* The missing piece: the metadata +** The metadata challenge - Software ontologies + :PROPERTIES: + :CUSTOM_ID: thesourcecode + :END: + +#+BEGIN_QUOTE + “Ontologies are agreements, made in a social context, to accomplish some objectives. + It's important to understand those objectives, and be guided by them.”\\ + \hfill T. Gruber, The Pragmatics of Ontology, 2003 +#+END_QUOTE + +#+Beamer: \pause +*** What do we want to describe? + :PROPERTIES: + :BEAMER_env: block + :BEAMER_act: +- + :END: + + a software project? + + a software artifact? a collection of artifacts? + + With what terms or vocabulary? + +** The Software Ontology /Touchstone/ +*** Software Citation Principles \tiny (Smith et al. 
2016) :B_block: + :PROPERTIES: + :BEAMER_env: block + :BEAMER_opt: + :END: + - *Importance* : first class citizen in the scholarly ecosystem + - *Credit and attribution* : authors, maintainer + - *Unique identification*: points to + a unique, specific software version (DOI, Git SHA1 hash, etc.) + - *Persistence* : identification beyond the lifespan of the software (swh-id) + - *Accessibility*: url, publisher + - *Specificity* : version, environment + +# metadata landscape (one decomposed slide) +#+INCLUDE: "../../common/modules/metadata-landscape.org::#main" :only-contents t :minlevel 2 + + +** Software Metadata Terms \hfill [[https://codemeta.github.io/terms/][CodeMeta]] +*** identify :B_block:BMCOL: + :PROPERTIES: + :BEAMER_col: 0.4 + :BEAMER_env: block + :END: + - identifier + - name + - author(s) + - contributor(s) + - version + - applicationCategory + - codeRepository +#+BEAMER: \pause + +*** administrate :B_block:BMCOL: + :PROPERTIES: + :BEAMER_col: 0.5 + :BEAMER_env: block + :END: + - maintainer (contact*) + - citation + - funder(s) + - license + - editor / publisher + - dates (created, modified, published) + - developmentStatus + +** Software Metadata Terms \hfill [[https://codemeta.github.io/terms/][CodeMeta]] +*** execute :B_block:BMCOL: + :PROPERTIES: + :BEAMER_opt: + :BEAMER_env: block + :BEAMER_col: 0.45 + :END: + - buildInstructions + - issueTracker + - operatingSystem + - softwareRequirements + - runtimePlatform + - downloadUrl + - (memory, processor, storage) +#+BEAMER: \pause +*** classify :B_block:BMCOL: + :PROPERTIES: + :BEAMER_col: 0.45 + :BEAMER_env: block + :END: + - description + - keywords + - supportingData + - referencePublication + - algorithms* + - readme (docs*) + + +** The CodeMeta Generator \hfill [[https://codemeta.github.io/codemeta-generator/][link to online tool]] +#+latex: \begin{center} +#+ATTR_LATEX: :width .7\linewidth +file:CodeMetaGenerator.png +#+latex: \end{center} + + +* Conclusion + +** Come in, we're open! 
+*** Software Heritage :B_block:BMCOL: + :PROPERTIES: + :BEAMER_opt: + :BEAMER_env: block + :BEAMER_col: 0.4 + :END: + - universal source code archive + - intrinsic identifiers (SWHIDs) + - open, non profit, long term + - infrastructure for Open Science +#+BEAMER: \pause +*** You can help improve science! :B_block:BMCOL: + :PROPERTIES: + :BEAMER_col: 0.5 + :BEAMER_env: block + :END: + - use SWH and save /relevant/ source code + - build on SWH (see swmath.org and ipol.im) + - contribute to SWH: it is /open source/ + - spread the word + +** +#+BEGIN_EXPORT latex + \begin{center} + \includegraphics[width=.5\linewidth]{SWH-logo.pdf} + \end{center} + \begin{center} + \vfill + {\Large Thank you! Any questions?} + \end{center} +#+END_EXPORT +*** contact: morane@softwareheritage.org :B_block: + :PROPERTIES: + :BEAMER_env: block + :END: + #+BEGIN_EXPORT latex + \begin{thebibliography}{Foo Bar, 1969} + \footnotesize + \bibitem{DiCosmo2020b} P. Alliez, R. Di Cosmo, B. Guedj, A. Girault, M. Hacid, A. Legrand, N. Rougier + \newblock Attributing and Referencing (Research) Software: Best Practices and Outlook From Inria + \newblock Computing in Science \& Engineering, 22 (1), pp. 
39-52, 2020, ISSN: 1558-366X + \bibitem{DiCosmo2020a} Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli + \newblock Referencing Source Code Artifacts: a Separate Concern in Software Citation + \newblock Computing in Science \& Engineering, 2020, ISSN: 1521-9615 + \end{thebibliography} + #+END_EXPORT + diff --git a/talks-public/2020-12-10-zbmed/METADATA b/talks-public/2020-12-10-zbmed/METADATA new file mode 100644 index 0000000..f2546c4 --- /dev/null +++ b/talks-public/2020-12-10-zbmed/METADATA @@ -0,0 +1,22 @@ +Title: + Archiving and Referencing all the source code towards recognizing software in academia + +Abstract: + +Software source code is everywhere, with tens of millions of developers +worldwide, and it is of paramount importance to provide a universal +archive and reference system for all its applications. Software Heritage +has taken on this task. + +Software is, together with articles and data, a key ingredient of academic +research, in all fields, hence preservation of, and access to, the source code of +computer programs is a pillar of Open Science, and a necessary step to +understand, verify, adapt and reuse the tools that we produce and use in the +scholarly world. + +In this talk, we will present Software Heritage, a universal long term archive +specifically designed for software source code that provides intrinsic +identifiers for all the artifacts it stores, at all granularities, down to the +level of the line of code. Software Heritage is now providing the infrastructure +for depositing and referencing software source code, in collaboration with +national and international open access portals. diff --git a/talks-public/2020-12-10-zbmed/Makefile b/talks-public/2020-12-10-zbmed/Makefile new file mode 100644 index 0000000..68fbee7 --- /dev/null +++ b/talks-public/2020-12-10-zbmed/Makefile @@ -0,0 +1 @@ +include ../Makefile.slides