diff --git a/common/logos/SWH-logo+EOSC-Pillar.pdf b/common/logos/SWH-logo+EOSC-Pillar.pdf new file mode 100644 index 0000000..7440615 Binary files /dev/null and b/common/logos/SWH-logo+EOSC-Pillar.pdf differ diff --git a/common/logos/SWH-logo+EOSC-Pillar.png b/common/logos/SWH-logo+EOSC-Pillar.png new file mode 100644 index 0000000..c7729e8 Binary files /dev/null and b/common/logos/SWH-logo+EOSC-Pillar.png differ diff --git a/common/logos/SWH-logo+EOSC-Pillar.svg b/common/logos/SWH-logo+EOSC-Pillar.svg new file mode 100644 index 0000000..a63b87c --- /dev/null +++ b/common/logos/SWH-logo+EOSC-Pillar.svg @@ -0,0 +1,1441 @@ + [SVG markup not shown: 1441 added lines, image/svg+xml] diff --git a/talks-public/2020-12-17-eosc-pillar/2020-12-17-EOSC-Pillar.org b/talks-public/2020-12-17-eosc-pillar/2020-12-17-EOSC-Pillar.org index 5a8552f..77904e7 100644 --- a/talks-public/2020-12-17-eosc-pillar/2020-12-17-EOSC-Pillar.org +++ b/talks-public/2020-12-17-eosc-pillar/2020-12-17-EOSC-Pillar.org @@ -1,567 +1,587 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) #+TITLE: Software Heritage #+SUBTITLE: Archiving and Referencing all the source code #+BEAMER_HEADER: \title{Software Heritage at EOSC-Pillar F2F meeting} #+AUTHOR: David Douard #+EMAIL: david.douard@softwareheritage.org #+BEAMER_HEADER: \date[December 17th, 2020]{December 17th, 2020\\[-1em]} #+BEAMER_HEADER: \title[www.softwareheritage.org]{Software Heritage at EOSC-Pillar F2F meeting} #+BEAMER_HEADER: \institute[]{\\\href{mailto:david.douard@softwareheritage.org}{\tt david.douard@softwareheritage.org}} #+BEAMER_HEADER: \author[David Douard]{ David Douard\\[1em]% #+BEAMER_HEADER: Software engineer\\Inria, Software Heritage\\[-1em]} # #+BEAMER_HEADER: \setbeameroption{show notes on second 
screen} #+BEAMER_HEADER: \setbeameroption{hide notes} #+KEYWORDS: software heritage legacy preservation knowledge mankind technology deposit # # prelude.org contains all the information needed to export the main beamer latex source # use prelude-toc.org to get the table of contents # #+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 #+INCLUDE: "../../common/modules/169.org" # +LaTeX_CLASS_OPTIONS: [aspectratio=169,handout,xcolor=table] #+LATEX_HEADER: \usepackage{bbding} #+LATEX_HEADER: \usepackage{tcolorbox} #+LATEX_HEADER: \DeclareUnicodeCharacter{66D}{\FiveStar} # # If you want to change the title logo it's here # -# +BEAMER_HEADER: \titlegraphic{\includegraphics[width=0.7\textwidth]{SWH-logo}} +#+BEAMER_HEADER: \titlegraphic{\includegraphics[width=0.7\textwidth]{SWH-logo+EOSC-Pillar}} # aspect ratio can be changed, but the slides need to be adapted # - compute a "resizing factor" for the images (macro for picblocks?) # # set the background image # # https://pacoup.com/2011/06/12/list-of-true-169-resolutions/ # #+BEAMER_HEADER: \pgfdeclareimage[height=90mm,width=160mm]{bgd}{swh-world-169.png} #+BEAMER_HEADER: \setbeamertemplate{background}{\pgfuseimage{bgd}} #+LATEX_HEADER: \usepackage{supertabular} #+LATEX_HEADER: \newcommand{\sponsor}[2]{{\bf #1}, #2} #+LATEX_HEADER: \newcommand{\teamster}[2]{{\textcolor{red}{#1}}, #2} * Introduction -# BIO ** Software is all around us *** Apollo 11 Guidance Computer (~60.000 lines), 1969 #+latex: \begin{minipage}{.25\linewidth} #+latex: \begin{flushleft} #+ATTR_LATEX: :width \extblockscale{.8\linewidth} file:Margaret_Hamilton.jpg #+latex: \end{flushleft} #+latex: \end{minipage} #+latex: \begin{minipage}{.7\linewidth} #+latex: \begin{flushright} #+latex: "When I first got into it, nobody knew what it was that we were doing. It was like the Wild West." 
#+latex: \hfill Margaret Hamilton #+latex: \end{flushright} #+latex: \end{minipage} #+BEAMER: \pause *** The World Wide Web, 1989, at CERN on a NeXT machine #+latex: \begin{minipage}{.65\linewidth} #+latex: \begin{flushleft} #+latex:“When somebody has learned how to program a computer … #+latex: You're joining a group of people who can do incredible things. #+latex: They can make the computer do anything they can imagine.” #+latex: \end{flushleft} #+latex: \end{minipage} #+latex: \begin{minipage}{.3\linewidth} #+latex: \begin{flushright} #+ATTR_LATEX: :width \extblockscale{.95\linewidth} file:tim_berners_lee.jpg #+latex: \end{flushright} #+latex: \end{minipage} \hfill From An Insight, An Idea with Tim Berners-Lee (2013) ** Software is all around EOSC-Pillar *** Open Science cannot exists without :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :END: - open source software - accessible source code - identifiable source code - FAIR source code *** Source is everywhere in EOSC :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.45 :BEAMER_env: block :END: - data production - data processing - HPC / compute as a service #+BEAMER: \pause *** Is every piece of this software - clearly identified? #+BEAMER: \pause - accessible? #+BEAMER: \pause -- today and in the future? +- today? +#+BEAMER: \pause +- in the future? ** Software definition *** Encyclopædia Britannica “Software, instructions that tell a computer what to do. Software comprises the entire set of programs, procedures, and routines associated with the operation of a computer system. 
The term was coined to differentiate these instructions from hardware—i.e., the physical components of a computer system.” \hfill *[[https://www.britannica.com/technology/software][link]]* #+BEAMER: \pause *** Software as a concept :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :END: - software project / entity #+BEAMER: \pause - the creators and the community around it #+BEAMER: \pause - the software idea / algorithms / solutions #+BEAMER: \pause *** Software artifact :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.45 :BEAMER_env: block :END: - the executable (or binary) of each version for a specific environment #+BEAMER: \pause - the *software source code* for each revision ** Much more complex than it seems *** Software is complex - Structure :: monolithic/composite; self-contained/external dependencies - Lifetime :: one-shot/long term - Community :: one man/one team/distributed community - Authorship :: complex set of roles - Authority :: institutions/organizations/communities/single person #+BEAMER: \pause *** Various granularities - Exact status of the source code :: for reproducibility, e.g. #+latex: \emph{``you can find at \href{https://archive.softwareheritage.org/swh:1:cnt:cdf19c4487c43c76f3612557d4dc61f9131790a4;lines=146-187/}{swh:1:cnt:cdf19c4487c43c76f3612557d4dc61f9131790a4;lines=146-187} the core algorithm used in this article''} - (Major) release :: \emph{``This functionality is available in OCaml version 4''} - Project :: \emph{``Inria has created OCaml and Scikit-Learn''}. -** Is this /software/? -#+latex: \begin{center} -#+ATTR_LATEX: :width .38\linewidth -file:this-is-not-a-pipe.png -#+latex: \end{center} - -\hfill What about **/software source code/** ? * The knowledge is in the source code ! ** The knowledge is in the source code! 
#+INCLUDE: "../../common/modules/source-code-different-short.org::#thesourcecode" :only-contents t :minlevel 3 ** Source code is /special/ *** /Executable/ and /human readable/ knowledge \hfill copyright law /“Programs must be written for people to read, and only incidentally for machines to execute.”/\\ \hfill Harold Abelson #+BEAMER: \pause *** Software /evolves/ over time - projects may last decades - the /development history/ is key to its /understanding/ #+BEAMER: \pause *** Complexity :B_picblock: :PROPERTIES: :BEAMER_env: picblock :BEAMER_OPT: pic=python3-matplotlib.pdf, width=.6\linewidth :END: - /millions/ of lines of code - large /web of dependencies/ + easy to break, difficult to maintain - sophisticated /developer communities/ ** Software Source Code human readable and executable knowledge file:NOLI_SE_TANGERE.png -** Version Control System timeline -#+INCLUDE: "../../common/modules/vcs-history.org::#timeline" :only-contents t :minlevel 3 - - * Software Heritage: the universal source code archive # # One slide motivation + goals #+INCLUDE: "../../common/modules/swh-goals-oneslide-vertical.org::#goals" :minlevel 2 # # Where we are today: endorsement # ** Our principles \hfill iPres 2017 - \url{http://bit.ly/swhpaper} # #+INCLUDE: "../../common/modules/principles-compact.org::#principlesstatus" :only-contents t :minlevel 3 ** Our principles \hfill iPres 2017 - \url{http://bit.ly/swhpaper} :PROPERTIES: :CUSTOM_ID: principlesstatus :END: #+latex: \begin{center} #+ATTR_LATEX: :width .8\linewidth file:SWH-as-foundation-slim.png #+latex: \end{center} #+latex: \footnotesize\vspace{-3mm} # # #+BEAMER: \pause #+BEAMER: \pause #+latex: \centering #+ATTR_LATEX: :width \extblockscale{.8\linewidth} file:2020-09-08-growth.png ** Growing Support #+INCLUDE: "../../common/modules/support-compact.org::#support" :only-contents t :minlevel 3 * Data model and SWHID: the source code fingerprint # under the hood: automation and storage, the archive in pictures #+INCLUDE: 
"../../common/modules/under-the-hood-pictures.org::#main" :only-contents t :minlevel 2 ** Our challenges in the PID landscape :PROPERTIES: :CUSTOM_ID: challenges :END: *** Typical properties of systems of identifiers \hfill uniqueness, non ambiguity, persistence, abstraction (opacity) #+BEAMER: \pause *** Key needed properties from our use cases - gratis :: identifiers are free (billions of objects) - integrity :: the associated object cannot be changed (sw dev, /reproducibility/) - no middle man :: no central authority is needed (sw dev, /reproducibility/) #+BEAMER: \pause *** \hfill we could not find systems with both *integrity* and *no middle man*! *** \hfill Intrinsic, decentralised, cryptographically strong identifiers = SWHIDs ** The SWHID schema # TODO: drawing with swh:1:cnt:xxxxxxx "exploded" and explained #+LATEX: \centering\forcebeamerstart #+LATEX: \only<1>{\includegraphics[width=\linewidth]{SWH-ID-1.png}} #+LATEX: \only<2>{\includegraphics[width=\linewidth]{SWH-ID-2.png}} #+LATEX: \only<3>{\includegraphics[width=\linewidth]{SWH-ID-3.png}} #+LATEX: \forcebeamerend -** Demo time -*** - Let's look at some famous exceprts of source code -#+BEAMER: \pause -*** Apollo 11 source code ([[https://archive.softwareheritage.org/swh:1:cnt:64582b78792cd6c2d67d35da5a11bb80886a6409;origin=https://github.com/virtualagc/virtualagc;lines=245-261/][excerpt]]) :B_block:BMCOL: - :PROPERTIES: - :BEAMER_col: 0.48 - :BEAMER_env: block - :END: - #+LATEX: \includegraphics[width=\linewidth]{apollo-11-cranksilly.png} +** (Quick) Demo time +*** Apollo 11 source code + #+LATEX: \begin{center} + #+LATEX: \includegraphics[width=0.5\linewidth]{apollo-11-cranksilly.png} + #+LATEX: \end{center} # excerpt of routine that asks astronaut to turn around the LEM #+BEAMER: \pause -*** Quake III source code ([[https://archive.softwareheritage.org/swh:1:cnt:bb0faf6919fc60636b2696f32ec9b3c2adb247fe;origin=https://github.com/id-Software/Quake-III-Arena;lines=549-572/][excerpt]]) :B_block:BMCOL: - 
:PROPERTIES: - :BEAMER_col: 0.45 - :BEAMER_env: block - :END: - #+LATEX: \includegraphics[width=\linewidth]{quake-carmack-sqrt-1.png} - # smart efficient implementation of 1/sqrt(x) on a CPU without special support +*** + [[https://archive.softwareheritage.org/swh:1:cnt:64582b78792cd6c2d67d35da5a11bb80886a6409][swh:1:cnt:64582b78792cd6c2d67d35da5a11bb80886a6409]] #+BEAMER: \pause -*** :B_ignoreheading: - :PROPERTIES: - :BEAMER_env: ignoreheading - :END: -*** It works! - we have /intrinsic/ identifiers for all 20+ billion objects in the archive + [[https://archive.softwareheritage.org/swh:1:cnt:64582b78792cd6c2d67d35da5a11bb80886a6409;origin=https://github.com/virtualagc/virtualagc;lines=245-261/][;origin=https://github.com/virtualagc/virtualagc;lines=245-261]] + * Recognizing software as a research output ** Software is a /forgotten/ pillar of Open Science *** Lack of recognition :PROPERTIES: :BEAMER_env: block :BEAMER_col: 0.48 :END: not (yet) a first class output - in the EOSC plan - in the scholarly world #+BEGIN_QUOTE Sometimes, if you don't have the software, you don't have the data \mbox{}\hfill Christine Borgman, Paris, 2018 #+END_QUOTE *** :PROPERTIES: :BEAMER_COL: .5 :END: #+latex: \begin{center} #+ATTR_LATEX: :width 0.8\linewidth file:preservation_triangle_color.png #+latex: \end{center} #+BEAMER: \pause *** Reproducibility is the key :B_picblock: :PROPERTIES: :BEAMER_opt: pic=Karl_Popper, leftpic=true, width=.16\linewidth :BEAMER_env: picblock :END: #+latex: \begin{quote} non-reproducible single occurrences are of no significance to science\\ \\ \mbox{} \hfill \scriptsize Karl Popper, \emph{The Logic of Scientific Discovery}, 1934 #+latex: \end{quote} ** A plurality of needs *** Researchers - **archive and reference** software used and created in articles - **find** useful software - get **credit** for developed software - verify/reproduce/improve **results** #+Beamer: \pause *** Laboratories/teams - **track** software contributions - **produce** reports 
- **maintain** web page #+Beamer: \pause *** Research Organization - know its software assets for **technology transfer**, **impact metrics** and **strategy**. ** Software in research has different roles *** Multiple facets, it can be seen as: - a tool - a research outcome or result - the object of research #+Beamer: \pause *** By identifying the software role \hfill we can decide how to /treat/ it ** What is at stake \hfill in increasing order of difficulty \vspace{-7pt} *** Archival Research software artifacts must be properly *archived*\\ \hfill make it sure we can /retrieve/ them (/reproducibility/) #+BEAMER: \pause *** Identification Research software artifacts must be properly *referenced*\\ \hfill make it sure we can /identify/ them (/reproducibility/) #+BEAMER: \pause *** Metadata Research software artifacts must be properly *described*\\ \hfill make it easy to /discover/ them (/visibility/) #+BEAMER: \pause *** Citation Research software artifacts must be properly *cited* /(not the same as referenced!)/\\ \hfill to give /credit/ to authors (/evaluation/!) 
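Note on the "Identification" point above: the talk describes SWHIDs as intrinsic identifiers, meaning they are derived from the object itself rather than assigned by a registry. A minimal sketch for the content (`cnt`) case follows, assuming the identifier is the Git-compatible SHA1 of the file bytes (a `blob <length>\0` header followed by the content). The function names `swhid_for_content` and `with_lines` are illustrative and are not part of any Software Heritage tooling.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Compute a content SWHID (swh:1:cnt:...) from raw file bytes.

    Assumption: the hash is the Git blob hash, i.e. SHA1 over the
    header b"blob <length>\\0" followed by the bytes themselves.
    """
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


def with_lines(swhid: str, start: int, end: int) -> str:
    """Append a ;lines=START-END qualifier, as in the excerpt links of the talk."""
    return f"{swhid};lines={start}-{end}"


if __name__ == "__main__":
    core = swhid_for_content(b"hello world\n")
    print(core)
    print(with_lines(core, 1, 1))
```

Because the identifier depends only on the bytes, anyone holding a copy of the file can recompute it and check it without a central authority, which is the "integrity" and "no middle man" pairing the slides ask for.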
+* Archiving source code in the Software Heritage Archive + +** Ingestion in SWH + +*** listers and loaders + :PROPERTIES: + :BEAMER_COL: .5 + :BEAMER_env: block + :END: + - pull based + - best effort + - responsibility of SWH +#+LATEX: \pause + +*** save code now + :PROPERTIES: + :BEAMER_COL: .5 + :BEAMER_env: block + :END: + - push based + - accessible to everyone + - (light) curation from SWH +#+LATEX: \pause + +*** deposit +- push based +- authenticated +- responsibility of the user/partner +- comes with metadata -** The research software (deposit) use case +** The deposit + +*** +- a deposited artifact is provided by one of the SWH partners, which is + regarded as a *trusted authority*, +- a deposited artifact *requires metadata* properties describing the source code + artifact, +- a deposited artifact has a *codemeta* metadata entry attached to it, +- a deposited artifact has the *same visibility* on the SWH Archive as a + collected repository, +- a deposited artifact *can be searched* by its provided url property on the + SWH Archive, +- the deposit API is based on the *SWORD v2* protocol, and thus requires some tooling to send + deposits to SWH. These tools are provided with this repository. 
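Note on the "requires metadata" and "codemeta" items above: the sketch below shows what a small CodeMeta description could look like when built programmatically. It produces a plain `codemeta.json`-style document and is not the deposit wire format; the helper name `build_codemeta` and all example values (project name, URL, author) are hypothetical. The `@context` URL and the term names are taken from the CodeMeta 2.0 vocabulary.

```python
import json


def build_codemeta(name, version, authors, code_repository,
                   license_url, description, keywords):
    """Assemble a minimal CodeMeta 2.0 document as a Python dict.

    Only a handful of terms are covered; a real description would
    usually carry more (funder, dates, referencePublication, ...).
    """
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        "author": [{"@type": "Person", "name": a} for a in authors],
        "codeRepository": code_repository,
        "license": license_url,
        "description": description,
        "keywords": list(keywords),
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    doc = build_codemeta(
        name="example-solver",
        version="1.0.0",
        authors=["Jane Doe"],
        code_repository="https://example.org/jane/example-solver",
        license_url="https://spdx.org/licenses/MIT",
        description="Hypothetical research code used in an article.",
        keywords=["simulation", "reproducibility"],
    )
    print(json.dumps(doc, indent=2))
```

Keeping the description as structured data makes it straightforward to validate required fields before a deposit is sent, whatever client tooling performs the actual transfer.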
+ + +** The deposit :PROPERTIES: :CUSTOM_ID: hal :END: *** the deposit workflow :PROPERTIES: :BEAMER_COL: .5 :END: #+latex: \begin{center} #+ATTR_LATEX: :width \linewidth file:deposit-communication-with-PID.png #+latex: \end{center} #+LATEX: \pause -*** Deposit software in HAL \hfill [[http://hal.inria.fr/hal-01738741][poster]] :B_picblock: +*** Deposit software (in HAL) \hfill [[http://hal.inria.fr/hal-01738741][poster]] :B_picblock: :PROPERTIES: :BEAMER_COL: .5 :BEAMER_env: block :END: *\hspace{1em}Generic mechanism:* - SWORD based - review process - versioning #+BEAMER: \pause *\hspace{1em} How to do it:* \hfill ([[http://bit.ly/swhdeposithalen][/guide/]]) - deposit .zip or .tar.gz file with metadata ** Example: The deposit view in HAL #+latex: \begin{center} #+ATTR_LATEX: :width .8\linewidth file:CGAL-3D.png #+latex: \end{center} -#+LATEX: \pause # scientific software (save code now) use-case (three slides) #+INCLUDE: "../../common/modules/swh-scientific-preservation.org::#main" :only-contents t :minlevel 2 * The missing piece- the Metadata ** The metadata challenge - Software ontologies :PROPERTIES: :CUSTOM_ID: thesourcecode :END: #+BEGIN_QUOTE “Ontologies are agreements, made in a social context, to accomplish some objectives. It's important to understand those objectives, and be guided by them.\"\\ \hfill T. Gruber, The Pragmatics of Ontology, 2003 #+END_QUOTE #+Beamer: \pause *** What do we want to describe? :PROPERTIES: :BEAMER_env: block :BEAMER_act: +- :END: + a software project? + a software artifact? a collection of artifacts? + With what terms or vocabulary? ** The Software Ontology /Touchstone/ *** Software Citation Principles \tiny (Smith et al. 2016) :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_opt: :END: - *Importance* : first class citizen in the scholarly ecosystem - *Credit and attribution* : authors, maintainer - *Unique identification*: points to a unique, specific software version (DOI, Git SHA1 hash, etc..) 
- *Persistence* : identification beyond the lifespan of the software (swh-id) - *Accessibility*: url, publisher - *Specificity* : version, environment # metadata landscape (one decomposed slide) #+INCLUDE: "../../common/modules/metadata-landscape.org::#main" :only-contents t :minlevel 2 ** Software Metadata Terms \hfill [[https://codemeta.github.io/terms/][CodeMeta]] *** identify :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.4 :BEAMER_env: block :END: - identifier - name - author(s) - contributor(s) - version - applicationCategory - codeRepository #+BEAMER: \pause *** administrate :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :END: - maintainer (contact*) - citation - funder(s) - license - editor / publisher - dates (created, modified, published) - developmentStatus ** Software Metadata Terms \hfill [[https://codemeta.github.io/terms/][CodeMeta]] *** execute :B_block:BMCOL: :PROPERTIES: :BEAMER_opt: :BEAMER_env: block :BEAMER_col: 0.45 :END: - buildInstructions - issueTracker - operatingSystem - softwareRequirements - runtimePlatform - downloadUrl - (memory, procesor, storage) #+BEAMER: \pause *** classify :B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.45 :BEAMER_env: block :END: - description - keywords - supportingData - referencePublication - algorithms* - readme (docs*) ** The CodeMeta Generator \hfill [[https://codemeta.github.io/codemeta-generator/][link to online tool]] #+latex: \begin{center} #+ATTR_LATEX: :width .7\linewidth file:CodeMetaGenerator.png #+latex: \end{center} * Conclusion ** Come in, we're open! *** Software Heritage :B_block:BMCOL: :PROPERTIES: :BEAMER_opt: :BEAMER_env: block :BEAMER_col: 0.4 :END: - universal source code archive - intrinsic identifiers (SWHIDs) - open, non profit, long term - infrastructure for Open Science #+BEAMER: \pause *** You can help improve science! 
:B_block:BMCOL: :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :END: - use SWH and save /relevant/ source code - build on SWH (see swmath.org and ipol.im) - contribute to SWH- it is /open source/ - spread the word ** #+BEGIN_EXPORT latex \begin{center} \includegraphics[width=.5\linewidth]{SWH-logo.pdf} \end{center} \begin{center} \vfill {\Large Thank you! Any questions?} \end{center} #+END_EXPORT *** contact: david.douard@softwareheritage.org :B_block: :PROPERTIES: :BEAMER_env: block :END: #+BEGIN_EXPORT latex \begin{thebibliography}{Foo Bar, 1969} \footnotesize \bibitem{DiCosmo2020b} P. Alliez, R. Di Cosmo, B. Guedj, A. Girault, M. Hacid, A. Legrand, N. Rougier \newblock Attributing and Referencing (Research) Software: Best Practices and Outlook From Inria \newblock Computing in Science \& Engineering, 22 (1), pp. 39-52, 2020, ISSN: 1558-366X \bibitem{DiCosmo2020a} Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli \newblock Referencing Source Code Artifacts: a Separate Concern in Software Citation \newblock Computing in Science \& Engineering, 2020, ISSN: 1521-9615 \end{thebibliography} #+END_EXPORT