diff --git a/talks-public/2021-10-19-telecom-idia/2021-10-19-telecom-idia.org b/talks-public/2021-10-19-telecom-idia/2021-10-19-telecom-idia.org index 37fca92..98d5d7b 100644 --- a/talks-public/2021-10-19-telecom-idia/2021-10-19-telecom-idia.org +++ b/talks-public/2021-10-19-telecom-idia/2021-10-19-telecom-idia.org @@ -1,331 +1,331 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) #+TITLE: Software Heritage #+SUBTITLE: A research platform for large-scale source code archival -#+BEAMER_HEADER: \date[2021-10-19, IDIA]{19 Oct 2021\\IDIA Prototype/Software/Platform Day\\Télécom Paris} +#+BEAMER_HEADER: \date[2021-10-19, IDIA]{19 Oct 2021\\IDIA Prototype/Software/Platform Day\\Télécom Paris\\[-2ex]} #+AUTHOR: Stefano Zacchiroli #+DATE: 19 Oct 2021 #+EMAIL: zack@upsilon.cc #+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 #+INCLUDE: "../../common/modules/169.org" #+BEAMER_HEADER: \institute[Télécom Paris]{Télécom Paris --- {\tt zack@upsilon.cc, @zacchiro}} #+BEAMER_HEADER: \author{Stefano Zacchiroli} * About me :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: #+INCLUDE: "this/zack.org" :minlevel 2 * Why we must preserve the history of software source code ** Software /source code/ is precious knowledge #+INCLUDE: "../../common/modules/source-code-different-short.org::#softwareisdifferent" :only-contents t :minlevel 3 ** Calling for source code preservation: UNESCO *** :B_column:BMCOL: :PROPERTIES: :BEAMER_col: .53 :BEAMER_env: column :END: #+ATTR_LATEX: :width .7\linewidth file:UNESCOParisCallMeeting.png UNESCO, Inria, Software Heritage invite\\ [[https://en.unesco.org/news/experts-call-greater-recognition-software-source-code-heritage-sustainable-development][40 international experts meet in Paris]] ... #+BEAMER: \pause *** :B_column:BMCOL: :PROPERTIES: :BEAMER_col: .5 :BEAMER_env: column :END: #+ATTR_LATEX: :width .65\linewidth file:paris_call_ssc_cover.jpg [[https://en.unesco.org/foss/paris-call-software-source-code][The call is published on Feb 2019]]\pause *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: *** :PROPERTIES: :BEAMER_COL: 1.06 :BEAMER_env: block :END: “[We call to] support efforts to gather and preserve the artifacts and narratives of the history of computing, while the earlier creators are still alive” https://en.unesco.org/foss/paris-call-software-source-code ** Source code history --- for open science #+INCLUDE: "../../common/modules/swh-ardc.org::#pillaropenscience" :only-contents t :minlevel 3 *** \hfill Preserving the history of source code is important for /reproducibility/ ** Source code history --- for security and transparency #+LATEX: \vspace{-.5em} *** Where does reused software come from? :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .5 :END: #+BEGIN_EXPORT latex \begin{center} \includegraphics[width=.7\linewidth]{myriadsources} \end{center} #+END_EXPORT #+BEAMER: \pause *** Do /you/ know where it comes from? :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .4 :END: - the software you ship - the software you use - the software you acquire - the software that + has that bug + has that vulnerability *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: #+BEAMER: \pause *** KYSW: Know Your SoftWare :B_picblock: :PROPERTIES: :BEAMER_env: picblock :BEAMER_OPT: pic=executiveorder.jpg,width=.4\linewidth,leftpic=true :END: Like KYC in banking, KYSW is now essential all over IT...\\ \mbox{}\\ *Sec. 4. Enhancing Software Supply Chain Security* \\ \hfill /ensuring and attesting, to the extent practicable, to the integrity and provenance of open source software/\\ \mbox{}\hfill [[https://www.whitehouse.gov/briefing-room/presidential-actions/2021/05/12/executive-order-on-improving-the-nations-cybersecurity/][May 2021 POTUS Executive Order]] ** Fragile #+INCLUDE: "../../common/modules/swh-motivations.org::#fragile" :only-contents t :minlevel 3 * How we can preserve our software heritage ** Software Heritage in a nutshell \hfill www.softwareheritage.org #+BEAMER: \transdissolve #+INCLUDE: "../../common/modules/swh-goals-oneslide-vertical.org::#goals" :only-contents t :minlevel 3 ** The largest public source code archive, principled \hfill \small \url{bit.ly/swhpaper} *** :PROPERTIES: :BEAMER_env: block :BEAMER_col: 0.5 :END: #+latex: \centering #+ATTR_LATEX: :width \linewidth file:SWH-as-foundation-slim.png *** :PROPERTIES: :BEAMER_env: block :BEAMER_col: 0.5 :END: #+latex: \centering #+ATTR_LATEX: :width \linewidth file:2021-09-archive-growth.png\\ [[https://archive.softwareheritage.org][archive.softwareheritage.org]] *** linebreak :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: #+BEAMER: \pause *** Technology :PROPERTIES: :BEAMER_env: block :BEAMER_col: 0.34 :END: - transparency and FOSS - replicas all the way down *** Content (billions!) :PROPERTIES: :BEAMER_env: block :BEAMER_col: 0.32 :END: - intrinsic identifiers - facts and provenance *** Organization :PROPERTIES: :BEAMER_env: block :BEAMER_col: 0.33 :END: - non-profit - multi-stakeholder ** A peek under the hood: a global view on the software commons #+BEAMER: \begin{center} - #+BEAMER: \mode{\only<1>{\includegraphics[width=\extblockscale{.9\textwidth}]{swh-dataflow-merkle-listers.pdf}}} - #+BEAMER: \only<2>{\includegraphics[width=\extblockscale{.9\textwidth}]{swh-dataflow-merkle.pdf}} + #+BEAMER: \mode{\only<1>{\includegraphics[width=\extblockscale{1.3\textwidth}]{swh-dataflow-merkle-listers.pdf}}} + #+BEAMER: \only<2->{\includegraphics[width=\extblockscale{.9\textwidth}]{swh-dataflow-merkle.pdf}} #+BEAMER: \end{center} #+BEAMER: \pause *** A *global graph* linking together fully *deduplicated* source code artifact - (files, commits, directories, releases, etc.) to the places3that distribute + (files, commits, directories, releases, etc.) to the places that distribute them (e.g., Git repositories), providing a *unified view* on the entire */Software Commons/*. Size: *~20 B* nodes, *~200 B* edges, *~600 TB* (uncompressed) blobs # - *GitHub*, Gitlab.com, Bitbucket, /Gitorious/, /GoogleCode/, GNU, PyPi, Debian, NPM... ** Software Heritage /intrinsic/ Identifiers (SWHID) \hfill [[https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html][(full spec)]] #+LATEX: \centering%\forcebeamerstart #+LATEX: \mode{\only<1>{\includegraphics[width=\linewidth]{SWHID-v1.4_1.png}}} #+LATEX: \mode{\only<2>{\includegraphics[width=\linewidth]{SWHID-v1.4_2.png}}} #+LATEX: \only<3->{\includegraphics[width=\linewidth]{SWHID-v1.4_3.png}} #+LATEX: %\forcebeamerend *** An emerging standard :B_block: :PROPERTIES: :BEAMER_act: <4-> :BEAMER_COL: .6 :BEAMER_env: block :END: - in Linux Foundation's [[https://spdx.github.io/spdx-spec/appendix-VI-external-repository-identifiers/#persistent-id][SPDX 2.2]] - IANA registered, WikiData property [[https://www.wikidata.org/wiki/Property:P6138][P6138]] *** Examples: :B_block: :PROPERTIES: :BEAMER_act: <5-> :BEAMER_COL: .4 :BEAMER_env: block :END: - [[https://archive.softwareheritage.org/swh:1:cnt:64582b78792cd6c2d67d35da5a11bb80886a6409;origin=https://github.com/virtualagc/virtualagc;lines=245-261/][Apollo 11 AGC excerpt]] - [[https://archive.softwareheritage.org/swh:1:cnt:bb0faf6919fc60636b2696f32ec9b3c2adb247fe;origin=https://github.com/id-Software/Quake-III-Arena;lines=549-572/][Quake III rsqrt]] * Demo time! ** A walkthrough - Browse [[https://archive.softwareheritage.org][the archive]] - [[https://save.softwareheritage.org][Trigger archival]] of your preferred software in a breeze - Get and use SWHIDs ([[https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html][full spec available online]]) - The [[https://www.softwareheritage.org/2019/07/20/archiving-and-referencing-the-apollo-source-code/][Apollo 11 AGC source code example]] - Cite software [[https://www.softwareheritage.org/2020/05/26/citing-software-with-style/][with the biblatex-software style]] from CTAN - Example use in a research article: compare Fig. 1 and conclusions - in [[http://www.dicosmo.org/Articles/2012-DaneluttoDiCosmo-Pcs.pdf][the 2012 version]] - in [[https://www.dicosmo.org/share/parmap_swh.pdf][the updated version]] using SWHIDs and Software Heritage # - Example use in a research article: extensive use of SWHIDs in [[https://www.dicosmo.org/Articles/2020-ReScienceC.pdf][a replication experiment]] - Example in a journal: [[http://www.ipol.im/pub/art/2020/300/][an article from IPOL]] - [[https://doc.archives-ouvertes.fr/en/deposit/deposit-software-source-code/][Curated deposit in SWH via HAL]], see for example: [[https://hal.archives-ouvertes.fr/hal-02130801][LinBox]], [[https://hal.archives-ouvertes.fr/hal-01897934][SLALOM]], [[https://hal.archives-ouvertes.fr/hal-02130729][Givaro]], [[https://hal.archives-ouvertes.fr/hal-02137040][NS2DDV]], [[https://hal.archives-ouvertes.fr/lirmm-02136558][SumGra]], [[https://hal.archives-ouvertes.fr/hal-02155786][Coq proof]], ... - Rescue landmark legacy software, see the [[https://www.softwareheritage.org/swhap/][SWHAP process with UNESCO]] * Scientific challenges ** A revolutionary research infrastructure designed for source code #+INCLUDE: "../../common/modules/swh-as-infrastructure.org::#oneslide" :only-contents t :minlevel 3 ** A challenging scientific and technical undertaking *** A novel, large infrastructure - gigantic Merkle graph - object storage [[https://www.softwareheritage.org/2021/03/11/towards-a-next-generation-object-storage-for-software-heritage/][with peculiar workload]] - simple problems become hard, e.g., counting tens of billions of objects, or sorting all possible origins of a node # - and much more: see [[https://www.softwareheritage.org/2021/04/08/swh-2021-technical-roadmap/][the 2021 technical roadmap]] #+BEAMER: \pause \vspace{-1mm} *** First dataset available as open data #+BEGIN_EXPORT latex \begin{thebibliography}{Foo Bar, 1969} \footnotesize \bibitem{Pietri2019} Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli\newblock The Software Heritage Graph Dataset: Public software development under one roof\newblock MSR 2019: 16th Intl. Conf. on Mining Software Repositories. IEEE\newblock preprint: \url{http://deb.li/swhmsr19} \end{thebibliography} \vspace{-1mm} #+END_EXPORT - used as topic for the MSR 2020 mining competition #+BEAMER: \pause \vspace{-1mm} *** → for more, see last week's talk with the DIG team: *Analyzing the Global Graph of Public Software Development* [[https://upsilon.cc/~zack/talks/2021/2021-10-14-telecom-dig.pdf][upsilon.cc/~zack/talks/2021/2021-10-14-telecom-dig.pdf]] * Preserving our software commons: the present and the future ** Focus on Academia: growing adoption (selection) #+INCLUDE: "../../common/modules/swh-adoption-academic.org::#adoption" :only-contents t :minlevel 3 ** An international, non profit initiative\hfill built for the long term :PROPERTIES: :CUSTOM_ID: support :END: *** Sharing the vision :B_block: :PROPERTIES: :CUSTOM_ID: endorsement :BEAMER_COL: .5 :BEAMER_env: block :END: #+LATEX: \begin{center}{\includegraphics[width=\extblockscale{.4\linewidth}]{unesco_logo_en_285}}\end{center} #+LATEX: \vspace{-0.8cm} #+LATEX: \begin{center}\vskip 1em \includegraphics[width=\extblockscale{1.4\linewidth}]{support.pdf}\end{center} #+latex: \small And many more ...\\ #+latex:\mbox{}~~~~~~~\tiny\url{www.softwareheritage.org/support/testimonials} #+BEAMER: \pause *** Donors, members, sponsors :B_block: :PROPERTIES: :CUSTOM_ID: sponsors :BEAMER_COL: .5 :BEAMER_env: block :END: #+LATEX: \begin{center}\includegraphics[width=\extblockscale{.4\linewidth}]{inria-logo-new}\end{center} #+LATEX: \begin{center} # #+LATEX: \includegraphics[width=\extblockscale{.2\linewidth}]{sponsors-levels.pdf} #+LATEX: \colorbox{white}{\includegraphics[width=\extblockscale{1.4\linewidth}]{sponsors.pdf}} #+LATEX: \end{center} # - sponsoring / partnership :: \hfill \url{sponsorship.softwareheritage.org} *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: *** Research collaboration :B_picblock:noexport: :PROPERTIES: :BEAMER_COL: .5 :BEAMER_env: picblock :BEAMER_OPT: pic=Qwant_Logo, leftpic=true :END: source code search engine *** See more :noexport: \hfill\tiny\url{http:://www.softwareheritage.org/support/testimonials} *** Global network :B_picblock:noexport: :PROPERTIES: :BEAMER_COL: .5 :BEAMER_env: picblock :BEAMER_OPT: pic=fossid, leftpic=true, width=.3\linewidth :END: - first *independent mirror* - increased reliability ** You may help! *** Foster the adoption of research best practices - [[https://www.softwareheritage.org/save-and-reference-research-software/][archive and reference relevant source code]] (save code now, and [[https://hal.inria.fr/hal-01872189][deposit]]) - use Software Heritage and [[https://www.softwareheritage.org/2020/05/26/citing-software-with-style/][biblatex-software]] in articles, journals, and books - [[https://www.softwareheritage.org/swhap/][rescue and preserve landmark legacy source code]] with SWHAP #+BEAMER: \pause *** Engage with Software Heritage as a researcher - use the archive for your own software-related experiments - work with us to tackle open technical and research problems #+BEAMER: \pause *** Engage with Software Heritage as an organization - become [[https://www.softwareheritage.org/support/sponsors/][a member/sponsor]] - build a Software Heritage mirror - contribute to the preservation mission ** Thank you! #+BEAMER: \vspace{-1mm} *** Resources - archive :: [[https://archive.softwareheritage.org/][archive.softwareheritage.org]] - stay posted :: [[https://www.softwareheritage.org/newsletter/][softwareheritage.org/newsletter]] - blog :: [[https://www.softwareheritage.org/blog/][softwareheritage.org/blog]] #+BEAMER: \vspace{-2mm} *** References (selected; full list at [[https://www.softwareheritage.org/publications][softwareheritage.org/publications]]) #+BEGIN_EXPORT latex \begin{thebibliography}{Foo Bar, 1969} \scriptsize \vspace{-2mm} % \bibitem{DiCosmo2017} Roberto Di Cosmo, Stefano Zacchiroli % \newblock Software Heritage: Why and How to Preserve Software Source Code % \newblock iPRES 2017: Intl. Conf. on Digital Preservation \bibitem{Abramatic2018} Jean-François Abramatic, Roberto Di Cosmo, Stefano Zacchiroli \newblock Building the Universal Archive of Source Code \newblock Communication of the ACM, October 2018 \bibitem{Pietri2020c} Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli \newblock The Software Heritage Graph Dataset: Large-scale Analysis of Public Software Development History \newblock MSR 2020: 17th Intl. Conf. on Mining Software Repositories. IEEE \bibitem{DiCosmo2020d} Roberto Di Cosmo \newblock Archiving and Referencing Source Code with Software Heritage \newblock International Congress on Mathematical Software (ICMS), 2020 \bibitem{PNSO2} MESRI \newblock Second French Plan for Open Science \newblock \href{https://www.ouvrirlascience.fr/second-national-plan-for-open-science}{www.ouvrirlascience.fr/second-national-plan-for-open-science}, 2001 \end{thebibliography} #+END_EXPORT -* Appendix :B_appendix:noexport: +* Appendix :B_appendix: :PROPERTIES: :BEAMER_env: appendix :END: -** - \vfill - \centerline{\Huge Appendix} - \vfill + + #+INCLUDE: "../../common/modules/status-extended.org::#archivinggoals" :minlevel 2 + #+INCLUDE: "../../common/modules/data-model.org::#merklestruct" :minlevel 2 + #+INCLUDE: "../../common/modules/status-extended.org::#dagdetailsmall" :minlevel 2 :only-contents t