diff --git a/talks-public/2022-09-16-Guix/2022-06-24-Guix.org b/talks-public/2022-09-16-Guix/2022-06-24-Guix.org new file mode 100644 index 0000000..06bebb7 --- /dev/null +++ b/talks-public/2022-09-16-Guix/2022-06-24-Guix.org @@ -0,0 +1,306 @@ +#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) +#+KEYWORDS: software heritage reproducibility guix +#+TITLE: 10 years of Guix - Software Heritage +#+SUBTITLE: SWH to the rescue of reproducible Science +#+AUTHOR: vlorentz, ardumont +#+EMAIL: vlorentz@softwareheritage.org, ardumont@softwareheritage.org +#+DATE: 16 Sep 2022 +#+BEAMER_HEADER: \date[16/09/2022]{16/09/2022\\Event 10 years of Guix, Paris 2022} +# #+BEAMER_HEADER: \title[Archive and reference software~~~~ www.softwareheritage.org]{SWH to the rescue of reproducible Science} +#+BEAMER_HEADER: \author{Valentin Lorentz (@vlorentz) / Antoine R. Dumont (@ardumont)} +#+BEAMER_HEADER: \institute[Software Heritage]{Software Engineers, Software Heritage\\Inria} + +# #+BEAMER_HEADER: \setbeameroption{show notes on second screen} +#+BEAMER_HEADER: \setbeameroption{hide notes} +#+LATEX_HEADER: \usepackage{tcolorbox} +#+LATEX_HEADER: \definecolor{links}{HTML}{2A1B81} +#+LATEX_HEADER: \hypersetup{colorlinks,linkcolor=,urlcolor=links} + +# +# prelude.org contains all the information needed to export the main beamer latex source +# use prelude-toc.org to get the table of contents +# + +#+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 + +#+INCLUDE: "../../common/modules/169.org" + +# +LaTeX_CLASS_OPTIONS: [aspectratio=169,handout,xcolor=table] + +#+LATEX_HEADER: \usepackage{bbding} +#+LATEX_HEADER: \DeclareUnicodeCharacter{66D}{\FiveStar} + +# +# If you want to change the title logo it's here +# +# +BEAMER_HEADER: \titlegraphic{\includegraphics[width=0.5\textwidth]{SWH-logo}} + +# aspect ratio can be changed, but the slides need to be adapted +# - compute a "resizing factor" 
for the images (macro for picblocks?) +# +# set the background image +# +# https://pacoup.com/2011/06/12/list-of-true-169-resolutions/ +# +#+BEAMER_HEADER: \pgfdeclareimage[height=90mm,width=160mm]{bgd}{swh-world-169.png} +#+BEAMER_HEADER: \setbeamertemplate{background}{\pgfuseimage{bgd}} +#+LATEX: \addtocounter{framenumber}{-1} + + +* Introduction: the Software Heritage project + +** What is Software Heritage? + :PROPERTIES: + :CUSTOM_ID: spread + :END: + + The universal source code archive + +** Why an archive? Software is spread all around + :PROPERTIES: + :CUSTOM_ID: spread + :END: + #+latex: \begin{flushleft} + #+ATTR_LATEX: :width \extblockscale{.5\linewidth} + file:myriadsources.png + #+latex: \end{flushleft} + +*** Fashion victims + - disparate development platforms (forges: GitLab, Heptapod, cgit, Gitea, ...) + - various places where distribution happens (package repositories: Guix, PyPI, npm, ...) + - projects tend to migrate from one place to another over time + +*** One place... :B_block: + :PROPERTIES: + :BEAMER_env: block + :END: + \hfill ... where can we find, track and search /all/ source code, and rebuild tarballs? + +** Why an archive? Software is fragile + :PROPERTIES: + :CUSTOM_ID: fragile + :END: + #+latex: \begin{flushleft} + #+ATTR_LATEX: :width \extblockscale{.5\linewidth} + file:fragilecloud.png + #+latex: \end{flushleft} + +*** Like all digital information, FOSS is fragile +# - inconsiderate and/or malicious code loss (e.g., Code Spaces) + - link rot: projects are created, moved around, removed + - business-driven code loss (e.g., Gitorious, Google Code, Bitbucket, ...) + - data rot: physical media with legacy software decay + +*** If a website disappears you go to the Internet Archive... :B_block: + :PROPERTIES: + :BEAMER_env: block + :END: + \hfill where do you go if (a repository on) GitHub or GitLab goes away?
+ +** Software Heritage in a Nutshell + +#+latex: \begin{center} +#+ATTR_LATEX: :width \extblockscale{.6\linewidth} +file:SWH-logo+motto.pdf +#+latex: \end{center} + + +*** Main Objectives +- *Collect*, *Preserve* and *Share* + +*** Reference catalog + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .3 + :END: +#+BEGIN_EXPORT latex +\begin{center} +\includegraphics[width=.4\linewidth]{myriadsources} +\end{center} +#+END_EXPORT + *find* and *reference* all software source code + +*** Universal archive + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .3 + :END: +#+BEGIN_EXPORT latex +\begin{center} +\includegraphics[width=.4\linewidth]{fragilecloud} +\end{center} +#+END_EXPORT + *preserve* all the archived software source code *forever* + +*** Research infrastructure :B_block: + :PROPERTIES: + :BEAMER_COL: .3 + :BEAMER_env: block + :END: +#+BEGIN_EXPORT latex +\begin{center} +\includegraphics[width=.4\linewidth]{atacama-telescope} +\end{center} +#+END_EXPORT + *enable analysis* of all software source code, make every piece *identifiable* and freely *available* + +** Our principles + :PROPERTIES: + :CUSTOM_ID: principlesstatus + :END: +#+latex: \begin{center} +#+ATTR_LATEX: :width .6\linewidth +file:SWH-as-foundation-slim.png +#+latex: \end{center} +#+latex: \footnotesize\vspace{-3mm} + + #+latex: \centering + #+ATTR_LATEX: :width \extblockscale{.8\linewidth} + file:2022-05-06-archive-growth.png + +** Under the hood: Automation, and storage + :PROPERTIES: + :CUSTOM_ID: automation + :END: + #+BEAMER: \begin{center} + #+BEAMER: \only<1>{\includegraphics[width=\extblockscale{\textwidth}]{swh-dataflow-merkle.pdf}} + #+BEAMER: \end{center} + /Global development history/ *permanently archived* in a *uniform data model* + - over *12 billion* unique source files from almost *180 million* software projects + - *~900 TB* (uncompressed) blobs, *~25 B* nodes, *~300 B* edges + +* Reference archived code with SWHIDs +** R(eference): granularity and identifiers \hfill 
[[http://doi.org/10.15497/RDA00053][10.15497/RDA00053]] + #+LATEX: \centering\forcebeamerstart + #+LATEX: \only<1>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-0.png}} + #+LATEX: \only<2>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-1.png}} + #+LATEX: \only<3>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-2.png}} + #+LATEX: \only<4>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-3.png}} + #+LATEX: \forcebeamerend + #+LATEX: \only<1>{\begin{block}{}\centering Top concept layers vs. bottom artifact layers\end{block}} + #+LATEX: \only<2>{\begin{block}{}\centering Extrinsic identifiers are key for the concept layers\end{block}} + #+LATEX: \only<3>{\begin{block}{}\centering Intrinsic identifiers are key for the artifact layers\end{block}} + #+LATEX: \only<4>{\begin{block}{}\centering In some cases, extrinsic identifiers can be added too\end{block}} + +** Meet the SWHID intrinsic identifiers + + :PROPERTIES: + :CUSTOM_ID: oneslide + :END: + #+LATEX: \centering + #+LATEX: \only<1>{\includegraphics[width=\linewidth]{SWHID-v1.4_3.png}} + #+LATEX: \forcebeamerend \vspace{-6mm} + +** Meet the SWHID intrinsic identifiers + + \centering [[https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://src.koda.cnrs.fr/mmdc/sensorsio][SWHID demo!]] + + \vspace{1em} + + \centering [[https://www.softwareheritage.org/2020/07/09/intrinsic-vs-extrinsic-identifiers/][Reference: Extrinsic vs. intrinsic identifiers]] + +* Guix +** How does this relate to Guix? +- Nothing is eternal: source code (in all forms) disappears +- Hopefully, SWH keeps a copy of everything +- Guix ensures source code is archived in SWH when building +- Once source code has actually disappeared, Guix falls back to SWH when rebuilding +** Reproducibility is of the essence! + +*** Report + +- Tarballs will disappear (given enough time) +- The persistent intrinsic identifier (SWHID) is not (yet?)
a package manager standard +- Guix (and other package managers) reference tarball hashes + +*** Conclusion + +- Either make the SWHID (which is not SWH-specific) a standard, or rebuild the original tarball bit-for-bit + +#+begin_src scheme +(define-public ... + (package + ... + (source (origin (method url-fetch) + (uri (string-append "https://..." version ".tar.gz")) + (sha256 (base32 "03mwi1l3354x52nar...")))) + ... +#+end_src + +* Enter... Disarchive + +** How it started + +*** Discussions + - https://issues.guix.gnu.org/42162 + - https://forge.softwareheritage.org/T2430 + +*** New software + - Disarchive https://git.ngyro.com/disarchive/ + +** How it works (WIP) + + - Manifest of tarball fields (entry order, PAX headers, ...) + - References to individual file hashes + - WIP: guessing compression parameters/implementations (using zgz) + - -> rebuild the original `.tar`, then the original `.tar.{gz,xz}` + +** Example manifest (1/2) + +#+begin_src scheme +(disarchive + (version 0) + (tarball + (name "test-archive.tar") + (digest (sha256 "0da9fa3e7b360533678338871d9dd36f3b7ded...")) + (default-header + (chksum (trailer " ")) + (magic "ustar ") + (version " \x00") + (devmajor 0 (source "" (trailer ""))) + (devminor 0 (source "" (trailer ""))) + (data-padding "")) +... +#+end_src + +** Example manifest (2/2) + +#+begin_src scheme +(disarchive +...
+ (headers + ("test-archive/" (mode 493) (chksum 4291) (typeflag 53)) + ("test-archive/file-a" (size 15) (chksum 4849)) + ("test-archive/file-b" (size 15) (chksum 4850))) + (padding 6656) + (input (directory-ref + (version 0) + (name "test-archive") + (addresses + (swhid "swh:1:dir:902b1e94f0f5efdde63273c...")) + (digest (sha256 "277decb2666f4832ef64ae4655..."))))) +#+end_src + +** Planned integration of SWH with Disarchive + +*** Currently + + - SWH does not store Disarchive manifests yet + +*** Plan + + - Run Disarchive every time SWH loads a tarball + - Store the result as `(tarball-hash, directory-hash, manifest)` tuples + - When someone requests `tarball-hash`, rebuild the tarball from the manifest + +* Current Work in Progress + +** NixGuix Coverage in SWH + + - Some sources are still missing due to technical limitations (files, directories, ...) + - Refactoring is in progress to deal with these limitations + +** Disarchive + + - Code dump at https://git.ngyro.com/swh/ + - Needs to be reviewed and merged diff --git a/talks-public/2022-09-16-Guix/Makefile b/talks-public/2022-09-16-Guix/Makefile new file mode 100644 index 0000000..68fbee7 --- /dev/null +++ b/talks-public/2022-09-16-Guix/Makefile @@ -0,0 +1 @@ +include ../Makefile.slides