diff --git a/talks-public/2022-09-16-Guix/2022-06-24-Guix.org b/talks-public/2022-09-16-Guix/2022-09-11.org
similarity index 88%
rename from talks-public/2022-09-16-Guix/2022-06-24-Guix.org
rename to talks-public/2022-09-16-Guix/2022-09-11.org
index ea6e383..9626c29 100644
--- a/talks-public/2022-09-16-Guix/2022-06-24-Guix.org
+++ b/talks-public/2022-09-16-Guix/2022-09-11.org
@@ -1,356 +1,365 @@
#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
#+KEYWORDS: software heritage reproducibility guix
#+TITLE: 10 years of Guix
#+SUBTITLE: Software Heritage to the rescue of reproducible Science
#+AUTHOR: vlorentz, ardumont
#+EMAIL: vlorentz@softwareheritage.org, ardumont@softwareheritage.org
#+DATE: 16 Sep 2022
#+BEAMER_HEADER: \date[16/09/2022]{16/09/2022\\Event 10 years of Guix, Paris 2022}
#+BEAMER_HEADER: \author{Valentin Lorentz (@vlorentz) / Antoine R. Dumont (@ardumont)}
#+BEAMER_HEADER: \institute[Software Heritage]{Software Engineers, Software Heritage\\Inria}
#+BEAMER_HEADER: \setbeameroption{hide notes}
#+LATEX_HEADER: \usepackage{tcolorbox}
#+LATEX_HEADER: \definecolor{links}{HTML}{2A1B81}
#+LATEX_HEADER: \hypersetup{colorlinks,linkcolor=,urlcolor=links}
# Syntax highlighting setup
#+LATEX_HEADER_EXTRA: \usepackage{minted}
#+LaTeX_HEADER_EXTRA: \usemintedstyle{emacs}
#+name: setup-minted
#+begin_src emacs-lisp :exports results :results silent
(setq org-latex-listings 'minted)
(setq org-latex-to-pdf-process
'("pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"
"pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"
"pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"))
(add-to-list 'org-latex-minted-langs '("emacs-lisp" "common-lisp"))
#+end_src
# End syntax highlighting setup
#
# prelude.org contains all the information needed to export the main beamer latex source
# use prelude-toc.org to get the table of contents
#
#+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1
#+INCLUDE: "../../common/modules/169.org"
# +LaTeX_CLASS_OPTIONS: [aspectratio=169,handout,xcolor=table]
#+LATEX_HEADER: \usepackage{bbding}
#+LATEX_HEADER: \DeclareUnicodeCharacter{66D}{\FiveStar}
#
# If you want to change the title logo it's here
#
# +BEAMER_HEADER: \titlegraphic{\includegraphics[width=0.5\textwidth]{SWH-logo}}
# aspect ratio can be changed, but the slides need to be adapted
# - compute a "resizing factor" for the images (macro for picblocks?)
#
# set the background image
#
# https://pacoup.com/2011/06/12/list-of-true-169-resolutions/
#
#+BEAMER_HEADER: \pgfdeclareimage[height=90mm,width=160mm]{bgd}{swh-world-169.png}
#+BEAMER_HEADER: \setbeamertemplate{background}{\pgfuseimage{bgd}}
#+LATEX: \addtocounter{framenumber}{-1}
* Introduction: the Software Heritage project
** What is SoftwareHeritage?
:PROPERTIES:
:CUSTOM_ID: spread
:END:
#+latex: \begin{center}
#+ATTR_LATEX: :width \extblockscale{.6\linewidth}
file:SWH-logo+motto.pdf
#+latex: \end{center}
The Universal Source Code Archive
** Why an archive? Software is spread all around
:PROPERTIES:
:CUSTOM_ID: spread
:END:
#+latex: \begin{flushleft}
#+ATTR_LATEX: :width \extblockscale{.5\linewidth}
file:myriadsources.png
#+latex: \end{flushleft}
*** Fashion victims
- - disparate development platforms (popular forges: Guix, PyPI, npm, ...)
- - various places where distribution happens (standalone forges: gitlab, heptapod, cgit, gitea...)
+ - many development platforms (popular forges: Guix, PyPI, npm, ...)
+ - various distribution places (standalone forges: gitlab, heptapod, cgit, gitea...)
- projects tend to migrate from one place to another over time
*** One place... :B_block:
:PROPERTIES:
:BEAMER_env: block
:END:
\hfill ... where can we find, track and search /all/ source code, rebuild tarballs?
** Why an archive? Software is fragile
:PROPERTIES:
:CUSTOM_ID: fragile
:END:
#+latex: \begin{flushleft}
#+ATTR_LATEX: :width \extblockscale{.5\linewidth}
file:fragilecloud.png
#+latex: \end{flushleft}
*** Like all digital information, FOSS is fragile
# - inconsiderate and/or malicious code loss (e.g., Code Spaces)
- link rot: projects are created, moved around, removed
- data rot: physical media with legacy software decay
- business-driven code loss (e.g. Gitorious, Google Code, Bitbucket, ...)
*** If a website disappears you go to the Internet Archive... :B_block:
:PROPERTIES:
:BEAMER_env: block
:END:
\hfill where do you go if (a repository on) GitHub or GitLab goes away?
** Software Heritage in a Nutshell
#+latex: \begin{center}
#+ATTR_LATEX: :width \extblockscale{.6\linewidth}
file:SWH-logo+motto.pdf
#+latex: \end{center}
*** Main Objectives
- *Collect*, *Preserve* and *Share*
** Collect
*** Reference catalog
:PROPERTIES:
:BEAMER_env: block
:BEAMER_COL: .3
:END:
#+BEGIN_EXPORT latex
\begin{center}
\includegraphics[width=.4\linewidth]{myriadsources}
\end{center}
#+END_EXPORT
*find* and *reference* all software source code
** Preserve
*** Universal archive
:PROPERTIES:
:BEAMER_env: block
:BEAMER_COL: .3
:END:
#+BEGIN_EXPORT latex
\begin{center}
\includegraphics[width=.4\linewidth]{fragilecloud}
\end{center}
#+END_EXPORT
*preserve* all archived software source code *forever*
** Share
*** Research infrastructure :B_block:
:PROPERTIES:
:BEAMER_COL: .3
:BEAMER_env: block
:END:
#+BEGIN_EXPORT latex
\begin{center}
\includegraphics[width=.4\linewidth]{atacama-telescope}
\end{center}
#+END_EXPORT
*enable analysis* of all software source code, make every piece *identifiable* and
freely *available*...
*** Reproducibility :B_block:
:PROPERTIES:
:BEAMER_COL: .3
:BEAMER_env: block
:END:
#+BEGIN_EXPORT latex
\begin{center}
\includegraphics[width=.4\linewidth]{atacama-telescope}
\end{center}
#+END_EXPORT
... as much as possible *exactly* as it was when archived
-** Reproducibility
-
-# reproducibility and scientific knowledge pillars (one slide)
-#+INCLUDE: "../../common/modules/swh-scientific-reproducibility.org::#main" :only-contents t :minlevel 2
-
** Our principles
:PROPERTIES:
:CUSTOM_ID: principlesstatus
:END:
#+latex: \begin{center}
#+ATTR_LATEX: :width .6\linewidth
file:SWH-as-foundation-slim.png
#+latex: \end{center}
#+latex: \footnotesize\vspace{-3mm}
#+latex: \centering
#+ATTR_LATEX: :width \extblockscale{.8\linewidth}
[[file:2022-09-14-archive-growth.png]]
** Under the hood: Automation, and storage
:PROPERTIES:
:CUSTOM_ID: automation
:END:
#+BEAMER: \begin{center}
#+BEAMER: \only<1>{\includegraphics[width=\extblockscale{\textwidth}]{swh-dataflow-merkle.pdf}}
#+BEAMER: \end{center}
/Global development history/ *permanently archived* in a *uniform data model*
- over *12 billion* unique source files from over *180 million* software projects
- *~900 TB* (uncompressed) blobs, *~25 B* nodes, *~300 B* edges
* Reference archived code with SWHIDs
** R(eference): granularity and identifiers \hfill [[http://doi.org/10.15497/RDA00053][10.15497/RDA00053]]
#+LATEX: \centering\forcebeamerstart
#+LATEX: \only<1>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-0.png}}
#+LATEX: \only<2>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-1.png}}
#+LATEX: \only<3>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-2.png}}
#+LATEX: \only<4>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-3.png}}
#+LATEX: \forcebeamerend
#+LATEX: \only<1>{\begin{block}{}\centering Top concept layers vs. bottom artifact layers\end{block}}
#+LATEX: \only<2>{\begin{block}{}\centering Extrinsic identifiers are key for the concept layers\end{block}}
#+LATEX: \only<3>{\begin{block}{}\centering Intrinsic identifiers are key for the artifact layers\end{block}}
#+LATEX: \only<4>{\begin{block}{}\centering In some cases, extrinsic identifiers can be added too\end{block}}
** Meet the SWHID intrinsic identifiers
:PROPERTIES:
:CUSTOM_ID: oneslide
:END:
#+LATEX: \centering
#+LATEX: \only<1>{\includegraphics[width=\linewidth]{SWHID-v1.4_3.png}}
#+LATEX: \forcebeamerend \vspace{-6mm}
+** SWHID: A worked example
+ #+LATEX: \centering\forcebeamerstart
+ #+LATEX: \only<1>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_1.pdf}}}
+ #+LATEX: \only<2>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/contents.pdf}}}
+ #+LATEX: \only<3>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_2_contents.pdf}}}
+ #+LATEX: \only<4>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/directories.pdf}}}
+ #+LATEX: \only<5>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_3_directories.pdf}}}
+ #+LATEX: \only<6>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/revisions.pdf}}}
+ #+LATEX: \only<7>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_4_revisions.pdf}}}
+ #+LATEX: \only<8>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/releases.pdf}}}
+ #+LATEX: \only<9>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/merkle_5_releases.pdf}}}
+ #+LATEX: \only<10>{\colorbox{white}{\includegraphics[width=\extblockscale{\linewidth}]{git-merkle/snapshots.pdf}}}
+ #+LATEX: \forcebeamerend
+
* Guix
** How does this relate to Guix?
- Nothing is eternal, source code (in all forms) disappears
- Hopefully, SWH keeps a copy of everything
- Since November 2018, [[https://www.softwareheritage.org/2019/04/18/software-heritage-and-gnu-guix-join-forces-to-enable-long-term-reproducibility/][Guix ensures source code is archived in SWH when building]]
- After source code actually disappears, falls back to SWH when rebuilding
** Reproducibility is of the essence!
*** Report
- Tarballs will disappear (give it enough time)
- Persistent intrinsic identifiers (SWHID) are not (yet?) package manager standard
- Guix (and other) package managers reference tarball hashes
#+begin_src emacs-lisp
(define-public ...
(package
...
(source (origin (method url-fetch)
(uri (string-append
"https://..." version ".tar.gz"))
(sha256 (base32 "03mwi1l3354x52nar..."))))
...
#+end_src
*** Conclusion
- make (non-specific swh) SWHID standard or rebuild original bit-by-bit tarball
* Enters... Disarchive
** How it started
*** Discussions
- "gforge.inria.fr to be taken off-line in Dec. 2020" https://issues.guix.gnu.org/42162
- "lookup ingested tarballs by container checksum" https://forge.softwareheritage.org/T2430
*** New software
- Disarchive by Timothy Sample https://git.ngyro.com/disarchive/
** How it works:
- Manifest of tarball fields: entry order, PAX headers, ...
- References to individual file hashes
- WIP: guessing compression parameters/implementations (using zgz)
- -> rebuild original `.tar`, then original `.tar.{gz,xz}`
** Example manifest (1/2)
#+begin_src emacs-lisp
(disarchive
(version 0)
(tarball
(name "test-archive.tar")
(digest (sha256 "0da9fa3e7b360533678338871d9dd36f3..."))
(default-header
(chksum (trailer " "))
(magic "ustar ")
(version " \x00")
(devmajor 0 (source "" (trailer "")))
(devminor 0 (source "" (trailer "")))
(data-padding ""))
...
#+end_src
** Example manifest (2/2)
#+begin_src emacs-lisp
(disarchive
...
(headers
("test-archive/" (mode 493) (chksum 4291) (typeflag 53))
("test-archive/file-a" (size 15) (chksum 4849))
("test-archive/file-b" (size 15) (chksum 4850)))
(padding 6656)
(input (directory-ref
(version 0)
(name "test-archive")
(addresses
(swhid "swh:1:dir:902b1e94f0f5efdde6..."))
(digest (sha256 "277decb2666f4832ef64a...")))))
#+end_src
** Planned integration of SWH with Disarchive
*** Currently
- SWH does not store Disarchive manifests yet
*** Plan
- Run Disarchive every time SWH loads a tarball
- Store it as `(tarball-hash, directory-hash, manifest)` tuples
- when someone requests `tarball-hash`, rebuild from the manifest
* Current Work in Progress
** NixGuix manifests coverage in SWH
*** goal: 100% coverage
- currently missing sources due to technical limitations: bare files, directories, patches
- Redesign in progress to deal with such limitations
#+latex: \centering
#+ATTR_LATEX: :width \extblockscale{.8\linewidth}
file:ngyro-com-pog-reports-guix-coverage-2022-09-14.png
** Disarchive
*** Integration
- code dump at https://git.ngyro.com/swh/
- needs to be reviewed and merged
* Questions?
And thanks for your time!
* Copyright
Copyright of images included in this document is held by their respective owners.
The source of this document is available at https://forge.softwareheritage.org/source/slides/
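
Note on the SWHID slides in the diff above: for a content object (a single file), the intrinsic identifier is `swh:1:cnt:` followed by the same SHA-1 that Git computes for a blob. Below is a minimal Python sketch under that assumption. The function name `content_swhid` is made up for illustration and is not part of any Software Heritage tooling.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return a SWHID (version 1) for raw file content.

    Assumption: content objects are hashed like Git blobs, i.e.
    SHA-1 over the header "blob <length>\\0" followed by the bytes.
    """
    header = b"blob %d\x00" % len(data)  # Git-style blob header
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # The empty file hashes to Git's well-known empty-blob identifier.
    print(content_swhid(b""))
    # swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Directories, revisions, releases and snapshots are hashed over their own serialized manifests, which this sketch does not cover.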
