diff --git a/talks-public/2022-09-16-Guix/2022-06-24-Guix.org b/talks-public/2022-09-16-Guix/2022-06-24-Guix.org
new file mode 100644
index 0000000..06bebb7
--- /dev/null
+++ b/talks-public/2022-09-16-Guix/2022-06-24-Guix.org
@@ -0,0 +1,306 @@
+#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
+#+KEYWORDS: software heritage reproducibility guix
+#+TITLE: 10 years of Guix - Software Heritage
+#+SUBTITLE: SWH to the rescue of reproducible Science
+#+AUTHOR: vlorentz, ardumont
+#+EMAIL: vlorentz@softwareheritage.org, ardumont@softwareheritage.org
+#+DATE: 16 Sep 2022
+#+BEAMER_HEADER: \date[16/09/2022]{16/09/2022\\Event 10 years of Guix, Paris 2022}
+# #+BEAMER_HEADER: \title[Archive and reference software~~~~ www.softwareheritage.org]{SWH to the rescue of reproducible Science}
+#+BEAMER_HEADER: \author{Valentin Lorentz (@vlorentz) / Antoine R. Dumont (@ardumont)}
+#+BEAMER_HEADER: \institute[Software Heritage]{Software Engineers, Software Heritage\\Inria}
+
+# #+BEAMER_HEADER: \setbeameroption{show notes on second screen}
+#+BEAMER_HEADER: \setbeameroption{hide notes}
+#+LATEX_HEADER: \usepackage{tcolorbox}
+#+LATEX_HEADER: \definecolor{links}{HTML}{2A1B81}
+#+LATEX_HEADER: \hypersetup{colorlinks,linkcolor=,urlcolor=links}
+
+#
+# prelude.org contains all the information needed to export the main beamer latex source
+# use prelude-toc.org to get the table of contents
+#
+
+#+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1
+
+#+INCLUDE: "../../common/modules/169.org"
+
+# +LaTeX_CLASS_OPTIONS: [aspectratio=169,handout,xcolor=table]
+
+#+LATEX_HEADER: \usepackage{bbding}
+#+LATEX_HEADER: \DeclareUnicodeCharacter{66D}{\FiveStar}
+
+#
+# If you want to change the title logo it's here
+#
+# +BEAMER_HEADER: \titlegraphic{\includegraphics[width=0.5\textwidth]{SWH-logo}}
+
+# aspect ratio can be changed, but the slides need to be adapted
+# - compute a "resizing factor" for the images (macro for picblocks?)
+#
+# set the background image
+#
+# https://pacoup.com/2011/06/12/list-of-true-169-resolutions/
+#
+#+BEAMER_HEADER: \pgfdeclareimage[height=90mm,width=160mm]{bgd}{swh-world-169.png}
+#+BEAMER_HEADER: \setbeamertemplate{background}{\pgfuseimage{bgd}}
+#+LATEX: \addtocounter{framenumber}{-1}
+
+
+* Introduction: the Software Heritage project
+
+** What is Software Heritage?
+ :PROPERTIES:
+ :CUSTOM_ID: spread
+ :END:
+
+ The universal source code Archive
+
+** Why an archive? Software is spread all around
+ :PROPERTIES:
+ :CUSTOM_ID: spread
+ :END:
+ #+latex: \begin{flushleft}
+ #+ATTR_LATEX: :width \extblockscale{.5\linewidth}
+ file:myriadsources.png
+ #+latex: \end{flushleft}
+
+*** Fashion victims
+ - disparate development platforms (forges: GitLab, Heptapod, cgit, Gitea, ...)
+ - various places where distribution happens (package repositories: Guix, PyPI, npm, ...)
+ - projects tend to migrate from one place to another over time
+
+*** One place... :B_block:
+ :PROPERTIES:
+ :BEAMER_env: block
+ :END:
+ \hfill ... where we can find, track and search /all/ source code, and rebuild tarballs
+
+** Why an archive? Software is fragile
+ :PROPERTIES:
+ :CUSTOM_ID: fragile
+ :END:
+ #+latex: \begin{flushleft}
+ #+ATTR_LATEX: :width \extblockscale{.5\linewidth}
+ file:fragilecloud.png
+ #+latex: \end{flushleft}
+
+*** Like all digital information, FOSS is fragile
+# - inconsiderate and/or malicious code loss (e.g., Code Spaces)
+ - link rot: projects are created, moved around, removed
+ - business-driven code loss (e.g., Gitorious, Google Code, Bitbucket, ...)
+ - data rot: physical media with legacy software decay
+
+*** If a website disappears you go to the Internet Archive... :B_block:
+ :PROPERTIES:
+ :BEAMER_env: block
+ :END:
+ \hfill where do you go if (a repository on) GitHub or GitLab goes away?
+
+** Software Heritage in a Nutshell
+
+#+latex: \begin{center}
+#+ATTR_LATEX: :width \extblockscale{.6\linewidth}
+file:SWH-logo+motto.pdf
+#+latex: \end{center}
+
+
+*** Main Objectives
+- *Collect*, *Preserve* and *Share*
+
+*** Reference catalog
+ :PROPERTIES:
+ :BEAMER_env: block
+ :BEAMER_COL: .3
+ :END:
+#+BEGIN_EXPORT latex
+\begin{center}
+\includegraphics[width=.4\linewidth]{myriadsources}
+\end{center}
+#+END_EXPORT
+ *find* and *reference* all software source code
+
+*** Universal archive
+ :PROPERTIES:
+ :BEAMER_env: block
+ :BEAMER_COL: .3
+ :END:
+#+BEGIN_EXPORT latex
+\begin{center}
+\includegraphics[width=.4\linewidth]{fragilecloud}
+\end{center}
+#+END_EXPORT
+ *preserve* all the archived software source code *forever*
+
+*** Research infrastructure :B_block:
+ :PROPERTIES:
+ :BEAMER_COL: .3
+ :BEAMER_env: block
+ :END:
+#+BEGIN_EXPORT latex
+\begin{center}
+\includegraphics[width=.4\linewidth]{atacama-telescope}
+\end{center}
+#+END_EXPORT
+ *enable analysis* of all software source code, make every piece *identifiable* and freely *available*
+
+** Our principles
+ :PROPERTIES:
+ :CUSTOM_ID: principlesstatus
+ :END:
+#+latex: \begin{center}
+#+ATTR_LATEX: :width .6\linewidth
+file:SWH-as-foundation-slim.png
+#+latex: \end{center}
+#+latex: \footnotesize\vspace{-3mm}
+
+ #+latex: \centering
+ #+ATTR_LATEX: :width \extblockscale{.8\linewidth}
+ file:2022-05-06-archive-growth.png
+
+** Under the hood: Automation and storage
+ :PROPERTIES:
+ :CUSTOM_ID: automation
+ :END:
+ #+BEAMER: \begin{center}
+ #+BEAMER: \only<1>{\includegraphics[width=\extblockscale{\textwidth}]{swh-dataflow-merkle.pdf}}
+ #+BEAMER: \end{center}
+ /Global development history/ *permanently archived* in a *uniform data model*
+ - over *12 billion* unique source files from almost *180 million* software projects
+ - *~900 TB* (uncompressed) blobs, *~25 B* nodes, *~300 B* edges
+
+* Reference archived code with SWHIDs
+** R(eference): granularity and identifiers \hfill [[http://doi.org/10.15497/RDA00053][10.15497/RDA00053]]
+ #+LATEX: \centering\forcebeamerstart
+ #+LATEX: \only<1>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-0.png}}
+ #+LATEX: \only<2>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-1.png}}
+ #+LATEX: \only<3>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-2.png}}
+ #+LATEX: \only<4>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-3.png}}
+ #+LATEX: \forcebeamerend
+ #+LATEX: \only<1>{\begin{block}{}\centering Top concept layers vs. bottom artifact layers\end{block}}
+ #+LATEX: \only<2>{\begin{block}{}\centering Extrinsic identifiers are key for the concept layers\end{block}}
+ #+LATEX: \only<3>{\begin{block}{}\centering Intrinsic identifiers are key for the artifact layers\end{block}}
+ #+LATEX: \only<4>{\begin{block}{}\centering In some cases, extrinsic identifiers can be added too\end{block}}
+
+** Meet the SWHID intrinsic identifiers
+ :PROPERTIES:
+ :CUSTOM_ID: oneslide
+ :END:
+
+ #+LATEX: \centering
+ #+LATEX: \only<1>{\includegraphics[width=\linewidth]{SWHID-v1.4_3.png}}
+ #+LATEX: \forcebeamerend \vspace{-6mm}
+
+** Meet the SWHID intrinsic identifiers
+
+ \centering [[https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://src.koda.cnrs.fr/mmdc/sensorsio][SWHID demo!]]
+
+ \vspace{1em}
+
+ \centering [[https://www.softwareheritage.org/2020/07/09/intrinsic-vs-extrinsic-identifiers/][Reference: Extrinsic vs intrinsic identifiers]]
+
+* Guix
+** How does this relate to Guix?
+- Nothing is eternal: source code (in all its forms) disappears
+- Fortunately, SWH keeps a copy of everything
+- Guix ensures source code is archived in SWH when building
+- Once source code has actually disappeared, Guix falls back to SWH when rebuilding
+** Reproducibility is of the essence!
+
+*** Observations
+
+- Tarballs will disappear (given enough time)
+- The persistent (intrinsic) identifier (SWHID) is not (yet?) a package manager standard
+- Guix and other package managers reference tarball hashes
+
+*** Conclusion
+
+- Make the SWHID (which is not specific to SWH) a standard, or rebuild the original tarball bit for bit
+
+#+begin_src scheme
+(define-public ...
+ (package
+ ...
+ (source (origin (method url-fetch)
+ (uri (string-append "https://..." version ".tar.gz"))
+ (sha256 (base32 "03mwi1l3354x52nar..."))))
+ ...
+#+end_src
+
+* Enter... Disarchive
+
+** How it started
+
+*** Discussions
+ - https://issues.guix.gnu.org/42162
+ - https://forge.softwareheritage.org/T2430
+
+*** New software
+ - Disarchive https://git.ngyro.com/disarchive/
+
+** How it works (WIP)
+
+ - Manifest of tarball fields (entry order, PAX headers, ...)
+ - References to individual file hashes
+ - WIP: guessing compression parameters/implementations (using zgz)
+ - -> rebuild the original =.tar=, then the original =.tar.{gz,xz}=
+
+** Example manifest (1/2)
+
+#+begin_src scheme
+(disarchive
+ (version 0)
+ (tarball
+ (name "test-archive.tar")
+ (digest (sha256 "0da9fa3e7b360533678338871d9dd36f3b7ded..."))
+ (default-header
+ (chksum (trailer " "))
+ (magic "ustar ")
+ (version " \x00")
+ (devmajor 0 (source "" (trailer "")))
+ (devminor 0 (source "" (trailer "")))
+ (data-padding ""))
+...
+#+end_src
+
+** Example manifest (2/2)
+
+#+begin_src scheme
+(disarchive
+...
+ (headers
+ ("test-archive/" (mode 493) (chksum 4291) (typeflag 53))
+ ("test-archive/file-a" (size 15) (chksum 4849))
+ ("test-archive/file-b" (size 15) (chksum 4850)))
+ (padding 6656)
+ (input (directory-ref
+ (version 0)
+ (name "test-archive")
+ (addresses
+ (swhid "swh:1:dir:902b1e94f0f5efdde63273c..."))
+ (digest (sha256 "277decb2666f4832ef64ae4655...")))))
+#+end_src
+
+** Planned integration of SWH with Disarchive
+
+*** Currently
+
+ - SWH does not store Disarchive manifests yet
+
+*** Plan
+
+ - Run Disarchive every time SWH loads a tarball
+ - Store the result as =(tarball-hash, directory-hash, manifest)= tuples
+ - When someone requests =tarball-hash=, rebuild the tarball from the manifest
+
+* Current Work in Progress
+
+** NixGuix Coverage in SWH
+
+ - Some sources are missing due to technical limitations (files, directories, ...)
+ - Refactoring in progress to deal with such limitations
+
+** Disarchive
+
+ - code dump at https://git.ngyro.com/swh/
+ - needs to be reviewed and merged
diff --git a/talks-public/2022-09-16-Guix/Makefile b/talks-public/2022-09-16-Guix/Makefile
new file mode 100644
index 0000000..68fbee7
--- /dev/null
+++ b/talks-public/2022-09-16-Guix/Makefile
@@ -0,0 +1 @@
+include ../Makefile.slides
