diff --git a/talks-public/2022-09-16-Guix/2022-06-24-Guix.org b/talks-public/2022-09-16-Guix/2022-06-24-Guix.org
new file mode 100644
index 0000000..06bebb7
--- /dev/null
+++ b/talks-public/2022-09-16-Guix/2022-06-24-Guix.org
@@ -0,0 +1,306 @@
+#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
+#+KEYWORDS: software heritage reproducibility guix
+#+TITLE: 10 years of Guix - Software Heritage
+#+SUBTITLE: SWH to the rescue of reproducible Science
+#+AUTHOR: vlorentz, ardumont
+#+EMAIL: vlorentz@softwareheritage.org, ardumont@softwareheritage.org
+#+DATE: 16 Sep 2022
+#+BEAMER_HEADER: \date[16/09/2022]{16/09/2022\\Event 10 years of Guix, Paris 2022}
+# #+BEAMER_HEADER: \title[Archive and reference software~~~~ www.softwareheritage.org]{SWH to the rescue of reproducible Science}
+#+BEAMER_HEADER: \author{Valentin Lorentz (@vlorentz) / Antoine R. Dumont (@ardumont)}
+#+BEAMER_HEADER: \institute[Software Heritage]{Software Engineers, Software Heritage\\Inria}
+
+# #+BEAMER_HEADER: \setbeameroption{show notes on second screen}
+#+BEAMER_HEADER: \setbeameroption{hide notes}
+#+LATEX_HEADER: \usepackage{tcolorbox}
+#+LATEX_HEADER: \definecolor{links}{HTML}{2A1B81}
+#+LATEX_HEADER: \hypersetup{colorlinks,linkcolor=,urlcolor=links}
+
+#
+# prelude.org contains all the information needed to export the main beamer latex source
+# use prelude-toc.org to get the table of contents
+#
+
+#+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1
+
+#+INCLUDE: "../../common/modules/169.org"
+
+# +LaTeX_CLASS_OPTIONS: [aspectratio=169,handout,xcolor=table]
+
+#+LATEX_HEADER: \usepackage{bbding}
+#+LATEX_HEADER: \DeclareUnicodeCharacter{66D}{\FiveStar}
+
+#
+# If you want to change the title logo it's here
+#
+# +BEAMER_HEADER: \titlegraphic{\includegraphics[width=0.5\textwidth]{SWH-logo}}
+
+# aspect ratio can be changed, but the slides need to be adapted
+# - compute a "resizing factor" for the images (macro for picblocks?)
+#
+# set the background image
+#
+# https://pacoup.com/2011/06/12/list-of-true-169-resolutions/
+#
+#+BEAMER_HEADER: \pgfdeclareimage[height=90mm,width=160mm]{bgd}{swh-world-169.png}
+#+BEAMER_HEADER: \setbeamertemplate{background}{\pgfuseimage{bgd}}
+#+LATEX: \addtocounter{framenumber}{-1}
+
+
+* Introduction: the Software Heritage project
+
+** What is Software Heritage?
+ :PROPERTIES:
+ :CUSTOM_ID: whatis
+ :END:
+
+ The universal source code Archive
+
+** Why an archive? Software is spread all around
+ :PROPERTIES:
+ :CUSTOM_ID: spread
+ :END:
+ #+latex: \begin{flushleft}
+ #+ATTR_LATEX: :width \extblockscale{.5\linewidth}
+ file:myriadsources.png
+ #+latex: \end{flushleft}
+
+*** Fashion victims
+ - disparate development platforms (forges: GitLab, Heptapod, cgit, Gitea, ...)
+ - various places where distribution happens (package repositories: Guix, PyPI, npm, ...)
+ - projects tend to migrate from one place to another over time
+
+*** One place... :B_block:
+ :PROPERTIES:
+ :BEAMER_env: block
+ :END:
+ \hfill ... where can we find, track and search /all/ source code, and rebuild tarballs?
+
+** Why an archive? Software is fragile
+ :PROPERTIES:
+ :CUSTOM_ID: fragile
+ :END:
+ #+latex: \begin{flushleft}
+ #+ATTR_LATEX: :width \extblockscale{.5\linewidth}
+ file:fragilecloud.png
+ #+latex: \end{flushleft}
+
+*** Like all digital information, FOSS is fragile
+# - inconsiderate and/or malicious code loss (e.g., Code Spaces)
+ - link rot: projects are created, moved around, removed
+ - business-driven code loss (e.g., Gitorious, Google Code, Bitbucket, ...)
+ - data rot: physical media with legacy software decay
+
+*** If a website disappears you go to the Internet Archive... :B_block:
+ :PROPERTIES:
+ :BEAMER_env: block
+ :END:
+ \hfill where do you go if (a repository on) GitHub or GitLab goes away?
+
+** Software Heritage in a Nutshell
+
+#+latex: \begin{center}
+#+ATTR_LATEX: :width \extblockscale{.6\linewidth}
+file:SWH-logo+motto.pdf
+#+latex: \end{center}
+
+
+*** Main Objectives
+- *Collect*, *Preserve* and *Share*
+
+*** Reference catalog
+ :PROPERTIES:
+ :BEAMER_env: block
+ :BEAMER_COL: .3
+ :END:
+#+BEGIN_EXPORT latex
+\begin{center}
+\includegraphics[width=.4\linewidth]{myriadsources}
+\end{center}
+#+END_EXPORT
+ *find* and *reference* all software source code
+
+*** Universal archive
+ :PROPERTIES:
+ :BEAMER_env: block
+ :BEAMER_COL: .3
+ :END:
+#+BEGIN_EXPORT latex
+\begin{center}
+\includegraphics[width=.4\linewidth]{fragilecloud}
+\end{center}
+#+END_EXPORT
+ *preserve* all the archived software source code *forever*
+
+*** Research infrastructure :B_block:
+ :PROPERTIES:
+ :BEAMER_COL: .3
+ :BEAMER_env: block
+ :END:
+#+BEGIN_EXPORT latex
+\begin{center}
+\includegraphics[width=.4\linewidth]{atacama-telescope}
+\end{center}
+#+END_EXPORT
+ *enable analysis* of all software source code, make every piece *identifiable* and freely *available*
+
+** Our principles
+ :PROPERTIES:
+ :CUSTOM_ID: principlesstatus
+ :END:
+#+latex: \begin{center}
+#+ATTR_LATEX: :width .6\linewidth
+file:SWH-as-foundation-slim.png
+#+latex: \end{center}
+#+latex: \footnotesize\vspace{-3mm}
+
+ #+latex: \centering
+ #+ATTR_LATEX: :width \extblockscale{.8\linewidth}
+ file:2022-05-06-archive-growth.png
+
+** Under the hood: automation and storage
+ :PROPERTIES:
+ :CUSTOM_ID: automation
+ :END:
+ #+BEAMER: \begin{center}
+ #+BEAMER: \only<1>{\includegraphics[width=\extblockscale{\textwidth}]{swh-dataflow-merkle.pdf}}
+ #+BEAMER: \end{center}
+ /Global development history/ *permanently archived* in a *uniform data model*
+ - over *12 billion* unique source files from almost *180 million* software projects
+ - *~900 TB* (uncompressed) blobs, *~25 B* nodes, *~300 B* edges
+
+* Reference archived code with SWHIDs
+** R(eference): granularity and identifiers \hfill [[http://doi.org/10.15497/RDA00053][10.15497/RDA00053]]
+ #+LATEX: \centering\forcebeamerstart
+ #+LATEX: \only<1>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-0.png}}
+ #+LATEX: \only<2>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-1.png}}
+ #+LATEX: \only<3>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-2.png}}
+ #+LATEX: \only<4>{\includegraphics[width=0.8\linewidth]{Granularity-Level-animated-3.png}}
+ #+LATEX: \forcebeamerend
+ #+LATEX: \only<1>{\begin{block}{}\centering Top concept layers vs. bottom artifact layers\end{block}}
+ #+LATEX: \only<2>{\begin{block}{}\centering Extrinsic identifiers are key for the concept layers\end{block}}
+ #+LATEX: \only<3>{\begin{block}{}\centering Intrinsic identifiers are key for the artifact layers\end{block}}
+ #+LATEX: \only<4>{\begin{block}{}\centering In some cases, extrinsic identifiers can be added too\end{block}}
+
+** Meet the SWHID intrinsic identifiers
+
+ :PROPERTIES:
+ :CUSTOM_ID: oneslide
+ :END:
+ #+LATEX: \centering
+ #+LATEX: \only<1>{\includegraphics[width=\linewidth]{SWHID-v1.4_3.png}}
+ #+LATEX: \forcebeamerend \vspace{-6mm}
+
+** Meet the SWHID intrinsic identifiers
+
+ \centering [[https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://src.koda.cnrs.fr/mmdc/sensorsio][SWHID demo!]]
+
+ \vspace{1em}
+
+ \centering [[https://www.softwareheritage.org/2020/07/09/intrinsic-vs-extrinsic-identifiers/][Reference: Extrinsic vs intrinsic identifiers]]
+
+* Guix
+** How does this relate to Guix?
+- Nothing is eternal: source code (in all its forms) disappears
+- SWH aims to keep a copy of everything
+- Guix ensures source code is archived in SWH when building
+- Once the source code has disappeared upstream, Guix falls back to SWH when rebuilding
+** Reproducibility is of the essence!
+
+*** Observations
+
+- Tarballs will disappear (given enough time)
+- Persistent intrinsic identifiers (SWHIDs) are not (yet?) a package manager standard
+- Guix and other package managers reference tarball hashes
+
+*** Conclusion
+
+- either make the SWHID (which is not specific to SWH) a standard, or rebuild the original tarball bit for bit
+
+#+begin_src scheme
+(define-public ...
+ (package
+ ...
+ (source (origin (method url-fetch)
+ (uri (string-append "https://..." version ".tar.gz"))
+ (sha256 (base32 "03mwi1l3354x52nar..."))))
+ ...
+#+end_src
+
+* Enter... Disarchive
+
+** How it started
+
+*** Discussions
+ - https://issues.guix.gnu.org/42162
+ - https://forge.softwareheritage.org/T2430
+
+*** New software
+ - Disarchive https://git.ngyro.com/disarchive/
+
+** How it works (WIP)
+
+ - Manifest of tarball fields (entry order, PAX headers, ...)
+ - References to individual file hashes
+ - WIP: guessing compression parameters/implementations (using zgz)
+ - -> rebuild the original =.tar=, then the original =.tar.{gz,xz}=
+
+** Example manifest (1/2)
+
+#+begin_src scheme
+(disarchive
+ (version 0)
+ (tarball
+ (name "test-archive.tar")
+ (digest (sha256 "0da9fa3e7b360533678338871d9dd36f3b7ded..."))
+ (default-header
+ (chksum (trailer " "))
+ (magic "ustar ")
+ (version " \x00")
+ (devmajor 0 (source "" (trailer "")))
+ (devminor 0 (source "" (trailer "")))
+ (data-padding ""))
+...
+#+end_src
+
+** Example manifest (2/2)
+
+#+begin_src scheme
+(disarchive
+...
+ (headers
+ ("test-archive/" (mode 493) (chksum 4291) (typeflag 53))
+ ("test-archive/file-a" (size 15) (chksum 4849))
+ ("test-archive/file-b" (size 15) (chksum 4850)))
+ (padding 6656)
+ (input (directory-ref
+ (version 0)
+ (name "test-archive")
+ (addresses
+ (swhid "swh:1:dir:902b1e94f0f5efdde63273c..."))
+ (digest (sha256 "277decb2666f4832ef64ae4655...")))))
+#+end_src
+
+** Planned integration of SWH with Disarchive
+
+*** Currently
+
+ - SWH does not store Disarchive manifests yet
+
+*** Plan
+
+ - Run Disarchive every time SWH loads a tarball
+ - Store the result as =(tarball-hash, directory-hash, manifest)= tuples
+ - When someone requests =tarball-hash=, rebuild the tarball from the manifest
+
+* Current Work in Progress
+
+** NixGuix Coverage in SWH
+
+ - Some sources are missing because of technical limitations (files, directories, ...)
+ - Refactoring in progress to deal with such limitations
+
+** Disarchive
+
+ - code dump at https://git.ngyro.com/swh/
+ - needs to be reviewed and merged
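
Note on the SWHID slides above: an intrinsic identifier is derived from the object itself, so anyone holding the bytes can recompute it without asking a registry. The following is a minimal sketch for the simplest case, a single file content, assuming the content SWHID is the Git-style blob hash (SHA-1 over a `blob <length>\0` header followed by the bytes). The function name `swhid_for_content` is made up for this illustration; directories, revisions and the other object types need more machinery and are not covered here.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Return the content SWHID for `data` (illustrative helper, not an SWH API)."""
    # Same framing Git uses for blobs: "blob <size>\0" followed by the raw bytes.
    header = b"blob %d\x00" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


# The identifier depends only on the bytes: same content, same SWHID.
empty = swhid_for_content(b"")
print(empty)  # swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Any change to the content yields a different identifier.
print(swhid_for_content(b"a") != swhid_for_content(b"b"))  # True
```

Because nothing in the computation refers to a URL or a hosting platform, the identifier stays valid when the project moves or its original home disappears.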
diff --git a/talks-public/2022-09-16-Guix/Makefile b/talks-public/2022-09-16-Guix/Makefile
new file mode 100644
index 0000000..68fbee7
--- /dev/null
+++ b/talks-public/2022-09-16-Guix/Makefile
@@ -0,0 +1 @@
+include ../Makefile.slides
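
Note on the Disarchive slides above: package managers pin the hash of a tarball, but an archive that stores files and directories loses the tarball's own metadata (entry order, header fields, padding). The sketch below illustrates why a manifest is needed and how a store keyed by tarball hash could serve rebuild requests. It is only an illustration under loud assumptions: it uses Python's `tarfile` module rather than Disarchive, the `manifest` dictionary is a drastically simplified stand-in for the real manifest format, and `build_tar`, `store` and `rebuild` are hypothetical names. Compression is not modelled.

```python
import hashlib
import io
import tarfile


def build_tar(files, order, mtime):
    """Serialise `files` (name -> bytes) into an uncompressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in order:
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


# File contents: the part a directory-level archive already preserves.
files = {
    "test-archive/file-a": b"contents of a\n",
    "test-archive/file-b": b"contents of b\n",
}

# The tarball as published upstream (entry order and mtime chosen arbitrarily).
original = build_tar(
    files, ["test-archive/file-b", "test-archive/file-a"], mtime=1663286400
)
tarball_hash = sha256_hex(original)

# Hypothetical store: tarball hash -> metadata needed to rebuild the tarball.
manifest = {
    "order": ["test-archive/file-b", "test-archive/file-a"],
    "mtime": 1663286400,
}
store = {tarball_hash: manifest}


def rebuild(requested_hash):
    """Rebuild the tarball for `requested_hash` and verify it bit for bit."""
    entry = store[requested_hash]
    rebuilt = build_tar(files, entry["order"], entry["mtime"])
    if sha256_hex(rebuilt) != requested_hash:
        raise ValueError("rebuilt tarball does not match the requested hash")
    return rebuilt


# Re-archiving the same files without the manifest gives a different hash.
naive = build_tar(files, sorted(files), mtime=0)
print(sha256_hex(naive) == tarball_hash)    # False
print(rebuild(tarball_hash) == original)    # True
```

The design point is that the file contents and the tarball metadata are stored separately: the contents stay deduplicated in the archive, and the small manifest carries only what is needed to reproduce the exact bytes that the pinned hash refers to.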
Attached To
rMSLD Slides and presentation material