diff --git a/talks-public/2018-10-06-lille-pycon/2018-10-06-lille-pycon.org b/talks-public/2018-10-06-lille-pycon/2018-10-06-lille-pycon.org
index bdaf900..365a053 100644
--- a/talks-public/2018-10-06-lille-pycon/2018-10-06-lille-pycon.org
+++ b/talks-public/2018-10-06-lille-pycon/2018-10-06-lille-pycon.org
@@ -1,363 +1,374 @@
#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
#+TITLE: Software Heritage
#+SUBTITLE: The Great Library of (Python) Source Code
#+BEAMER_HEADER: \date[06/10/2018, PyConFr]{6 Oct 2018\\PyConFr - Lille, France}
#+DATE: 6 October 2018
#+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1
#+INCLUDE: "../../common/modules/169.org"
#+BEAMER_HEADER: \institute[Software Heritage]{Software Heritage --- {\tt \{olasd,zack\}@softwareheritage.org}}
#+BEAMER_HEADER: \author{Nicolas Dandrimont, Stefano Zacchiroli}
#+LATEX_HEADER_EXTRA: \usepackage{bbding}
#+LATEX_HEADER_EXTRA: \DeclareUnicodeCharacter{66D}{\FiveStar}
#+LATEX_HEADER_EXTRA: \usepackage{tikz}
#+LATEX_HEADER_EXTRA: \usetikzlibrary{arrows,shapes}
#+LATEX_HEADER_EXTRA: \definecolor{swh-orange}{RGB}{254,205,27}
#+LATEX_HEADER_EXTRA: \definecolor{swh-red}{RGB}{226,0,38}
#+LATEX_HEADER_EXTRA: \definecolor{swh-green}{RGB}{77,181,174}
# Syntax highlighting setup
#+LATEX_HEADER_EXTRA: \usepackage{minted}
#+LaTeX_HEADER_EXTRA: \usemintedstyle{tango}
#+LaTeX_HEADER_EXTRA: \newminted{python}{fontsize=\scriptsize}
#+LaTeX_HEADER_EXTRA: \newminted{html}{fontsize=\scriptsize}
#+name: setup-minted
#+begin_src emacs-lisp :exports results :results silent
(setq org-latex-listings 'minted)
(setq org-latex-minted-options
'(("fontsize" "\\scriptsize")
("linenos" "")))
(setq org-latex-to-pdf-process
'("pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"
"pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"
"pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"))
#+end_src
# End syntax highlighting setup
* The Software Commons
** (Free) Software is everywhere
#+latex: \begin{center}
#+ATTR_LATEX: :width .75\linewidth
file:software-center.pdf
#+latex: \end{center}
#+INCLUDE: "../../common/modules/source-code-different-short.org::#softwareisdifferent" :minlevel 2
** Our Software Commons
#+INCLUDE: "../../common/modules/foss-commons.org::#commonsdef" :only-contents t
#+BEAMER: \pause
*** Source code is /a precious part/ of our commons
\hfill are we taking care of it?
#+INCLUDE: "../../common/modules/swh-motivations-foss.org::#main" :only-contents t :minlevel 2
* Software Heritage
#+INCLUDE: "../../common/modules/swh-overview-sourcecode.org::#mission" :minlevel 2
** Core principles
#+latex: \begin{center}
#+ATTR_LATEX: :width .9\linewidth
file:SWH-as-foundation-slim.png
#+latex: \end{center}
#+BEAMER: \pause
*** Open approach :B_block:BMCOL:
:PROPERTIES:
:BEAMER_col: 0.4
:BEAMER_env: block
:END:
- 100% Free Software
- transparency
*** In for the long haul :B_block:BMCOL:
:PROPERTIES:
:BEAMER_col: 0.4
:BEAMER_env: block
:END:
- replication
- non profit
#+INCLUDE: "../../common/modules/status-extended.org::#archivinggoals" :minlevel 2
#+INCLUDE: "../../common/modules/status-extended.org::#architecture" :minlevel 2 :only-contents t
#+INCLUDE: "../../common/modules/status-extended.org::#dagdetail" :minlevel 2 :only-contents t
#+INCLUDE: "../../common/modules/status-extended.org::#archive" :minlevel 2
# * Accessing the archive
# #+INCLUDE: "../../common/modules/status-extended.org::#api" :only-contents t
#+INCLUDE: "../../common/modules/status-extended.org::#apiintro" :minlevel 2
#+INCLUDE: "../../common/modules/vault.org::#overview" :minlevel 2
#+INCLUDE: "../../common/modules/webui.org::#intro" :minlevel 2
* The Great Library of Python source code
** Data flow redux
#+BEAMER: \begin{center}\includegraphics[width=\extblockscale{1.2\textwidth}]{swh-dataflow.pdf}\end{center}
** Our focus
#+BEAMER: \begin{center}\includegraphics[width=\extblockscale{1.2\textwidth}]{swh-dataflow-pypi.pdf}\end{center}
** Listing all Python modules (1/3)
***
#+BEAMER: \footnotesize \centering
https://forge.softwareheritage.org/source/swh-lister/
*** What does a Software Heritage lister do?
- crawls and parses the upstream project listing APIs
- generates origins (records that the project has been detected) and loading tasks
+
+#+beamer: \pause
***
Credits go to Avi Kelman for the lister scaffolding, and to Antoine Dumont for the PyPI implementation
+#+beamer: \pause
+
+
*** A visit of the Cheese Shop
- A little bit more efficiently than [[https://www.youtube.com/watch?v=B3KBuQHHKx0][John Cleese]]
- Uses https://pypi.org/simple/ (according to the warehouse docs, the only "package listing" API that's not on the way to deprecation)
** Listing all Python modules (2/3)
*** GET https://pypi.org/simple/
#+begin_src html
<title>Simple index</title>
<a href="/simple/0/">0</a>
<a href="/simple/0-0/">0-._.-._.-._.-._.-._.-._.-0</a>
[...]
<a href="/simple/django/">Django</a>
[...]
#+end_src
** Listing all Python modules (3/3)
***
#+begin_src python
# Origin specification
origin = {
'type': 'pypi',
'url': 'https://pypi.org/packages/Django/', # Canonical project URL
}
#+end_src
#+beamer: \pause
#+begin_src python
# Scheduler task specification
from datetime import datetime, timezone

update_task = {
'type': 'origin-update-pypi',
'policy': 'recurring',
'next_run': datetime.now(tz=timezone.utc),
'arguments': {
'args': [
'Django', # Project name
'https://pypi.org/packages/Django/', # Origin URL
'https://pypi.org/pypi/Django/json', # Metadata URL
],
'kwargs': {},
},
'priority': None,
}
#+end_src
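A minimal sketch of how a lister could turn the simple index into the two records above. It assumes the page follows PEP 503 (one anchor element per project, with the project name as its text). The names ~SimpleIndexParser~, ~list_projects~ and ~origin_and_task~ are hypothetical and are not taken from ~swh-lister~.

#+begin_src python
from datetime import datetime, timezone
from html.parser import HTMLParser


class SimpleIndexParser(HTMLParser):
    """Collect the text of every <a> element of a simple index page."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False
        self.projects = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.in_anchor = True

    def handle_endtag(self, tag):
        if tag == 'a':
            self.in_anchor = False

    def handle_data(self, data):
        # Only text found inside an anchor is a project name
        if self.in_anchor and data.strip():
            self.projects.append(data.strip())


def list_projects(html):
    """Return the project names found in the HTML of a simple index."""
    parser = SimpleIndexParser()
    parser.feed(html)
    return parser.projects


def origin_and_task(project):
    """Build the origin and scheduler task records for one project."""
    url = 'https://pypi.org/packages/%s/' % project
    origin = {'type': 'pypi', 'url': url}
    task = {
        'type': 'origin-update-pypi',
        'policy': 'recurring',
        'next_run': datetime.now(tz=timezone.utc),
        'arguments': {
            'args': [project, url, 'https://pypi.org/pypi/%s/json' % project],
            'kwargs': {},
        },
        'priority': None,
    }
    return origin, task
#+end_src

Calling ~origin_and_task(name)~ for each name returned by ~list_projects~ yields one origin and one recurring task per project.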
** Task scheduling (1/2)
***
#+BEAMER: \footnotesize \centering
https://forge.softwareheritage.org/source/swh-scheduler/
*** What does the Software Heritage scheduler do?
- Records *recurrent* and *one-shot* jobs in a database
- Schedules runs of these jobs, records their results
- Manages retries for transient job failures (remote service unavailable, ...)
- Manages adaptive intervals for recurrent jobs
** Task scheduling (2/2)
*** Builds upon trusted Python tools
- Celery is used as a task queuing middleware, and for its worker management framework
- Workers send task results through the Celery events mechanism
+#+beamer: \pause
+
*** And makes them more useful to us
- The database is the single source of truth
- ~swh.scheduler.celery_backend.runner~ pulls tasks from the database into
Celery, limiting the RabbitMQ queue depth (allows task prioritization)
- ~swh.scheduler.celery_backend.listener~ fetches task results from Celery
events and updates the database
- Archival of elapsed tasks/runs/logs in Elasticsearch to keep the database
snappy
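The runner idea (top up a bounded queue from the database, highest priority first) can be illustrated with a toy model. This is only a sketch: ~sqlite3~ and a plain list stand in for the real database and the Celery queue, and the table layout, the integer priorities, ~MAX_QUEUE_DEPTH~, ~make_db~ and ~run_once~ are all assumptions made for the example, not the actual ~swh.scheduler~ code.

#+begin_src python
import sqlite3

MAX_QUEUE_DEPTH = 3  # hypothetical limit on tasks waiting in the queue


def make_db(tasks):
    """Create an in-memory task table from (name, priority) pairs."""
    db = sqlite3.connect(':memory:')
    db.execute(
        "CREATE TABLE task (id INTEGER PRIMARY KEY, name TEXT, "
        "priority INTEGER, status TEXT DEFAULT 'pending')")
    db.executemany("INSERT INTO task (name, priority) VALUES (?, ?)", tasks)
    return db


def run_once(db, queue):
    """Move pending tasks into the queue without exceeding its depth.

    Returns the number of tasks that were scheduled.
    """
    free_slots = MAX_QUEUE_DEPTH - len(queue)
    if free_slots <= 0:
        return 0
    # Highest priority first, oldest task first among equal priorities
    rows = db.execute(
        "SELECT id, name FROM task WHERE status = 'pending' "
        "ORDER BY priority DESC, id LIMIT ?", (free_slots,)).fetchall()
    for task_id, name in rows:
        queue.append(name)
        db.execute("UPDATE task SET status = 'scheduled' WHERE id = ?",
                   (task_id,))
    return len(rows)
#+end_src

Because the queue stays shallow, a high-priority task inserted in the database later only waits behind a handful of already queued tasks.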
** Loading Python packages (1/4)
*** What's a Python package anyway?
- Source distributions (~sdists~, currently tarballs or zips)
- Binary distributions (~bdists~, which are mostly wheels these days)
As we're interested in source code, Software Heritage looks at ~sdists~ exclusively
+#+beamer: \pause
+
- The current sdist format is unspecified: you probably get a tarball, which
maybe contains a ~setup.py~ somewhere
- When building an sdist, distutils generates a machine-readable ~PKG-INFO~ file
and puts it in the tarball
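~PKG-INFO~ uses RFC 822-style headers, so the standard library ~email~ parser can read it. The following is a minimal sketch; ~parse_pkg_info~ and the selection of fields are illustrative choices, not the loader's actual code.

#+begin_src python
from email.parser import Parser


def parse_pkg_info(text):
    """Parse the text of a PKG-INFO file into a plain dict."""
    msg = Parser().parsestr(text)
    return {
        'name': msg['Name'],
        'version': msg['Version'],
        'summary': msg['Summary'],
        'author': msg['Author'],
        'license': msg['License'],  # None when the header is absent
        # Classifier may appear several times, so collect all values
        'classifiers': msg.get_all('Classifier', []),
    }
#+end_src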
+#+beamer: \pause
+
*** The long wait for PEP 517 ("A build-system independent format for source trees")
- One uniform transport format: a gzipped tarball with one toplevel directory
- Machine parsable data about the project by default (~pyproject.toml~)
Hopefully soon in your nearest Cheese Shop (go help the folks in PyPA!)
** Loading Python packages (2/4)
***
#+BEAMER: \footnotesize \centering
https://forge.softwareheritage.org/source/swh-loader-pypi/
*** Common loading process
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_env: block
:BEAMER_act: +-
:END:
Implemented in ~swh.loader.core~
- Fetch metadata about current versions
- Compare to latest loaded versions
- Download and process versions we had never seen
- Load new data
*** PyPI specifics
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_env: block
:BEAMER_act: +-
:END:
Implemented in ~swh.loader.pypi~
- Comparison done using the ~sdist~ digests
- PKG-INFO metadata parsed and saved
- Versions with multiple sdists are imported separately
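The digest comparison could look like the sketch below. It assumes the layout of the PyPI JSON metadata (a ~releases~ mapping from version to a list of files, each with ~packagetype~ and ~digests~ keys). The function ~new_sdists~ and the ~known_sha256~ set are hypothetical names for this example.

#+begin_src python
def new_sdists(metadata, known_sha256):
    """Yield (version, file) for every sdist that was not loaded before.

    metadata: decoded JSON metadata of one project
    known_sha256: set of sha256 hex digests of already loaded sdists
    """
    # Versions are visited in plain string order, which is enough here
    for version, files in sorted(metadata['releases'].items()):
        for file_info in files:
            if file_info['packagetype'] != 'sdist':
                continue  # wheels and other binary distributions
            if file_info['digests']['sha256'] in known_sha256:
                continue  # already in the archive
            yield version, file_info
#+end_src

A version that ships both a tarball and a zip yields two entries, matching the separate import of multiple sdists.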
** Loading Python packages (3/4)
*** PyPI snapshots
#+begin_src python
pifpaf_snapshot = {
'id': b'\xc6_\xfe#\x94\xba\x81\xc3\x94\x9b\xeb[\x06\xf5JC\x0f\x19n\xa6',
'branches': {
b'releases/0.0.1': {...},
b'releases/0.0.2': {...},
...
b'releases/2.1.2': {
'target': b'\x8a\xcd\xf3l\xee\xe50\xe2\x81]\x08:5\xd9_\xd6\xeff\xc9\xa3',
'target_type': 'revision',
},
b'releases/2.1.2.dev7': {
'target': b'hGh\x15h|\xf3\xd2v\xf8\xec-\xa7\xfeuB\xda3\x83x',
'target_type': 'revision',
},
b'HEAD': {
'target': b'releases/2.1.2',
'target_type': 'alias',
},
},
}
#+end_src
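Reading such a snapshot means following aliases such as ~HEAD~ until a non-alias branch is reached. A minimal sketch, where ~resolve_branch~ and its ~max_depth~ guard are hypothetical helpers for illustration:

#+begin_src python
def resolve_branch(snapshot, name, max_depth=10):
    """Follow alias branches until a non-alias branch is found."""
    branches = snapshot['branches']
    for _ in range(max_depth):
        branch = branches[name]
        if branch['target_type'] != 'alias':
            return branch
        name = branch['target']  # an alias targets another branch name
    raise ValueError('too many chained aliases')
#+end_src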
** Loading Python packages (4/4)
*** PyPI revisions
#+begin_src python -i
pifpaf_revision = {
'id': b'\x8a\xcd\xf3l\xee\xe50\xe2\x81]\x08:5\xd9_\xd6\xeff\xc9\xa3',
'author': {
'name': b'Julien Danjou',
...
},
'date': {
'timestamp': {'seconds': 1538577319, 'microseconds': 0},
},
...
'type': 'tar',
'directory': b'\xa4\xf2\xad\xb1\xef\r\xcf\x894::@=\xf9R\x86=\x19"\\',
'message': b'2.1.2',
#+end_src
#+beamer: \pause
#+begin_src python -i
'metadata': {
'project': { # Metadata parsed from PKG-INFO
'name': 'pifpaf',
'author': 'Julien Danjou',
'license': None,
'summary': 'Suite of tools and fixtures to manage daemons for testing',
'version': '2.1.2',
...
#+end_src
**
***
#+begin_src python -i
'classifiers': [
'Intended Audience :: Information Technology',
...
],
...
},
#+end_src
#+beamer: \pause
#+begin_src python -i
'original_artifact': { # The original tarball we downloaded
'url': 'https://files.pythonhosted.org/packages/cc/ce/2599[...]',
'date': '2018-10-03T14:35:19',
'sha1': '00c4efc47580b5c4ad1dcdb5118159f9b057b0fd',
'size': 192940,
'sha256': 'a6eef2ae56ac90d02df5f45885973e108c960a2ea113cc76[...]',
'filename': 'pifpaf-2.1.2.tar.gz',
'sha1_git': '8ce7e3ddda336dd9edff26ae8efaf4b81439c42c',
'blake2s256': 'c4f7fcd4324715f4bfb54f8eefb10fde803efb7a02e2[...]',
'archive_type': 'tar',
},
},
'synthetic': True,
'parents': [],
}
#+end_src
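The checksums under ~original_artifact~ can be recomputed with ~hashlib~. This sketch assumes that ~sha1_git~ is the SHA1 Git computes for a blob (the content prefixed with a ~blob <size>~ header) and that ~blake2s256~ is BLAKE2s with a 32-byte digest. ~artifact_hashes~ is a hypothetical helper, not the loader's function.

#+begin_src python
import hashlib


def artifact_hashes(data):
    """Compute the size and checksums of a downloaded artifact (bytes)."""
    git_header = b'blob %d\0' % len(data)  # header Git prepends to blobs
    return {
        'size': len(data),
        'sha1': hashlib.sha1(data).hexdigest(),
        'sha1_git': hashlib.sha1(git_header + data).hexdigest(),
        'sha256': hashlib.sha256(data).hexdigest(),
        'blake2s256': hashlib.blake2s(data, digest_size=32).hexdigest(),
    }
#+end_src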
* Getting involved
#+INCLUDE: "../../common/modules/status-extended.org::#features" :minlevel 2
** You can help!
#+BEAMER: \vspace{-2mm}
*** Coding
| ٭٭ | Web UI improvements |
| ٭٭٭ | loaders for unsupported VCS/package formats |
| ٭٭٭ | listers for unsupported forges/package managers |
#+BEAMER: \vspace{-2mm} \footnotesize \centering
\url{https://forge.softwareheritage.org/} \\
\url{https://docs.softwareheritage.org/devel/}
#+BEAMER: \pause
*** Community
| ٭٭٭ | spread the word, help us with sustainability |
| ٭٭ | document endangered source code |
#+BEAMER: \vspace{-2mm} \footnotesize \centering
\url{wiki.softwareheritage.org/Suggestion_box}
#+BEAMER: \pause
*** Join us
#+BEAMER: \footnotesize \centering
- \url{www.softwareheritage.org/jobs} --- *job openings*
- \url{wiki.softwareheritage.org/Internship} --- *internships*
** Conclusion
*** Software Heritage is
- a reference archive of *all Free Software* ever written
- an international, open, nonprofit, *mutualized infrastructure*
- *now accessible* to developers, users, vendors
- at the service of our community, *at the service of society*
*** Come in, we're open!
\url{www.softwareheritage.org} --- general information \\
\url{wiki.softwareheritage.org} --- internships, leads \\
\url{forge.softwareheritage.org} --- our own code