diff --git a/talks-public/2018-10-06-lille-pycon/2018-10-06-lille-pycon.org b/talks-public/2018-10-06-lille-pycon/2018-10-06-lille-pycon.org
index 4e1547b..74ed842 100644
--- a/talks-public/2018-10-06-lille-pycon/2018-10-06-lille-pycon.org
+++ b/talks-public/2018-10-06-lille-pycon/2018-10-06-lille-pycon.org
@@ -1,214 +1,219 @@
 #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
 #+TITLE: Software Heritage
 #+SUBTITLE: The Great Library of (Python) Source Code
 #+BEAMER_HEADER: \date[06/10/2018, PyConFr]{6 Oct 2018\\PyConFr\\Lille, France}
 #+DATE: 6 October 2018
 #+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1
 #+INCLUDE: "../../common/modules/169.org"
 #+BEAMER_HEADER: \institute[Software Heritage]{Software Heritage --- {\tt \{olasd,zack\}@softwareheritage.org}}
 #+BEAMER_HEADER: \author{Nicolas Dandrimont, Stefano Zacchiroli}
 #+LATEX_HEADER_EXTRA: \usepackage{bbding}
 #+LATEX_HEADER_EXTRA: \DeclareUnicodeCharacter{66D}{\FiveStar}
 #+LATEX_HEADER_EXTRA: \usepackage{tikz}
 #+LATEX_HEADER_EXTRA: \usetikzlibrary{arrows,shapes}
 #+LATEX_HEADER_EXTRA: \definecolor{swh-orange}{RGB}{254,205,27}
 #+LATEX_HEADER_EXTRA: \definecolor{swh-red}{RGB}{226,0,38}
 #+LATEX_HEADER_EXTRA: \definecolor{swh-green}{RGB}{77,181,174}
 # Syntax highlighting setup
 #+LATEX_HEADER_EXTRA: \usepackage{minted}
 #+LaTeX_HEADER_EXTRA: \usemintedstyle{tango}
 #+LaTeX_HEADER_EXTRA: \newminted{python}{fontsize=\scriptsize}
 #+LaTeX_HEADER_EXTRA: \newminted{html}{fontsize=\scriptsize}
 #+name: setup-minted
 #+begin_src emacs-lisp :exports results :results silent
 (setq org-latex-listings 'minted)
 (setq org-latex-minted-options
       '(("frame" "lines") ("fontsize" "\\scriptsize") ("linenos" "")))
 (setq org-latex-to-pdf-process
       '("pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"
         "pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"
         "pdflatex -shell-escape -interaction nonstopmode 
-output-directory %o %f"))
 #+end_src
 # End syntax highlighting setup
 * The Software Commons
 ** (Free) Software is everywhere
   #+latex: \begin{center}
   #+ATTR_LATEX: :width .75\linewidth
   file:software-center.pdf
   #+latex: \end{center}
 #+INCLUDE: "../../common/modules/source-code-different-short.org::#softwareisdifferent" :minlevel 2
 ** Our Software Commons
   #+INCLUDE: "../../common/modules/foss-commons.org::#commonsdef" :only-contents t
   #+BEAMER: \pause
 *** Source code is /a precious part/ of our commons
   \hfill are we taking care of it?
 #+INCLUDE: "../../common/modules/swh-motivations-foss.org::#main" :only-contents t :minlevel 2
 * Software Heritage
 #+INCLUDE: "../../common/modules/swh-overview-sourcecode.org::#mission" :minlevel 2
 ** Core principles
   #+latex: \begin{center}
   #+ATTR_LATEX: :width .9\linewidth
   file:SWH-as-foundation-slim.png
   #+latex: \end{center}
   #+BEAMER: \pause
 *** Open approach                                          :B_block:BMCOL:
   :PROPERTIES:
   :BEAMER_col: 0.4
   :BEAMER_env: block
   :END:
   - 100% Free Software
   - transparency
 *** In for the long haul                                   :B_block:BMCOL:
   :PROPERTIES:
   :BEAMER_col: 0.4
   :BEAMER_env: block
   :END:
   - replication
   - non profit
 #+INCLUDE: "../../common/modules/status-extended.org::#archivinggoals" :minlevel 2
 #+INCLUDE: "../../common/modules/status-extended.org::#architecture" :minlevel 2 :only-contents t
 #+INCLUDE: "../../common/modules/status-extended.org::#dagdetail" :minlevel 2 :only-contents t
 #+INCLUDE: "../../common/modules/status-extended.org::#archive" :minlevel 2
 # * Accessing the archive
 # #+INCLUDE: "../../common/modules/status-extended.org::#api" :only-contents t
 #+INCLUDE: "../../common/modules/status-extended.org::#apiintro" :minlevel 2
 #+INCLUDE: "../../common/modules/vault.org::#overview" :minlevel 2
 #+INCLUDE: "../../common/modules/webui.org::#intro" :minlevel 2
 * The Great Library of Python source code
+** Data flow redux
+  #+BEAMER: \begin{center}\includegraphics[width=\extblockscale{1.2\textwidth}]{swh-dataflow.pdf}\end{center}
+** Our focus
+  #+BEAMER: 
\begin{center}\includegraphics[width=\extblockscale{1.2\textwidth}]{swh-dataflow-pypi.pdf}\end{center}
+
 ** Listing all Python modules (1/3)
 *** https://forge.softwareheritage.org/source/swh-lister/
 *** What does a Software Heritage lister do?
   - crawls and parses upstream project-listing APIs
   - generates origins (records that the project has been detected) and loading tasks
 *** Credits go to Avi Kelman for the lister scaffolding, and to Antoine Dumont for the PyPI implementation
 *** A visit of the Cheese Shop
   - A little bit more efficiently than [[https://www.youtube.com/watch?v=B3KBuQHHKx0][John Cleese]]
   - Uses https://pypi.org/simple/ (according to the warehouse docs, the only "package listing" API that's not on the way to deprecation)
 ** Listing all Python modules (2/3)
 *** GET https://pypi.org/simple/
   #+begin_src html
   Simple index
   0
   0-._.-._.-._.-._.-._.-._.-0
   [...]
   Django
   [...]
   #+end_src
 ** Listing all Python modules (3/3)
 ***
   #+begin_src python
   # Origin specification
   origin = {
       'type': 'pypi',
       'url': 'https://pypi.org/packages/Django/',  # Canonical project URL
   }
   # Scheduler task specification
   update_task = {
       'type': 'origin-update-pypi',
       'policy': 'recurring',
       'next_run': datetime.now(tz=timezone.utc),
       'arguments': {
           'args': [
               'Django',  # Project name
               'https://pypi.org/packages/Django/',  # Origin URL
               'https://pypi.org/pypi/Django/json',  # Metadata URL
           ],
           'kwargs': {},
       },
       'priority': None,
   }
   #+end_src
 ** Task scheduling (1/2)
 *** https://forge.softwareheritage.org/source/swh-scheduler/
 *** What does the Software Heritage scheduler do?
   - Records **recurrent** and **one-shot** jobs in a database
   - Schedules runs of these jobs, records their results
   - Manages retries for transient job failures (remote service unavailable, ...)
   - Manages adaptive intervals for recurrent jobs
 ** Task scheduling (2/2)
 *** Builds upon trusted Python tools
   - Celery is used as a task queuing middleware, and for its worker management framework
   - Workers send task results through the Celery events mechanism
 *** And makes it more reliable
   - The database is the single source of truth
   - ~swh.scheduler.celery_backend.runner~ pulls tasks from the database into Celery, limiting the RabbitMQ queue depth (allows task prioritization)
   - ~swh.scheduler.celery_backend.listener~ fetches task results from Celery events and updates the database
 ** Loading Python packages
 * Getting involved
 #+INCLUDE: "../../common/modules/status-extended.org::#features" :minlevel 2
 ** You can help!
   #+BEAMER: \vspace{-2mm}
 *** Coding
   | ٭٭  | Web UI improvements                             |
   | ٭٭٭ | loaders for unsupported VCS/package formats     |
   | ٭٭٭ | listers for unsupported forges/package managers |
   #+BEAMER: \vspace{-2mm} \footnotesize \centering
   \url{https://forge.softwareheritage.org/} \\
   \url{https://docs.softwareheritage.org/devel/}
   #+BEAMER: \pause
 *** Community
   | ٭٭٭ | spread the word, help us with sustainability |
   | ٭٭  | document endangered source code              |
   #+BEAMER: \vspace{-2mm} \footnotesize \centering
   \url{wiki.softwareheritage.org/Suggestion_box}
   #+BEAMER: \pause
 *** Join us
   #+BEAMER: \footnotesize \centering
   - \url{www.softwareheritage.org/jobs} --- *job openings*
   - \url{wiki.softwareheritage.org/Internship} --- *internships*
 ** Conclusion
 *** Software Heritage is
   - a reference archive of *all Free Software* ever written
   - an international, open, nonprofit, *mutualized infrastructure*
   - *now accessible* to developers, users, vendors
   - at the service of our community, *at the service of society*
 *** Come in, we're open!
   \url{www.softwareheritage.org} --- general information \\
   \url{wiki.softwareheritage.org} --- internships, leads \\
   \url{forge.softwareheritage.org} --- our own code
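
The lister step from the "Listing all Python modules" slides can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the swh-lister implementation: the class `SimpleIndexParser`, the helpers `list_projects` and `origin_and_task`, and the sample HTML are hypothetical. Only the dictionary shapes and URL patterns are taken from the slides. The sketch parses a simple-index page offline and derives one origin and one recurring scheduler task per project.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser


class SimpleIndexParser(HTMLParser):
    """Collect the text of every <a> element of a simple-index page."""

    def __init__(self):
        super().__init__()
        self.projects = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Only text inside an anchor is a project name; the page title is skipped.
        if self._in_anchor and data.strip():
            self.projects.append(data.strip())


def list_projects(html):
    """Return the project names found in the given index page."""
    parser = SimpleIndexParser()
    parser.feed(html)
    parser.close()
    return parser.projects


def origin_and_task(name):
    """Build the origin and scheduler task dicts, shaped as on the slides."""
    origin_url = f"https://pypi.org/packages/{name}/"
    origin = {"type": "pypi", "url": origin_url}
    task = {
        "type": "origin-update-pypi",
        "policy": "recurring",
        "next_run": datetime.now(tz=timezone.utc),
        "arguments": {
            "args": [name, origin_url, f"https://pypi.org/pypi/{name}/json"],
            "kwargs": {},
        },
        "priority": None,
    }
    return origin, task


# Hypothetical two-project excerpt standing in for GET https://pypi.org/simple/
SAMPLE_INDEX = """<!DOCTYPE html>
<html><head><title>Simple index</title></head>
<body>
<a href="/simple/0/">0</a>
<a href="/simple/django/">Django</a>
</body></html>"""

for project in list_projects(SAMPLE_INDEX):
    origin, task = origin_and_task(project)
    print(origin["url"], task["type"])
```

Keeping the parsing separate from the construction of origins and tasks mirrors the two bullet points on the lister slide: one step crawls the upstream listing, the other records origins and schedules loading tasks.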