diff --git a/common/images/pypi-blog-post.png b/common/images/pypi-blog-post.png new file mode 100644 index 0000000..613bdd2 Binary files /dev/null and b/common/images/pypi-blog-post.png differ diff --git a/common/modules/pypi-loader.org b/common/modules/pypi-loader.org index 94ac3e0..91c3078 100644 --- a/common/modules/pypi-loader.org +++ b/common/modules/pypi-loader.org @@ -1,261 +1,259 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) # Tech details about archiving Python packages from PyPI #+INCLUDE: "prelude.org" :minlevel 1 * PyPI loader :PROPERTIES: :CUSTOM_ID: main :END: -** Data flow redux - - #+BEAMER: \begin{center}\includegraphics[width=\extblockscale{1.2\textwidth}]{swh-dataflow.pdf}\end{center} +# ** Data flow redux +# #+BEAMER: \begin{center}\includegraphics[width=\extblockscale{1.2\textwidth}]{swh-dataflow.pdf}\end{center} ** Our focus #+BEAMER: \begin{center}\includegraphics[width=\extblockscale{1.2\textwidth}]{swh-dataflow-pypi.pdf}\end{center} - ** Listing all Python modules (1/3) *** #+BEAMER: \footnotesize \centering https://forge.softwareheritage.org/source/swh-lister/ *** What does a Software Heritage lister do? - crawls and parses upstream project listing APIs - generates origins (records that the project has been detected) and loading tasks #+beamer: \pause *** Credits go to Avi Kelman for the lister scaffolding, and to Antoine Dumont for the PyPI implementation #+beamer: \pause *** A visit to the Cheese Shop - A little bit more efficiently than [[https://www.youtube.com/watch?v=B3KBuQHHKx0][John Cleese]] - Uses https://pypi.org/simple/ (according to the Warehouse docs, the only "package listing" API that's not on the way to deprecation) ** Listing all Python modules (2/3) *** GET https://pypi.org/simple/ #+begin_src html Simple index 0 0-._.-._.-._.-._.-._.-._.-0 [...] Django [...] 
#+end_src ** Listing all Python modules (3/3) *** #+begin_src python # Origin specification origin = { 'type': 'pypi', 'url': 'https://pypi.org/packages/Django/', # Canonical project URL } #+end_src #+beamer: \pause #+begin_src python from datetime import datetime, timezone # Scheduler task specification update_task = { 'type': 'origin-update-pypi', 'policy': 'recurring', 'next_run': datetime.now(tz=timezone.utc), 'arguments': { 'args': [ 'Django', # Project name 'https://pypi.org/packages/Django/', # Origin URL 'https://pypi.org/pypi/Django/json', # Metadata URL ], 'kwargs': {}, }, 'priority': None, } #+end_src ** Task scheduling (1/2) *** #+BEAMER: \footnotesize \centering https://forge.softwareheritage.org/source/swh-scheduler/ *** What does the Software Heritage scheduler do? - Records **recurrent** and **one-shot** jobs in a database - Schedules runs of these jobs, records their results - Manages retries for transient job failures (remote service unavailable, ...) - Manages adaptive intervals for recurrent jobs ** Task scheduling (2/2) *** Builds upon trusted Python tools - Celery is used as a task queuing middleware, and for its worker management framework - Workers send task results through the Celery events mechanism #+beamer: \pause *** And makes them more useful to us - The database is the single source of truth - ~swh.scheduler.celery_backend.runner~ pulls tasks from the database into Celery, limiting the RabbitMQ queue depth (allows task prioritization) - ~swh.scheduler.celery_backend.listener~ fetches task results from Celery events and updates the database - Archival of elapsed tasks/runs/logs in Elasticsearch to keep the database snappy ** Loading Python packages (1/4) *** What's a Python package anyway? 
- Source distributions (~sdists~, currently tarballs or zips) - Binary distributions (~bdists~, which are mostly wheels these days) As we're interested in source code, Software Heritage looks at ~sdists~ exclusively #+beamer: \pause - The current sdist format is unspecified: you probably get a tarball, which maybe contains a ~setup.py~ somewhere - When building an sdist, distutils generates a machine-readable ~PKG-INFO~ file and puts it in the tarball #+beamer: \pause *** The long wait for PEP 517 ("A build-system independent format for source trees") - One uniform transport format: a gzipped tarball with one toplevel directory - Machine parsable data about the project by default (~pyproject.toml~) Hopefully soon in your nearest Cheese Shop (go help the folks in PyPA!) ** Loading Python packages (2/4) *** #+BEAMER: \footnotesize \centering https://forge.softwareheritage.org/source/swh-loader-pypi/ *** Common loading process :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :BEAMER_act: +- :END: Implemented in ~swh.loader.core~ - Fetch metadata about current versions - Compare to latest loaded versions - Download and process versions we have never seen - Load new data *** PyPI specifics :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_env: block :BEAMER_act: +- :END: Implemented in ~swh.loader.pypi~ - Comparison done using the ~sdist~ digests - PKG-INFO metadata parsed and saved - versions with multiple sdists imported separately ** Loading Python packages (3/4) *** PyPI snapshots #+begin_src python pifpaf_snapshot = { 'id': b'\xc6_\xfe#\x94\xba\x81\xc3\x94\x9b\xeb[\x06\xf5JC\x0f\x19n\xa6', 'branches': { b'releases/0.0.1': { b'releases/0.0.2': { ... 
b'releases/2.1.2': { 'target': b'\x8a\xcd\xf3l\xee\xe50\xe2\x81]\x08:5\xd9_\xd6\xeff\xc9\xa3', 'target_type': 'revision', }, b'releases/2.1.2.dev7': { 'target': b'hGh\x15h|\xf3\xd2v\xf8\xec-\xa7\xfeuB\xda3\x83x', 'target_type': 'revision', }, b'HEAD': { 'target': b'releases/2.1.2', 'target_type': 'alias', }, }, } #+end_src ** Loading Python packages (4/4) *** PyPI revisions #+begin_src python -i pifpaf_revision = { 'id': b'\x8a\xcd\xf3l\xee\xe50\xe2\x81]\x08:5\xd9_\xd6\xeff\xc9\xa3', 'author': { 'name': b'Julien Danjou', ... }, 'date': { 'timestamp': {'seconds': 1538577319, 'microseconds': 0}, }, ... 'type': 'tar', 'directory': b'\xa4\xf2\xad\xb1\xef\r\xcf\x894::@=\xf9R\x86=\x19"\\', 'message': b'2.1.2', #+end_src #+beamer: \pause #+begin_src python -i 'metadata': { 'project': { # Metadata parsed from PKG-INFO 'name': 'pifpaf', 'author': 'Julien Danjou', 'license': None, 'summary': 'Suite of tools and fixtures to manage daemons for testing', 'version': '2.1.2', ... #+end_src ** *** #+begin_src python -i 'classifiers': [ 'Intended Audience :: Information Technology', ... ], ... 
}, #+end_src #+beamer: \pause #+begin_src python -i 'original_artifact': { # The original tarball we downloaded 'url': 'https://files.pythonhosted.org/packages/cc/ce/2599[...]', 'date': '2018-10-03T14:35:19', 'sha1': '00c4efc47580b5c4ad1dcdb5118159f9b057b0fd', 'size': 192940, 'sha256': 'a6eef2ae56ac90d02df5f45885973e108c960a2ea113cc76[...]', 'filename': 'pifpaf-2.1.2.tar.gz', 'sha1_git': '8ce7e3ddda336dd9edff26ae8efaf4b81439c42c', 'blake2s256': 'c4f7fcd4324715f4bfb54f8eefb10fde803efb7a02e2[...]', 'archive_type': 'tar', }, }, 'synthetic': True, 'parents': [], } #+end_src diff --git a/common/modules/status-extended.org b/common/modules/status-extended.org index c33e497..ff50bf4 100644 --- a/common/modules/status-extended.org +++ b/common/modules/status-extended.org @@ -1,433 +1,433 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) #+INCLUDE: "prelude.org" :minlevel 1 # not to be included as a whole, just pick individual slides as you see fit * Status :PROPERTIES: :CUSTOM_ID: main :END: ** The people :PROPERTIES: :CUSTOM_ID: people :END: *** The core team :B_picblock: :PROPERTIES: :CUSTOM_ID: core-team-formal :BEAMER_env: picblock :BEAMER_opt: pic=team,width=.4\linewidth :END: - Roberto Di Cosmo - Stefano Zacchiroli - Nicolas Dandrimont (Engineer) - Antoine Dumont (Engineer) # - and /Jordi, Quentin and Guillaume/ *** Scientific advisors - Serge Abiteboul (French Science Academy) - Jean-François Abramatic (former W3C director) - Gerard Berry (CNRS Gold Medal, French Science Academy) - Julia Lawall (Coccinelle, Linux Kernel, Outreachy) ** Archive coverage :PROPERTIES: :CUSTOM_ID: archive :END: #+BEAMER: \vspace{-2mm} #+BEAMER: \begin{center}\includegraphics[width=\extblockscale{1.1\linewidth}]{2018-10-archive-growth.png}\end{center} #+BEAMER: \vspace{-2mm} *** Current sources - - live: GitHub, Debian, GitLab.com, **PyPI** + - live: GitHub, Debian, GitLab.com, PyPI - one-off: 
Gitorious, Google Code, GNU - WIP: Bitbucket #+BEAMER: \pause *** 175 TB (compressed) blobs, 6 TB database (as a graph: 10 B nodes + 100 B edges) #+BEAMER: \pause *** \hfill The /richest/ public source code archive, ... and growing daily! ** The structure of the archive :noexport: *** On-disk storage - flat file storage for contents - postgres database for the metadata *** Data model: /one/ big Merkle DAG, inspired by the git model - Origins (= repositories) - Occurrences (= branches) - Releases (= tags) - Revisions (= commits) - Directories (= trees) - Contents (= blobs) ** Archiving goals :PROPERTIES: :CUSTOM_ID: archivinggoals :END: Targets: VCS repositories & source code releases (e.g., tarballs) *** We DO archive - file *content* (= blobs) - *revisions* (= commits), with full metadata - *releases* (= tags), ditto - where (*origin*) & when (*visit*) we found any of the above # - time-indexed repo *snapshots* (i.e., we never delete anything) … in a VCS-/archive-agnostic *canonical data model* *** We DON'T archive # - diffs → derived data from related contents - homepages, wikis - BTS/issues/code reviews/etc. - mailing lists Long term vision: play our part in a /"semantic wikipedia of software"/ ** Architecture :PROPERTIES: :CUSTOM_ID: architecture :END: *** Data flow :PROPERTIES: :CUSTOM_ID: dataflow :END: # #+BEAMER: \begin{center}\includegraphics[width=\extblockscale{1.2\textwidth}]{swh-dataflow.pdf}\end{center} ** Data model :noexport: *** General schema - VCS-independent - fully deduplicated + files, directories and commits are /shared/ - biggest git-like /graph/ in the world *** \begin{center} \url{http://deb.li/swhdm} \end{center} *** full hash index (sha1, sha256, ...) Some funny facts: - the GPL2 licence appears under more than 500 names + including /aa.css.txt/ and /FullSync.txt/ ~ :-) ** Merkle DAG *** Merkle structure :PROPERTIES: :CUSTOM_ID: merkle :END: **** Merkle trees :PROPERTIES: :CUSTOM_ID: merkletree :END: # R. C. 
Merkle, A digital signature based on a conventional encryption # function, Crypto '87 #+BEAMER: \vspace{-3mm} ***** Merkle tree (R. C. Merkle, Crypto 1979) :B_picblock: :PROPERTIES: :BEAMER_opt: pic=merkle, leftpic=true, width=.7\linewidth :BEAMER_env: picblock :BEAMER_act: :END: Combination of - tree - hash function #+BEAMER: \pause #+BEAMER: \footnotesize ***** Classical cryptographic construction - fast, parallel signature of large data structures - widely used (e.g., Git, blockchains, IPFS, ...) - built-in deduplication **** The archive in a few pictures :PROPERTIES: :CUSTOM_ID: merkledemo :END: ***** A giant (extended) Merkle DAG #+LATEX: \only<1>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/merkle_1.pdf}}} #+LATEX: \only<2>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/contents.pdf}}} #+LATEX: \only<3>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/merkle_2_contents.pdf}}} #+LATEX: \only<4>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/directories.pdf}}} #+LATEX: \only<5>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/merkle_3_directories.pdf}}} #+LATEX: \only<6>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/revisions.pdf}}} #+LATEX: \only<7>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/merkle_4_revisions.pdf}}} #+LATEX: \only<8>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/releases.pdf}}} #+LATEX: \only<9>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/merkle_5_releases.pdf}}} # #+LATEX: {\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/merkle_1.pdf}}} *** A revision node :PROPERTIES: :CUSTOM_ID: merklerevision :END: **** Example: a Software Heritage revision ***** #+BEAMER: 
\vspace{-.5cm}\centering\includegraphics[width=0.9\textwidth]{git-merkle/revisions} ***** Note: most object kinds currently have Git-compatible identifiers *** Giant DAG :PROPERTIES: :CUSTOM_ID: giantdag :END: **** The archive: a (giant) Merkle DAG # Using an empty frame because the image is difficult to read on swh bg. # Finding a way to override image bg for just this frame would be better. ***** #+BEAMER: \centering \includegraphics[width=\extblockscale{\textwidth}]{git-merkle/merkle_5_releases} *** Giant DAG (single slide) :PROPERTIES: :CUSTOM_ID: giantdag1slide :END: **** The Software Heritage archive: a gigantic Merkle DAG #+LATEX: \centering\forcebeamerstart{} #+LATEX: \only<1>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/merkle_1}}} #+LATEX: \only<2>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/contents}}} #+LATEX: \only<3>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/merkle_2_contents}}} #+LATEX: \only<4>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/directories}}} #+LATEX: \only<5>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/merkle_3_directories}}} #+LATEX: \only<6>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/revisions}}} #+LATEX: \only<7>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/merkle_4_revisions}}} #+LATEX: \only<8>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/releases}}} #+LATEX: \only<9>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/merkle_5_releases}}} #+LATEX: \forcebeamerend{} *** Giant DAG (detailed) :PROPERTIES: :CUSTOM_ID: dagdetail :END: **** The archive: a (giant) Merkle DAG #+BEAMER: \vspace{-3mm} #+BEAMER: \centering \includegraphics[width=\textwidth]{swh-merkle-dag-wide} ** Technology :noexport: :PROPERTIES: :CUSTOM_ID: technology :END: *** Software stack **** 3rd party - Debian, Puppet, Ceph - PostgreSQL for metadata storage, with barman & pglogical 
- Celery (RabbitMQ backend) for task scheduling - Python3 and psycopg2 for the backend - Django, Bootstrap, D3.js for Web stuff **** in house - /ad hoc/ object storage (to avoid imposing tech on mirrors) - data model implementation, listers, loaders, scheduler - ~60 Git repositories (~20 Python packages, ~30 Puppet modules) - ~30 kSLOC Python / ~12 kSLOC SQL / ~4 kSLOC Puppet - licence choice: GPLv3 (backend) / AGPLv3 (frontend) *** Hardware stack **** in house - 2x hypervisors with ~20 VMs - 2x high-density storage arrays (60 * 6TB => 300TB usable each) - Prototype: Ceph storage cluster for blobs **** on Azure - full object storage mirror - full mirror of the database containing the graph - workers for content indexing - workers for download bundle preparation **** at the University of Bologna - backend storage (60TB) for the bundles available for download *** Software architecture :noexport: **** Module dependencies (internal + external) :B_picblock: :PROPERTIES: :BEAMER_env: picblock :BEAMER_opt: pic=swh-modules-deps-all,width=\linewidth :END: **** let's zoom in: http://deb.li/swhdeps ** Technology :noexport: :PROPERTIES: :CUSTOM_ID: technology-short :END: *** Deployment and resource usage **** Software - around 30k SLOC of custom Python code, running on Debian Stable - PostgreSQL database for the metadata storage **** Hardware - 3 hypervisors with mass storage and a backup server at Inria - Work in progress: in-house Ceph deployment for object storage - In-kind sponsorship of cloud and storage resources (Microsoft, University of Bologna) ** Software development :noexport: :PROPERTIES: :CUSTOM_ID: development :END: *** Software development **** classic FOSS development - language: English - development mailing list #+BEAMER: \\{\small \url{https://sympa.inria.fr/sympa/info/swh-devel}} - IRC #+BEAMER: \\ #swh-devel / FreeNode - Forge #+BEAMER: \\{\small \url{https://forge.softwareheritage.org}} - Git, tasks, code review, etc. 
**** for more information #+BEAMER: \scriptsize https://www.softwareheritage.org/community/developers/ ** Roadmap :PROPERTIES: :CUSTOM_ID: features :END: *** Features... - (done) *lookup* by content hash - (done) *browsing*: "wayback machine" for source code (API + UI) - (early access) *deposit* of source code bundles directly to the archive - (early access) *save code now*, on-demand archive - (done) *download*: =wget= / =git clone= from the archive - (todo) *provenance* lookup for all archived content - (todo) *full-text search* on all archived source code files #+BEAMER: \pause *** ... and much more than one could possibly imagine all the world's software development history at hand's reach! ** Web API :noexport: :PROPERTIES: :CUSTOM_ID: api :END: *** Web API :PROPERTIES: :CUSTOM_ID: apiintro :END: **** RESTful API to programmatically access the Software Heritage archive \\ *\url{https://archive.softwareheritage.org/api/}* **** Features - pointwise *browsing* of the archive - … snapshots → revisions → directories → contents … - full access to the *metadata* of archived objects - *crawling* information - /when have you last visited this Git repository I care about?/ - /where were its branches/tags pointing to at the time?/ # - derived information about archived contents (WIP) # - MIME type, programming language, license, etc. 
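As a rough sketch of the pointwise browsing listed above, the Python helpers below build an origin lookup URL and turn the relative links of a visit into absolute ones. The helper names (~origin_lookup_url~, ~branch_targets~) are hypothetical, and the response layout is assumed to match the sample JSON shown in the API tour slides; no network request is made.

```python
# Illustrative sketch only: the helper names are hypothetical and the
# response layout is assumed to match the sample JSON in the API tour.
API_HOST = "https://archive.softwareheritage.org"


def origin_lookup_url(origin_type, origin_url):
    """Build the origin lookup URL: /api/1/origin/<type>/url/<origin URL>."""
    return f"{API_HOST}/api/1/origin/{origin_type}/url/{origin_url}"


def branch_targets(visit):
    """Map each branch of a visit to (target_type, absolute target URL)."""
    return {
        name: (branch["target_type"], API_HOST + branch["target_url"])
        for name, branch in visit["occurrences"].items()
    }


# Sample visit, trimmed from the snapshot slide (identifiers stay truncated).
sample_visit = {
    "occurrences": {
        "refs/heads/master": {
            "target": "b94211251...",
            "target_type": "revision",
            "target_url": "/api/1/revision/b94211251.../",
        },
        "refs/tags/0.10.0": {
            "target": "7045404f3...",
            "target_type": "release",
            "target_url": "/api/1/release/7045404f3.../",
        },
    },
    "origin": 1,
    "visit": 13,
}

if __name__ == "__main__":
    print(origin_lookup_url("git", "https://github.com/hylang/hy"))
    for name, (kind, url) in branch_targets(sample_visit).items():
        print(name, kind, url)
```

Each URL returned by ~branch_targets~ could then be fetched with any HTTP client, keeping in mind that rate limits apply throughout the API.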
**** Endpoint index \url{https://archive.softwareheritage.org/api/1/} *** A tour of the Web API --- origins & visits :PROPERTIES: :CUSTOM_ID: apitourvisits :END: #+BEAMER: \footnotesize #+BEGIN_SRC GET https://archive.softwareheritage.org/api/1/origin/ \ git/url/https://github.com/hylang/hy { "id": 1, "origin_visits_url": "/api/1/origin/1/visits/", "type": "git", "url": "https://github.com/hylang/hy" } #+END_SRC #+BEAMER: \vfill #+BEGIN_SRC GET https://archive.softwareheritage.org/api/1/origin/ \ 1/visits/ [ ..., { "date": "2016-09-14T11:04:26.769266+00:00", "origin": 1, "origin_visit_url": "/api/1/origin/1/visit/13/", "status": "full", "visit": 13 }, ... ] #+END_SRC *** A tour of the Web API --- snapshots :PROPERTIES: :CUSTOM_ID: apitoursnapshots :END: #+BEAMER: \footnotesize #+BEGIN_SRC GET https://archive.softwareheritage.org/api/1/origin/ \ 1/visit/13/ { ..., "occurrences": { ..., "refs/heads/master": { "target": "b94211251...", "target_type": "revision", "target_url": "/api/1/revision/b94211251.../" }, "refs/tags/0.10.0": { "target": "7045404f3...", "target_type": "release", "target_url": "/api/1/release/7045404f3.../" }, ... }, "origin": 1, "origin_url": "/api/1/origin/1/", "status": "full", "visit": 13 } #+END_SRC *** A tour of the Web API --- releases :noexport: :PROPERTIES: :CUSTOM_ID: apitourreleases :END: #+BEAMER: \footnotesize #+BEGIN_SRC GET https://archive.softwareheritage.org/api/1/release/ \ 7045404f3d1c54e6473c71bbb716529fbad4be24/ { "author": { "email": "tag@pault.ag", "fullname": "Paul Tagliamonte <tag@pault.ag>", "id": 96, "name": "Paul Tagliamonte" }, "date": "2014-04-10T23:01:28-04:00", "message": "0.10: The Oh f*ck it's PyCon release", "name": "0.10.0", "synthetic": false, "target": "6072557b6...", "target_type": "revision", "target_url": "/api/1/revision/6072557b6.../", ... 
} #+END_SRC *** A tour of the Web API --- revisions :PROPERTIES: :CUSTOM_ID: apitourrevisions :END: #+BEAMER: \footnotesize #+BEGIN_SRC GET https://archive.softwareheritage.org/api/1/revision/ \ 6072557b6c10cd9a21145781e26ad1f978ed14b9/ { "author": { "email": "tag@pault.ag", "fullname": "Paul Tagliamonte <tag@pault.ag>", "id": 96, "name": "Paul Tagliamonte" }, "committer": { ... }, "date": "2014-04-10T23:01:11-04:00", "committer_date": "2014-04-10T23:01:11-04:00", "directory": "2df4cd84e...", "directory_url": "/api/1/directory/2df4cd84e.../", "history_url": "/api/1/revision/6072557b6.../log/", "merge": false, "message": "0.10: The Oh f*ck it's PyCon release", "parents": [ { "id": "10149f66e...", "url": "/api/1/revision/10149f66e.../" } ], ... } #+END_SRC *** A tour of the Web API --- contents :PROPERTIES: :CUSTOM_ID: apitourcontents :END: #+BEAMER: \footnotesize #+BEGIN_SRC GET https://archive.softwareheritage.org/api/1/content/ \ adc83b19e793491b1c6ea0fd8b46cd9f32e592fc/ { "data_url": "/api/1/content/sha1:adc83b19e.../raw/", "filetype_url": "/api/1/content/sha1:.../filetype/", "language_url": "/api/1/content/sha1:.../language/", "length": 1, "license_url": "/api/1/content/sha1:.../license/", "sha1": "adc83b19e...", "sha1_git": "8b1378917...", "sha256": "01ba4719c...", "status": "visible" } #+END_SRC #+BEAMER: \normalsize \vfill \pause **** Caveats - rate limits apply throughout the API - blob download available for selected contents ** Some technical challenges :PROPERTIES: :CUSTOM_ID: techchallenges :END: *** Expanding the archive - discover and classify /all/ the software sources - importers for other VCSs (SVN, Hg, ...) 
\hfill /We need your help!/ *** Staying current get new repositories and commits ASAP\\ \hfill /We need reliable, standardised event feeds./ *** Handling the backlog ingesting all the pre-existing data\\ \hfill /Decades of software development are waiting!/ diff --git a/talks-public/2018-10-17-trento-specktech/2018-10-17-trento-specktech.org b/talks-public/2018-10-17-trento-specktech/2018-10-17-trento-specktech.org new file mode 100644 index 0000000..c972e4b --- /dev/null +++ b/talks-public/2018-10-17-trento-specktech/2018-10-17-trento-specktech.org @@ -0,0 +1,141 @@ +#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) +#+TITLE: Software Heritage +#+SUBTITLE: Archiving the Free Software Commons for Fun & Profit +#+BEAMER_HEADER: \date[17/10/2018, PyConFr]{17 Oct 2018\\Speck \& Tech - Trento, Italy} +#+DATE: 17 October 2018 + +#+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 +#+INCLUDE: "../../common/modules/169.org" +#+BEAMER_HEADER: \institute[Software Heritage]{Software Heritage --- {\tt zack@upsilon.cc, @zacchiro}} +#+BEAMER_HEADER: \author{Stefano Zacchiroli} + +#+LATEX_HEADER_EXTRA: \usepackage{bbding} +#+LATEX_HEADER_EXTRA: \DeclareUnicodeCharacter{66D}{\FiveStar} +#+LATEX_HEADER_EXTRA: \usepackage{tikz} +#+LATEX_HEADER_EXTRA: \usetikzlibrary{arrows,shapes} +#+LATEX_HEADER_EXTRA: \definecolor{swh-orange}{RGB}{254,205,27} +#+LATEX_HEADER_EXTRA: \definecolor{swh-red}{RGB}{226,0,38} +#+LATEX_HEADER_EXTRA: \definecolor{swh-green}{RGB}{77,181,174} + +# Syntax highlighting setup + +#+LATEX_HEADER_EXTRA: \usepackage{minted} +#+LaTeX_HEADER_EXTRA: \usemintedstyle{tango} +#+LaTeX_HEADER_EXTRA: \newminted{python}{fontsize=\scriptsize} +#+LaTeX_HEADER_EXTRA: \newminted{html}{fontsize=\scriptsize} + +#+name: setup-minted +#+begin_src emacs-lisp :exports results :results silent + (setq org-latex-listings 'minted) + (setq org-latex-minted-options + '(("fontsize" 
"\\scriptsize") + ("linenos" ""))) + (setq org-latex-to-pdf-process + '("pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f" + "pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f" + "pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f")) +#+end_src + +# End syntax highlighting setup + +* The Software Commons +** (Free) Software is everywhere + #+latex: \begin{center} + #+ATTR_LATEX: :width .75\linewidth + file:software-center.pdf + #+latex: \end{center} + #+INCLUDE: "../../common/modules/source-code-different-short.org::#softwareisdifferent" :minlevel 2 + +** Our Software Commons + #+INCLUDE: "../../common/modules/foss-commons.org::#commonsdef" :only-contents t + #+BEAMER: \pause +*** Source code is /a precious part/ of our commons + \hfill are we taking care of it? + #+INCLUDE: "../../common/modules/swh-motivations-foss.org::#main" :only-contents t :minlevel 2 + +* Software Heritage + #+INCLUDE: "../../common/modules/swh-overview-sourcecode.org::#mission" :minlevel 2 +** Core principles + #+latex: \begin{center} + #+ATTR_LATEX: :width .9\linewidth + file:SWH-as-foundation-slim.png + #+latex: \end{center} + #+BEAMER: \pause +*** Open approach :B_block:BMCOL: + :PROPERTIES: + :BEAMER_col: 0.4 + :BEAMER_env: block + :END: + - 100% Free Software + - transparency +*** In for the long haul :B_block:BMCOL: + :PROPERTIES: + :BEAMER_col: 0.4 + :BEAMER_env: block + :END: + - replication + - non profit + #+INCLUDE: "../../common/modules/status-extended.org::#archivinggoals" :minlevel 2 + #+INCLUDE: "../../common/modules/status-extended.org::#architecture" :minlevel 2 :only-contents t + #+INCLUDE: "../../common/modules/status-extended.org::#dagdetail" :minlevel 2 :only-contents t + #+INCLUDE: "../../common/modules/status-extended.org::#archive" :minlevel 2 + +# * Accessing the archive + # #+INCLUDE: "../../common/modules/status-extended.org::#api" :only-contents t + #+INCLUDE: 
"../../common/modules/status-extended.org::#apiintro" :minlevel 2 + #+INCLUDE: "../../common/modules/vault.org::#overview" :minlevel 2 + #+INCLUDE: "../../common/modules/webui.org::#intro" :minlevel 2 + +* Case study: archiving PyPI +** Latest addition to the archive + https://www.softwareheritage.org/2018/10/10/pypi-available-on-software-heritage/ +*** + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .4 + :END: + #+BEAMER: \includegraphics[width=\linewidth]{pypi-blog-post} +*** + :PROPERTIES: + :BEAMER_env: block + :BEAMER_COL: .4 + :END: + - reference package repository for the Python community + - volunteer run + - 150 K packages + - 1.1 M releases + - 1.5 M files + + #+INCLUDE: "../../common/modules/pypi-loader.org::#main" :only-contents t :minlevel 2 +* Getting involved + #+INCLUDE: "../../common/modules/status-extended.org::#features" :minlevel 2 +** You can help! + #+BEAMER: \vspace{-3mm} +*** Coding + | ٭٭ | Web UI improvements | + | ٭٭٭ | loaders for unsupported VCS/package formats | + | ٭٭٭ | listers for unsupported forges/package managers | + #+BEAMER: \vspace{-2mm} \footnotesize \centering + \url{https://forge.softwareheritage.org} \\ + \url{https://docs.softwareheritage.org/devel} + #+BEAMER: \pause +*** Community + | ٭٭٭ | spread the word, help us with sustainability | + | ٭٭ | document endangered source code | + #+BEAMER: \vspace{-2mm} \footnotesize \centering + \url{https://wiki.softwareheritage.org/Suggestion_box} + #+BEAMER: \pause +*** Join us + #+BEAMER: \footnotesize \centering + - \url{https://www.softwareheritage.org/jobs} --- *job openings* + - \url{https://wiki.softwareheritage.org/Internship} --- *internships* +** Conclusion +*** Software Heritage is + - a reference archive of *all Free Software* ever written + - an international, open, nonprofit, *mutualized infrastructure* + - *now accessible* to developers, users, vendors + - at the service of our community, *at the service of society* +*** Come in, we're open! 
+ \url{www.softwareheritage.org} --- general information \\ + \url{wiki.softwareheritage.org} --- internships, leads \\ + \url{forge.softwareheritage.org} --- our own code diff --git a/talks-public/2018-10-17-trento-specktech/Makefile b/talks-public/2018-10-17-trento-specktech/Makefile new file mode 100644 index 0000000..68fbee7 --- /dev/null +++ b/talks-public/2018-10-17-trento-specktech/Makefile @@ -0,0 +1 @@ +include ../Makefile.slides