diff --git a/common/images/pypi-blog-post.png b/common/images/pypi-blog-post.png
new file mode 100644
index 0000000..613bdd2
Binary files /dev/null and b/common/images/pypi-blog-post.png differ
diff --git a/common/modules/pypi-loader.org b/common/modules/pypi-loader.org
index 94ac3e0..91c3078 100644
--- a/common/modules/pypi-loader.org
+++ b/common/modules/pypi-loader.org
@@ -1,261 +1,259 @@
#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
# Tech details about archiving Python packages from PyPI
#+INCLUDE: "prelude.org" :minlevel 1
* PyPI loader
:PROPERTIES:
:CUSTOM_ID: main
:END:
-** Data flow redux
-
- #+BEAMER: \begin{center}\includegraphics[width=\extblockscale{1.2\textwidth}]{swh-dataflow.pdf}\end{center}
+# ** Data flow redux
+# #+BEAMER: \begin{center}\includegraphics[width=\extblockscale{1.2\textwidth}]{swh-dataflow.pdf}\end{center}
** Our focus
#+BEAMER: \begin{center}\includegraphics[width=\extblockscale{1.2\textwidth}]{swh-dataflow-pypi.pdf}\end{center}
-
** Listing all Python modules (1/3)
***
#+BEAMER: \footnotesize \centering
https://forge.softwareheritage.org/source/swh-lister/
*** What does a Software Heritage lister do?
- crawls and parses upstream project-listing APIs
- generates origins (records that the project has been detected) and loading tasks
#+beamer: \pause
***
Credits go to Avi Kelman for the lister scaffolding, and to Antoine Dumont for the PyPI implementation
#+beamer: \pause
*** A visit of the Cheese Shop
- A little bit more efficiently than [[https://www.youtube.com/watch?v=B3KBuQHHKx0][John Cleese]]
- Uses https://pypi.org/simple/ (according to the warehouse docs, the only "package listing" API that's not on the way to deprecation)
** Listing all Python modules (2/3)
*** GET https://pypi.org/simple/
#+begin_src html
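<!DOCTYPE html>
<html>
<head><title>Simple index</title></head>
<body>
<a href="/simple/0/">0</a>
<a href="/simple/0-0/">0-._.-._.-._.-._.-._.-._.-0</a>
[...]
<a href="/simple/django/">Django</a>
[...]
</body>
</html>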
#+end_src
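A lister could pull the project names out of this page with the standard-library HTML parser. The sketch below is only an illustration, not the actual swh-lister code: the ~SimpleIndexParser~ class and the sample page are assumptions made up for this example.

#+begin_src python
from html.parser import HTMLParser


class SimpleIndexParser(HTMLParser):
    """Collect the text of every <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.projects = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == 'a':
            self._in_anchor = False

    def handle_data(self, data):
        # Only keep text found inside anchors, i.e. project names
        if self._in_anchor and data.strip():
            self.projects.append(data.strip())


# Assumed sample page, shaped like the simple index
sample_page = '''
<html><head><title>Simple index</title></head><body>
<a href="/simple/0/">0</a>
<a href="/simple/django/">Django</a>
</body></html>
'''

parser = SimpleIndexParser()
parser.feed(sample_page)
projects = parser.projects
#+end_src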
** Listing all Python modules (3/3)
***
#+begin_src python
# Origin specification
origin = {
'type': 'pypi',
'url': 'https://pypi.org/packages/Django/', # Canonical project URL
}
#+end_src
#+beamer: \pause
#+begin_src python
# Scheduler task specification
from datetime import datetime, timezone  # needed for next_run
update_task = {
'type': 'origin-update-pypi',
'policy': 'recurring',
'next_run': datetime.now(tz=timezone.utc),
'arguments': {
'args': [
'Django', # Project name
'https://pypi.org/packages/Django/', # Origin URL
'https://pypi.org/pypi/Django/json', # Metadata URL
],
'kwargs': {},
},
'priority': None,
}
#+end_src
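Both specifications can be derived from the project name alone. A minimal sketch, assuming a hypothetical helper ~pypi_origin_and_task~ (not a function of swh-lister) and the URL patterns shown in the specifications:

#+begin_src python
from datetime import datetime, timezone


def pypi_origin_and_task(project):
    """Build the origin and scheduler task dicts for one project.

    Hypothetical helper mirroring the two specifications.
    """
    origin_url = 'https://pypi.org/packages/%s/' % project
    metadata_url = 'https://pypi.org/pypi/%s/json' % project
    origin = {'type': 'pypi', 'url': origin_url}
    task = {
        'type': 'origin-update-pypi',
        'policy': 'recurring',
        'next_run': datetime.now(tz=timezone.utc),
        'arguments': {
            'args': [project, origin_url, metadata_url],
            'kwargs': {},
        },
        'priority': None,
    }
    return origin, task


origin, task = pypi_origin_and_task('Django')
#+end_src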
** Task scheduling (1/2)
***
#+BEAMER: \footnotesize \centering
https://forge.softwareheritage.org/source/swh-scheduler/
*** What does the Software Heritage scheduler do?
- Records *recurrent* and *one-shot* jobs in a database
- Schedules runs of these jobs, records their results
- Manages retries for transient job failures (remote service unavailable, ...)
- Manages adaptive intervals for recurrent jobs
** Task scheduling (2/2)
*** Builds upon trusted Python tools
- Celery is used as a task queuing middleware, and for its worker management framework
- Workers send task results through the Celery events mechanism
#+beamer: \pause
*** And makes them more useful to us
- The database is the single source of truth
- ~swh.scheduler.celery_backend.runner~ pulls tasks from the database into
Celery, limiting the RabbitMQ queue depth (allows task prioritization)
- ~swh.scheduler.celery_backend.listener~ fetches task results from Celery
events and updates the database
- Completed tasks/runs/logs are archived in Elasticsearch to keep the database
snappy
** Loading Python packages (1/4)
*** What's a Python package anyway?
- Source distributions (~sdists~, currently tarballs or zips)
- Binary distributions (~bdists~, which are mostly wheels these days)
As we're interested in source code, Software Heritage looks at ~sdists~ exclusively
#+beamer: \pause
- The current sdist format is unspecified: you probably get a tarball, which
maybe contains a ~setup.py~ somewhere
- When building an sdist, distutils generates a machine-readable ~PKG-INFO~
file and puts it in the tarball
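~PKG-INFO~ uses RFC 822-style headers, so the standard ~email~ package can read it. A minimal sketch, not the loader's actual code: the excerpt is an assumption (including the ~Metadata-Version~ value) built from the pifpaf values used in this deck.

#+begin_src python
from email.parser import HeaderParser

# Assumed PKG-INFO excerpt for illustration
pkg_info_text = '''\
Metadata-Version: 1.1
Name: pifpaf
Version: 2.1.2
Summary: Suite of tools and fixtures to manage daemons for testing
Author: Julien Danjou
Classifier: Intended Audience :: Information Technology
'''

msg = HeaderParser().parsestr(pkg_info_text)
project = {
    'name': msg['Name'],
    'version': msg['Version'],
    'summary': msg['Summary'],
    'author': msg['Author'],
    'license': msg['License'],  # None when the field is absent
    # Repeated fields such as Classifier come back as a list
    'classifiers': msg.get_all('Classifier', []),
}
#+end_src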
#+beamer: \pause
*** The long wait for PEP 517 ("A build-system independent format for source trees")
- One uniform transport format: a gzipped tarball with one toplevel directory
- Machine parsable data about the project by default (~pyproject.toml~)
Hopefully soon in your nearest Cheese Shop (go help the folks in PyPA!)
** Loading Python packages (2/4)
***
#+BEAMER: \footnotesize \centering
https://forge.softwareheritage.org/source/swh-loader-pypi/
*** Common loading process
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_env: block
:BEAMER_act: +-
:END:
Implemented in ~swh.loader.core~
- Fetch metadata about current versions
- Compare to latest loaded versions
- Download and process versions we have never seen
- Load new data
*** PyPI specifics
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_env: block
:BEAMER_act: +-
:END:
Implemented in ~swh.loader.pypi~
- Comparison done using the ~sdist~ digests
- PKG-INFO metadata parsed and saved
- Versions with multiple sdists are imported separately
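The digest comparison can be sketched as a filter over the PyPI JSON metadata. This is an illustration under assumptions, not the code in ~swh.loader.pypi~: ~new_sdists~ is a hypothetical helper, and the sample metadata (file names, digests) is made up.

#+begin_src python
def new_sdists(pypi_metadata, known_sha256):
    """Yield (version, file) pairs for sdists not loaded yet.

    Hypothetical helper; known_sha256 is the set of digests already archived.
    """
    for version, files in pypi_metadata['releases'].items():
        for f in files:
            if f['packagetype'] != 'sdist':
                continue  # skip wheels and other bdists
            if f['digests']['sha256'] in known_sha256:
                continue  # already archived
            yield version, f


# Made-up excerpt shaped like the JSON metadata document
metadata = {
    'releases': {
        '1.0': [
            {'filename': 'demo-1.0.tar.gz', 'packagetype': 'sdist',
             'digests': {'sha256': 'aaaa'}},
            {'filename': 'demo-1.0-py3-none-any.whl',
             'packagetype': 'bdist_wheel',
             'digests': {'sha256': 'bbbb'}},
        ],
        '1.1': [
            {'filename': 'demo-1.1.tar.gz', 'packagetype': 'sdist',
             'digests': {'sha256': 'cccc'}},
            {'filename': 'demo-1.1.zip', 'packagetype': 'sdist',
             'digests': {'sha256': 'dddd'}},
        ],
    },
}

to_load = [(v, f['filename']) for v, f in new_sdists(metadata, {'aaaa'})]
#+end_src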
** Loading Python packages (3/4)
*** PyPI snapshots
#+begin_src python
pifpaf_snapshot = {
'id': b'\xc6_\xfe#\x94\xba\x81\xc3\x94\x9b\xeb[\x06\xf5JC\x0f\x19n\xa6',
'branches': {
b'releases/0.0.1': {...},
b'releases/0.0.2': {...},
# ...
b'releases/2.1.2': {
'target': b'\x8a\xcd\xf3l\xee\xe50\xe2\x81]\x08:5\xd9_\xd6\xeff\xc9\xa3',
'target_type': 'revision',
},
b'releases/2.1.2.dev7': {
'target': b'hGh\x15h|\xf3\xd2v\xf8\xec-\xa7\xfeuB\xda3\x83x',
'target_type': 'revision',
},
b'HEAD': {
'target': b'releases/2.1.2',
'target_type': 'alias',
},
},
}
#+end_src
** Loading Python packages (4/4)
*** PyPI revisions
#+begin_src python -i
pifpaf_revision = {
'id': b'\x8a\xcd\xf3l\xee\xe50\xe2\x81]\x08:5\xd9_\xd6\xeff\xc9\xa3',
'author': {
'name': b'Julien Danjou',
...
},
'date': {
'timestamp': {'seconds': 1538577319, 'microseconds': 0},
},
...
'type': 'tar',
'directory': b'\xa4\xf2\xad\xb1\xef\r\xcf\x894::@=\xf9R\x86=\x19"\\',
'message': b'2.1.2',
#+end_src
#+beamer: \pause
#+begin_src python -i
'metadata': {
'project': { # Metadata parsed from PKG-INFO
'name': 'pifpaf',
'author': 'Julien Danjou',
'license': None,
'summary': 'Suite of tools and fixtures to manage daemons for testing',
'version': '2.1.2',
...
#+end_src
**
***
#+begin_src python -i
'classifiers': [
'Intended Audience :: Information Technology',
...
],
...
},
#+end_src
#+beamer: \pause
#+begin_src python -i
'original_artifact': { # The original tarball we downloaded
'url': 'https://files.pythonhosted.org/packages/cc/ce/2599[...]',
'date': '2018-10-03T14:35:19',
'sha1': '00c4efc47580b5c4ad1dcdb5118159f9b057b0fd',
'size': 192940,
'sha256': 'a6eef2ae56ac90d02df5f45885973e108c960a2ea113cc76[...]',
'filename': 'pifpaf-2.1.2.tar.gz',
'sha1_git': '8ce7e3ddda336dd9edff26ae8efaf4b81439c42c',
'blake2s256': 'c4f7fcd4324715f4bfb54f8eefb10fde803efb7a02e2[...]',
'archive_type': 'tar',
},
},
'synthetic': True,
'parents': [],
}
#+end_src
diff --git a/common/modules/status-extended.org b/common/modules/status-extended.org
index c33e497..ff50bf4 100644
--- a/common/modules/status-extended.org
+++ b/common/modules/status-extended.org
@@ -1,433 +1,433 @@
#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
#+INCLUDE: "prelude.org" :minlevel 1
# not to be included as a whole, just pick individual slides as you see fit
* Status
:PROPERTIES:
:CUSTOM_ID: main
:END:
** The people
:PROPERTIES:
:CUSTOM_ID: people
:END:
*** The core team :B_picblock:
:PROPERTIES:
:CUSTOM_ID: core-team-formal
:BEAMER_env: picblock
:BEAMER_opt: pic=team,width=.4\linewidth
:END:
- Roberto Di Cosmo
- Stefano Zacchiroli
- Nicolas Dandrimont (Engineer)
- Antoine Dumont (Engineer)
# - and /Jordi, Quentin and Guillaume/
*** Scientific advisors
- Serge Abiteboul (French Science Academy)
- Jean-François Abramatic (former W3C director)
- Gerard Berry (CNRS Gold Medal, French Science Academy)
- Julia Lawall (Coccinelle, Linux Kernel, Outreachy)
** Archive coverage
:PROPERTIES:
:CUSTOM_ID: archive
:END:
#+BEAMER: \vspace{-2mm}
#+BEAMER: \begin{center}\includegraphics[width=\extblockscale{1.1\linewidth}]{2018-10-archive-growth.png}\end{center}
#+BEAMER: \vspace{-2mm}
*** Current sources
- - live: GitHub, Debian, GitLab.com, **PyPI**
+ - live: GitHub, Debian, GitLab.com, PyPI
- one-off: Gitorious, Google Code, GNU
- WIP: Bitbucket
#+BEAMER: \pause
***
175 TB (compressed) blobs, 6 TB database (as a graph: 10 B nodes + 100 B edges)
#+BEAMER: \pause
***
\hfill The /richest/ public source code archive, ... and growing daily!
** The structure of the archive :noexport:
*** On-disk storage
- flat file storage for contents
- postgres database for the metadata
*** Data model: /one/ big Merkle DAG, inspired by the git model
- Origins (= repositories)
- Occurrences (= branches)
- Releases (= tags)
- Revisions (= commits)
- Directories (= trees)
- Contents (= blobs)
** Archiving goals
:PROPERTIES:
:CUSTOM_ID: archivinggoals
:END:
Targets: VCS repositories & source code releases (e.g., tarballs)
*** We DO archive
- file *content* (= blobs)
- *revisions* (= commits), with full metadata
- *releases* (= tags), ditto
- where (*origin*) & when (*visit*) we found any of the above
# - time-indexed repo *snapshots* (i.e., we never delete anything)
… in a VCS-/archive-agnostic *canonical data model*
*** We DON'T archive
# - diffs → derived data from related contents
- homepages, wikis
- BTS/issues/code reviews/etc.
- mailing lists
Long term vision: play our part in a /"semantic wikipedia of software"/
** Architecture
:PROPERTIES:
:CUSTOM_ID: architecture
:END:
*** Data flow
:PROPERTIES:
:CUSTOM_ID: dataflow
:END:
#
#+BEAMER: \begin{center}\includegraphics[width=\extblockscale{1.2\textwidth}]{swh-dataflow.pdf}\end{center}
** Data model :noexport:
*** General schema
- VCS-independent
- fully deduplicated
+ files, directories and commits are /shared/
- biggest git-like /graph/ in the world
***
\begin{center}
\url{http://deb.li/swhdm}
\end{center}
*** full hash index (sha1, sha256, ...)
Some funny facts:
- the GPL2 licence appears under more than 500 names
+ including /aa.css.txt/ and /FullSync.txt/ ~ :-)
** Merkle DAG
*** Merkle structure
:PROPERTIES:
:CUSTOM_ID: merkle
:END:
**** Merkle trees
:PROPERTIES:
:CUSTOM_ID: merkletree
:END:
# R. C. Merkle, A digital signature based on a conventional encryption
# function, Crypto '87
#+BEAMER: \vspace{-3mm}
***** Merkle tree (R. C. Merkle, Crypto 1979) :B_picblock:
:PROPERTIES:
:BEAMER_opt: pic=merkle, leftpic=true, width=.7\linewidth
:BEAMER_env: picblock
:BEAMER_act:
:END:
Combination of
- tree
- hash function
#+BEAMER: \pause
#+BEAMER: \footnotesize
***** Classical cryptographic construction
- fast, parallel signature of large data structures
- widely used (e.g., Git, blockchains, IPFS, ...)
- built-in deduplication
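Why deduplication comes for free can be shown with a toy example. This is a sketch only: the ~leaf:~ and ~node:~ prefixes and the helper names are assumptions for illustration, not the archive's actual identifier scheme.

#+BEGIN_SRC python
import hashlib


def leaf_id(data):
    """Identifier of a leaf: a hash of its content."""
    return hashlib.sha1(b'leaf:' + data).hexdigest()


def node_id(child_ids):
    """Identifier of an inner node: a hash of its children's identifiers."""
    return hashlib.sha1(b'node:' + ''.join(child_ids).encode()).hexdigest()


readme = leaf_id(b'hello\n')
licence = leaf_id(b'GPL\n')

tree_a = node_id([readme, licence])
tree_b = node_id([readme, licence])            # same content found elsewhere
tree_c = node_id([readme, leaf_id(b'MIT\n')])  # one leaf differs
#+END_SRC

Identical subtrees get identical identifiers and need to be stored only once, while any change in a leaf propagates up to the root.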
**** The archive in a few pictures
:PROPERTIES:
:CUSTOM_ID: merkledemo
:END:
***** A giant (extended) Merkle DAG
#+LATEX: \only<1>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/merkle_1.pdf}}}
#+LATEX: \only<2>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/contents.pdf}}}
#+LATEX: \only<3>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/merkle_2_contents.pdf}}}
#+LATEX: \only<4>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/directories.pdf}}}
#+LATEX: \only<5>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/merkle_3_directories.pdf}}}
#+LATEX: \only<6>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/revisions.pdf}}}
#+LATEX: \only<7>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/merkle_4_revisions.pdf}}}
#+LATEX: \only<8>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/releases.pdf}}}
#+LATEX: \only<9>{\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/merkle_5_releases.pdf}}}
# #+LATEX: {\colorbox{white}{\includegraphics[width=\extblockscale{.9\linewidth}]{git-merkle/merkle_1.pdf}}}
*** A revision node
:PROPERTIES:
:CUSTOM_ID: merklerevision
:END:
**** Example: a Software Heritage revision
*****
#+BEAMER: \vspace{-.5cm}\centering\includegraphics[width=0.9\textwidth]{git-merkle/revisions}
*****
Note: most object kinds currently have Git-compatible identifiers
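For file contents, a Git-compatible identifier is the SHA1 of a short ~blob~ header followed by the data. A minimal sketch, assuming a hypothetical helper named ~sha1_git~ and a content made of a single newline:

#+BEGIN_SRC python
import hashlib


def sha1_git(data):
    """Git blob identifier: SHA1 over 'blob <length>\\0' plus the data."""
    header = b'blob %d\0' % len(data)
    return hashlib.sha1(header + data).hexdigest()


content = b'\n'                             # assumed one-byte content
plain = hashlib.sha1(content).hexdigest()   # plain SHA1 of the bytes
git_id = sha1_git(content)                  # Git-compatible identifier
#+END_SRC

The two digests differ because of the header, which is why the archive can index the same content under both ~sha1~ and ~sha1_git~.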
*** Giant DAG
:PROPERTIES:
:CUSTOM_ID: giantdag
:END:
**** The archive: a (giant) Merkle DAG
# Using an empty frame because the image is difficult to read on swh bg.
# Finding a way to override image bg for just this frame would be better.
*****
#+BEAMER: \centering \includegraphics[width=\extblockscale{\textwidth}]{git-merkle/merkle_5_releases}
*** Giant DAG (single slide)
:PROPERTIES:
:CUSTOM_ID: giantdag1slide
:END:
**** The Software Heritage archive: a gigantic Merkle DAG
#+LATEX: \centering\forcebeamerstart{}
#+LATEX: \only<1>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/merkle_1}}}
#+LATEX: \only<2>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/contents}}}
#+LATEX: \only<3>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/merkle_2_contents}}}
#+LATEX: \only<4>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/directories}}}
#+LATEX: \only<5>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/merkle_3_directories}}}
#+LATEX: \only<6>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/revisions}}}
#+LATEX: \only<7>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/merkle_4_revisions}}}
#+LATEX: \only<8>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/releases}}}
#+LATEX: \only<9>{\colorbox{white}{\includegraphics[width=.75\linewidth]{git-merkle/merkle_5_releases}}}
#+LATEX: \forcebeamerend{}
*** Giant DAG (detailed)
:PROPERTIES:
:CUSTOM_ID: dagdetail
:END:
**** The archive: a (giant) Merkle DAG
#+BEAMER: \vspace{-3mm}
#+BEAMER: \centering \includegraphics[width=\textwidth]{swh-merkle-dag-wide}
** Technology :noexport:
:PROPERTIES:
:CUSTOM_ID: technology
:END:
*** Software stack
**** 3rd party
- Debian, Puppet, Ceph
- PostgreSQL for metadata storage, with barman & pglogical
- Celery (RabbitMQ backend) for task scheduling
- Python3 and psycopg2 for the backend
- Django, Bootstrap, D3.js for Web stuff
**** in house
- /ad hoc/ object storage (to avoid imposing tech on mirrors)
- data model implementation, listers, loaders, scheduler
- ~60 Git repositories (~20 Python packages, ~30 Puppet modules)
- ~30 kSLOC Python / ~12 kSLOC SQL / ~4 kSLOC Puppet
- licence choice: GPLv3 (backend) / AGPLv3 (frontend)
*** Hardware stack
**** in house
- 2x hypervisors with ~20 VMs
- 2x high density storage array (60 * 6TB => 300TB usable each)
- Prototype: ceph storage cluster for blobs
**** on Azure
- full object storage mirror
- full mirror of the database containing the graph
- workers for content indexing
- workers for download bundle preparation
**** at the University of Bologna
- backend storage (60TB) for the bundles available for download
*** Software architecture :noexport:
**** Module dependencies (internal + external) :B_picblock:
:PROPERTIES:
:BEAMER_env: picblock
:BEAMER_opt: pic=swh-modules-deps-all,width=\linewidth
:END:
****
let's zoom in: http://deb.li/swhdeps
** Technology :noexport:
:PROPERTIES:
:CUSTOM_ID: technology-short
:END:
*** Deployment and resource usage
**** Software
- around 30k SLOC of custom Python code, running on Debian Stable
- PostgreSQL database for the metadata storage
**** Hardware
- 3 hypervisors with mass storage and a backup server at Inria
- Work in progress: in-house Ceph deployment for object storage
- In-kind sponsorship of cloud and storage resources (Microsoft, University of Bologna)
** Software development :noexport:
:PROPERTIES:
:CUSTOM_ID: development
:END:
*** Software development
**** classic FOSS development
- language: English
- development mailing list
#+BEAMER: \\{\small \url{https://sympa.inria.fr/sympa/info/swh-devel}}
- IRC
#+BEAMER: \\
#swh-devel / FreeNode
- Forge
#+BEAMER: \\{\small \url{https://forge.softwareheritage.org}}
- Git, tasks, code review, etc.
**** for more information
#+BEAMER: \scriptsize
https://www.softwareheritage.org/community/developers/
** Roadmap
:PROPERTIES:
:CUSTOM_ID: features
:END:
*** Features...
- (done) *lookup* by content hash
- (done) *browsing*: "wayback machine" for source code (API + UI)
- (early access) *deposit* of source code bundles directly to the archive
- (early access) *save code now*, on-demand archival
- (done) *download*: =wget= / =git clone= from the archive
- (todo) *provenance* lookup for all archived content
- (todo) *full-text search* on all archived source code files
#+BEAMER: \pause
*** ... and much more than one could possibly imagine
all the world's software development history within reach!
** Web API :noexport:
:PROPERTIES:
:CUSTOM_ID: api
:END:
*** Web API
:PROPERTIES:
:CUSTOM_ID: apiintro
:END:
****
RESTful API to programmatically access the Software Heritage archive \\
*\url{https://archive.softwareheritage.org/api/}*
**** Features
- pointwise *browsing* of the archive
- … snapshots → revisions → directories → contents …
- full access to the *metadata* of archived objects
- *crawling* information
- /when have you last visited this Git repository I care about?/
- /where were its branches/tags pointing to at the time?/
# - derived information about archived contents (WIP)
# - MIME type, programming language, license, etc.
**** Endpoint index
\url{https://archive.softwareheritage.org/api/1/}
*** A tour of the Web API --- origins & visits
:PROPERTIES:
:CUSTOM_ID: apitourvisits
:END:
#+BEAMER: \footnotesize
#+BEGIN_SRC
GET https://archive.softwareheritage.org/api/1/origin/ \
git/url/https://github.com/hylang/hy
{ "id": 1,
"origin_visits_url": "/api/1/origin/1/visits/",
"type": "git",
"url": "https://github.com/hylang/hy"
}
#+END_SRC
#+BEAMER: \vfill
#+BEGIN_SRC
GET https://archive.softwareheritage.org/api/1/origin/ \
1/visits/
[ ...,
{ "date": "2016-09-14T11:04:26.769266+00:00",
"origin": 1,
"origin_visit_url": "/api/1/origin/1/visit/13/",
"status": "full",
"visit": 13
}, ...
]
#+END_SRC
*** A tour of the Web API --- snapshots
:PROPERTIES:
:CUSTOM_ID: apitoursnapshots
:END:
#+BEAMER: \footnotesize
#+BEGIN_SRC
GET https://archive.softwareheritage.org/api/1/origin/ \
1/visit/13/
{ ...,
"occurrences": { ...,
"refs/heads/master": {
"target": "b94211251...",
"target_type": "revision",
"target_url": "/api/1/revision/b94211251.../"
},
"refs/tags/0.10.0": {
"target": "7045404f3...",
"target_type": "release",
"target_url": "/api/1/release/7045404f3.../"
}, ...
},
"origin": 1,
"origin_url": "/api/1/origin/1/",
"status": "full",
"visit": 13
}
#+END_SRC
*** A tour of the Web API --- releases :noexport:
:PROPERTIES:
:CUSTOM_ID: apitourreleases
:END:
#+BEAMER: \footnotesize
#+BEGIN_SRC
GET https://archive.softwareheritage.org/api/1/release/ \
7045404f3d1c54e6473c71bbb716529fbad4be24/
{
"author": {
"email": "tag@pault.ag",
"fullname": "Paul Tagliamonte <tag@pault.ag>",
"id": 96,
"name": "Paul Tagliamonte"
},
"date": "2014-04-10T23:01:28-04:00",
"message": "0.10: The Oh f*ck it's PyCon release",
"name": "0.10.0",
"synthetic": false,
"target": "6072557b6...",
"target_type": "revision",
"target_url": "/api/1/revision/6072557b6.../",
...
}
#+END_SRC
*** A tour of the Web API --- revisions
:PROPERTIES:
:CUSTOM_ID: apitourrevisions
:END:
#+BEAMER: \footnotesize
#+BEGIN_SRC
GET https://archive.softwareheritage.org/api/1/revision/ \
6072557b6c10cd9a21145781e26ad1f978ed14b9/
{
"author": {
"email": "tag@pault.ag",
"fullname": "Paul Tagliamonte <tag@pault.ag>",
"id": 96,
"name": "Paul Tagliamonte"
},
"committer": { ... },
"date": "2014-04-10T23:01:11-04:00",
"committer_date": "2014-04-10T23:01:11-04:00",
"directory": "2df4cd84e...",
"directory_url": "/api/1/directory/2df4cd84e.../",
"history_url": "/api/1/revision/6072557b6.../log/",
"merge": false,
"message": "0.10: The Oh f*ck it's PyCon release",
"parents": [ {
"id": "10149f66e...",
"url": "/api/1/revision/10149f66e.../"
} ],
...
}
#+END_SRC
*** A tour of the Web API --- contents
:PROPERTIES:
:CUSTOM_ID: apitourcontents
:END:
#+BEAMER: \footnotesize
#+BEGIN_SRC
GET https://archive.softwareheritage.org/api/1/content/ \
adc83b19e793491b1c6ea0fd8b46cd9f32e592fc/
{
"data_url": "/api/1/content/sha1:adc83b19e.../raw/",
"filetype_url": "/api/1/content/sha1:.../filetype/",
"language_url": "/api/1/content/sha1:.../language/",
"length": 1,
"license_url": "/api/1/content/sha1:.../license/",
"sha1": "adc83b19e...",
"sha1_git": "8b1378917...",
"sha256": "01ba4719c...",
"status": "visible"
}
#+END_SRC
#+BEAMER: \normalsize \vfill \pause
**** Caveats
- rate limits apply throughout the API
- blob download available for selected contents
** Some technical challenges
:PROPERTIES:
:CUSTOM_ID: techchallenges
:END:
*** Expanding the archive
- discover and classify /all/ the software sources
- importers for other VCSs (SVN, Hg, ...)
\hfill /We need your help!/
*** Staying current
get new repositories and commits ASAP\\
\hfill /We need reliable, standardised event feeds./
*** Handling the backlog
ingesting all the pre-existing data\\
\hfill /Decades of software development are waiting!/
diff --git a/talks-public/2018-10-17-trento-specktech/2018-10-17-trento-specktech.org b/talks-public/2018-10-17-trento-specktech/2018-10-17-trento-specktech.org
new file mode 100644
index 0000000..c972e4b
--- /dev/null
+++ b/talks-public/2018-10-17-trento-specktech/2018-10-17-trento-specktech.org
@@ -0,0 +1,141 @@
+#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
+#+TITLE: Software Heritage
+#+SUBTITLE: Archiving the Free Software Commons for Fun & Profit
+#+BEAMER_HEADER: \date[17/10/2018, Speck\&Tech]{17 Oct 2018\\Speck \& Tech - Trento, Italy}
+#+DATE: 17 October 2018
+
+#+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1
+#+INCLUDE: "../../common/modules/169.org"
+#+BEAMER_HEADER: \institute[Software Heritage]{Software Heritage --- {\tt zack@upsilon.cc, @zacchiro}}
+#+BEAMER_HEADER: \author{Stefano Zacchiroli}
+
+#+LATEX_HEADER_EXTRA: \usepackage{bbding}
+#+LATEX_HEADER_EXTRA: \DeclareUnicodeCharacter{66D}{\FiveStar}
+#+LATEX_HEADER_EXTRA: \usepackage{tikz}
+#+LATEX_HEADER_EXTRA: \usetikzlibrary{arrows,shapes}
+#+LATEX_HEADER_EXTRA: \definecolor{swh-orange}{RGB}{254,205,27}
+#+LATEX_HEADER_EXTRA: \definecolor{swh-red}{RGB}{226,0,38}
+#+LATEX_HEADER_EXTRA: \definecolor{swh-green}{RGB}{77,181,174}
+
+# Syntax highlighting setup
+
+#+LATEX_HEADER_EXTRA: \usepackage{minted}
+#+LaTeX_HEADER_EXTRA: \usemintedstyle{tango}
+#+LaTeX_HEADER_EXTRA: \newminted{python}{fontsize=\scriptsize}
+#+LaTeX_HEADER_EXTRA: \newminted{html}{fontsize=\scriptsize}
+
+#+name: setup-minted
+#+begin_src emacs-lisp :exports results :results silent
+ (setq org-latex-listings 'minted)
+ (setq org-latex-minted-options
+ '(("fontsize" "\\scriptsize")
+ ("linenos" "")))
+ (setq org-latex-to-pdf-process
+ '("pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"
+ "pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"
+ "pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"))
+#+end_src
+
+# End syntax highlighting setup
+
+* The Software Commons
+** (Free) Software is everywhere
+ #+latex: \begin{center}
+ #+ATTR_LATEX: :width .75\linewidth
+ file:software-center.pdf
+ #+latex: \end{center}
+ #+INCLUDE: "../../common/modules/source-code-different-short.org::#softwareisdifferent" :minlevel 2
+
+** Our Software Commons
+ #+INCLUDE: "../../common/modules/foss-commons.org::#commonsdef" :only-contents t
+ #+BEAMER: \pause
+*** Source code is /a precious part/ of our commons
+ \hfill are we taking care of it?
+ #+INCLUDE: "../../common/modules/swh-motivations-foss.org::#main" :only-contents t :minlevel 2
+
+* Software Heritage
+ #+INCLUDE: "../../common/modules/swh-overview-sourcecode.org::#mission" :minlevel 2
+** Core principles
+ #+latex: \begin{center}
+ #+ATTR_LATEX: :width .9\linewidth
+ file:SWH-as-foundation-slim.png
+ #+latex: \end{center}
+ #+BEAMER: \pause
+*** Open approach :B_block:BMCOL:
+ :PROPERTIES:
+ :BEAMER_col: 0.4
+ :BEAMER_env: block
+ :END:
+ - 100% Free Software
+ - transparency
+*** In for the long haul :B_block:BMCOL:
+ :PROPERTIES:
+ :BEAMER_col: 0.4
+ :BEAMER_env: block
+ :END:
+ - replication
+ - non profit
+ #+INCLUDE: "../../common/modules/status-extended.org::#archivinggoals" :minlevel 2
+ #+INCLUDE: "../../common/modules/status-extended.org::#architecture" :minlevel 2 :only-contents t
+ #+INCLUDE: "../../common/modules/status-extended.org::#dagdetail" :minlevel 2 :only-contents t
+ #+INCLUDE: "../../common/modules/status-extended.org::#archive" :minlevel 2
+
+# * Accessing the archive
+ # #+INCLUDE: "../../common/modules/status-extended.org::#api" :only-contents t
+ #+INCLUDE: "../../common/modules/status-extended.org::#apiintro" :minlevel 2
+ #+INCLUDE: "../../common/modules/vault.org::#overview" :minlevel 2
+ #+INCLUDE: "../../common/modules/webui.org::#intro" :minlevel 2
+
+* Case study: archiving PyPI
+** Latest addition to the archive
+ https://www.softwareheritage.org/2018/10/10/pypi-available-on-software-heritage/
+***
+ :PROPERTIES:
+ :BEAMER_env: block
+ :BEAMER_COL: .4
+ :END:
+ #+BEAMER: \includegraphics[width=\linewidth]{pypi-blog-post}
+***
+ :PROPERTIES:
+ :BEAMER_env: block
+ :BEAMER_COL: .4
+ :END:
+ - the reference package repository for the Python community
+ - volunteer run
+ - 150 K packages
+ - 1.1 M releases
+ - 1.5 M files
+
+ #+INCLUDE: "../../common/modules/pypi-loader.org::#main" :only-contents t :minlevel 2
+* Getting involved
+ #+INCLUDE: "../../common/modules/status-extended.org::#features" :minlevel 2
+** You can help!
+ #+BEAMER: \vspace{-3mm}
+*** Coding
+ | ٭٭ | Web UI improvements |
+ | ٭٭٭ | loaders for unsupported VCS/package formats |
+ | ٭٭٭ | listers for unsupported forges/package managers |
+ #+BEAMER: \vspace{-2mm} \footnotesize \centering
+ \url{https://forge.softwareheritage.org} \\
+ \url{https://docs.softwareheritage.org/devel}
+ #+BEAMER: \pause
+*** Community
+ | ٭٭٭ | spread the word, help us with sustainability |
+ | ٭٭ | document endangered source code |
+ #+BEAMER: \vspace{-2mm} \footnotesize \centering
+ \url{https://wiki.softwareheritage.org/Suggestion_box}
+ #+BEAMER: \pause
+*** Join us
+ #+BEAMER: \footnotesize \centering
+ - \url{https://www.softwareheritage.org/jobs} --- *job openings*
+ - \url{https://wiki.softwareheritage.org/Internship} --- *internships*
+** Conclusion
+*** Software Heritage is
+ - a reference archive of *all Free Software* ever written
+ - an international, open, nonprofit, *mutualized infrastructure*
+ - *now accessible* to developers, users, vendors
+ - at the service of our community, *at the service of society*
+*** Come in, we're open!
+ \url{www.softwareheritage.org} --- general information \\
+ \url{wiki.softwareheritage.org} --- internships, leads \\
+ \url{forge.softwareheritage.org} --- our own code
diff --git a/talks-public/2018-10-17-trento-specktech/Makefile b/talks-public/2018-10-17-trento-specktech/Makefile
new file mode 100644
index 0000000..68fbee7
--- /dev/null
+++ b/talks-public/2018-10-17-trento-specktech/Makefile
@@ -0,0 +1 @@
+include ../Makefile.slides