diff --git a/docs/index.rst b/docs/index.rst
index 9cd0c72..165ac49 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,211 +1,217 @@
.. _swh-docs:
Software Heritage - Development Documentation
=============================================
Getting started
---------------
* :ref:`getting-started` → deploy a local copy of the Software Heritage
software stack in less than 5 minutes, or
* :ref:`developer-setup` → get a working development setup that allows to hack
on the Software Heritage software stack
Contributing
------------
* :ref:`patch-submission` → learn how to submit your patches to the
Software Heritage codebase
* :ref:`code-review` → rules and guidelines to review code in
Software Heritage
* :ref:`python-style-guide` → how to format the Python code you write
Architecture
------------
* :ref:`architecture` → get a glimpse of the Software Heritage software
architecture
* :ref:`mirror` → learn what a Software Heritage mirror is and how to set up
one
* :ref:`keycloak <keycloak>` → learn what keycloak is and how to register and configure users
Data Model and Specifications
-----------------------------
* :ref:`persistent-identifiers` Specifications of the SoftWare Heritage persistent IDentifiers (SWHID).
* :ref:`data-model` Documentation of the main |swh| archive data model.
* :ref:`journal-specs` Documentation of the Kafka journal of the |swh| archive.
Tutorials
---------
* :ref:`testing-guide`
* :doc:`/tutorials/issue-debugging-monitoring`
* :ref:`Listing the content of your favorite forge <lister-tutorial>`
and :ref:`running a lister in Docker <run-lister-tutorial>`
+Roadmap
+-------
+
+* :ref:`roadmap-2021`
+
Components
----------
Here is brief overview of the most relevant software components in the Software
Heritage stack. Each component name is linked to the development documentation
of the corresponding Python module.
:ref:`swh.auth <swh-auth>`
low-level library used by modules needing keycloak authentication
:ref:`swh.core <swh-core>`
low-level utilities and helpers used by almost all other modules in the
stack
:ref:`swh.counters <swh-counters>`
service providing efficient estimates of the number of objects in the SWH archive,
using Redis's Hyperloglog
:ref:`swh.dataset <swh-dataset>`
public datasets and periodic data dumps of the archive released by Software
Heritage
:ref:`swh.deposit <swh-deposit>`
push-based deposit of software artifacts to the archive
swh.docs
developer documentation (used to generate this doc you are reading)
:ref:`swh.fuse <swh-fuse>`
Virtual file system to browse the Software Heritage archive, based on
`FUSE <https://github.com/libfuse/libfuse>`_
:ref:`swh.graph <swh-graph>`
Fast, compressed, in-memory representation of the archive, with tooling to
generate and query it.
:ref:`swh.indexer <swh-indexer>`
tools and workers used to crawl the content of the archive and extract
derived information from any artifact stored in it
:ref:`swh.journal <swh-journal>`
persistent logger of changes to the archive, with publish-subscribe support
:ref:`swh.lister <swh-lister>`
collection of listers for all sorts of source code hosting and distribution
places (forges, distributions, package managers, etc.)
:ref:`swh.loader-core <swh-loader-core>`
low-level loading utilities and helpers used by all other loaders
:ref:`swh.loader-git <swh-loader-git>`
loader for `Git <https://git-scm.com/>`_ repositories
:ref:`swh.loader-mercurial <swh-loader-mercurial>`
loader for `Mercurial <https://www.mercurial-scm.org/>`_ repositories
:ref:`swh.loader-svn <swh-loader-svn>`
loader for `Subversion <https://subversion.apache.org/>`_ repositories
:ref:`swh.model <swh-model>`
implementation of the :ref:`data-model` to archive source code artifacts
:ref:`swh.objstorage <swh-objstorage>`
content-addressable object storage
:ref:`swh.objstorage.replayer <swh-objstorage-replayer>`
Object storage replication tool
:ref:`swh.scanner <swh-scanner>`
source code scanner to analyze code bases and compare them with source code
artifacts archived by Software Heritage
:ref:`swh.scheduler <swh-scheduler>`
task manager for asynchronous/delayed tasks, used for recurrent (e.g.,
listing a forge, loading new stuff from a Git repository) and one-off
activities (e.g., loading a specific version of a source package)
:ref:`swh.search <swh-search>`
search engine for the archive
:ref:`swh.storage <swh-storage>`
abstraction layer over the archive, allowing to access all stored source
code artifacts as well as their metadata
:ref:`swh.vault <swh-vault>`
implementation of the vault service, allowing to retrieve parts of the
archive as self-contained bundles (e.g., individual releases, entire
repository snapshots, etc.)
:ref:`swh.web <swh-web>`
Web application(s) to browse the archive, for both interactive (HTML UI)
and mechanized (REST API) use
:ref:`swh.web.client <swh-web-client>`
Python client for :ref:`swh.web <swh-web>`
Dependencies
------------
The dependency relationships among the various modules are depicted below.
.. _py-deps-swh:
.. figure:: images/py-deps-swh.svg
:width: 1024px
:align: center
Dependencies among top-level Python modules (click to zoom).
Archive
-------
* :ref:`Archive ChangeLog <archive-changelog>`: notable changes to the archive
over time
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* `URLs index <http-routingtable.html>`_
* :ref:`search`
* :ref:`glossary`
.. ensure sphinx does not complain about index files not being included
.. toctree::
:maxdepth: 2
:caption: Contents:
:titlesonly:
:hidden:
getting-started-index
architecture/index
contributing/index
tutorials/index
API documentation <apidoc/modules>
+ roadmap/roadmap-2021.rst
swh.auth <swh-auth/index>
swh.core <swh-core/index>
swh.counters <swh-counters/index>
swh.dataset <swh-dataset/index>
swh.deposit <swh-deposit/index>
swh.fuse <swh-fuse/index>
swh.graph <swh-graph/index>
swh.indexer <swh-indexer/index>
swh.journal <swh-journal/index>
swh.lister <swh-lister/index>
swh.loader <swh-loader>
swh.model <swh-model/index>
swh.objstorage <swh-objstorage/index>
swh.objstorage.replayer <swh-objstorage-replayer/index>
swh.scanner <swh-scanner/index>
swh.scheduler <swh-scheduler/index>
swh.search <swh-search/index>
swh.storage <swh-storage/index>
swh.vault <swh-vault/index>
swh.web <swh-web/index>
swh.web.client <swh-web-client/index>
archive-changelog
journal
diff --git a/docs/roadmap/roadmap-2021.rst b/docs/roadmap/roadmap-2021.rst
new file mode 100644
index 0000000..d7c5a67
--- /dev/null
+++ b/docs/roadmap/roadmap-2021.rst
@@ -0,0 +1,384 @@
+.. _roadmap-2021:
+
+Roadmap 2021
+============
+
+(Version 1.0, last modified 5/5/2021)
+
+This document provides an overview of the technical roadmap of Software Heritage for
+2021.
+
+The `Kanban board <https://forge.softwareheritage.org/project/board/160/query/all/>`_
+is available on our forge.
+
+
+.. contents::
+ :depth: 3
+..
+
+
+Collect
+-------
+
+Faster and more reliable save code now
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: openscience
+- task: `T3082 <https://forge.softwareheritage.org/T3082>`_
+- lead: ardumont
+- effort: 1 PM
+
+Includes work:
+
+- set up dedicated fast track pipeline for save code now
+- improve save code now monitoring (user and admin)
+
+Improve deposit integration, management and display
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: openscience
+- task: `T3128 <https://forge.softwareheritage.org/T3128>`_
+- lead: moranegg
+- effort: 3 PM
+
+Includes work:
+
+- full invenioRDM integration `T2344 <https://forge.softwareheritage.org/T2344>`_
+- metadata only deposit `T2540 <https://forge.softwareheritage.org/T2540>`_
+
+Save forge now
+^^^^^^^^^^^^^^
+
+- tags: expand
+- task: `T1538 <https://forge.softwareheritage.org/T1538>`_
+- lead: ardumont
+- effort: 1 PM (tooling and process)
+
+Admin tooling for takedown notices (URLs)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: contract, compliance
+- task: `T3087 <https://forge.softwareheritage.org/T3087>`_
+- lead: anlambert
+- effort: 2 PM
+
+Includes work:
+
+- admin interface
+- journal of operations
+- web page with list of accepted TDN
+
+Preserve
+--------
+
+Complete and up-to-date archive copy on S3
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: stability
+- task: `T3085 <https://forge.softwareheritage.org/T3085>`_
+- lead: douardda
+- effort: 1 PM
+
+Includes work:
+
+- live update of the objects
+- regular dumps of the (anonymized) Merkle graph
+
+Scale-out graph storage in production
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: scalability
+- task: `T2214 <https://forge.softwareheritage.org/T2214>`_
+- lead: vlorentz
+- effort: 3 PM
+
+Includes work:
+
+
+- Cassandra: `T1892 <https://forge.softwareheritage.org/T1892>`_ (*maybe with external help*)
+
+Scale-out object storage prototype
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: stability, scalability, *externalized*
+- task: `T3054 <https://forge.softwareheritage.org/T3054>`_
+- lead: dachary
+- effort: 3 PM
+
+Cold storage archive in Vitam instance at CINES
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: contract
+- task: `T3113 <https://forge.softwareheritage.org/T3113>`_
+- lead: douardda
+- effort: 4 PM
+
+Mirrors
+^^^^^^^
+
+- tags: stability, scalability
+- depends: scale-out object storage
+- task: `T3116 <https://forge.softwareheritage.org/T3116>`_
+- lead: douardda
+- effort: 3 PM
+
+Includes work:
+
+- get at least one mirror up and running
+
+SWHID v2
+^^^^^^^^
+
+- tags: stability, evolution, datamodel
+- task: `T3134 <https://forge.softwareheritage.org/T3134>`_
+- lead: zack
+- effort: 6 PM
+
+Includes work:
+
+- complete on paper spec
+- align with new git hashes
+- including migration plan from v1
+- understand impact on internal microservice architecture
+- keep correspondence with v1 (there may be multiple v2 for one v1!)
+- reviewed by crypto experts
+
+Integrity
+^^^^^^^^^
+
+- tags: stability, reliability
+- task: `T3135 <https://forge.softwareheritage.org/T3135>`_
+- lead: olasd
+- effort: 2 PM
+
+Includes work:
+
+- making sure objects aren’t corrupted before insertion `T399 <https://forge.softwareheritage.org/T399>`_
+- ... and that existing ones are not part of `T75 <https://forge.softwareheritage.org/T75>`_
+- run corruption checks periodically
+
+
+Share
+-----
+
+swh-graph in production
+^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: scalability
+- task: `T2220 <https://forge.softwareheritage.org/T2220>`_
+- lead: zack
+- effort: 2 PM
+
+Efficient and reliable Vault download
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: stability
+- task: `T3096 <https://forge.softwareheritage.org/T3096>`_
+- lead: vlorentz
+- effort: 3 PM
+
+Includes work:
+
+- swh-graph may speed up operations considerably
+
+Web API 2.0
+^^^^^^^^^^^
+
+- tags: reliability, interoperability
+- task: `T2194 <https://forge.softwareheritage.org/T2194>`_
+- lead: anlambert
+- effort: 4 PM
+
+Includes work:
+
+- OpenAPI specification and implementation
+
+Expose metadata and make them searchable
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: openscience
+- task: `T3097 <https://forge.softwareheritage.org/T3097>`_
+- lead: vlorentz
+- effort: 3 PM
+
+Includes work:
+
+- index extrinsic metadata in swh-search/Elasticsearch from the journal `T2073 <https://forge.softwareheritage.org/T2073>`_
+- create API endpoint to access raw_extrinsic_metadata `T2938 <https://forge.softwareheritage.org/T2938>`_
+- show metadata in the web UI `T2088 <https://forge.softwareheritage.org/T2088>`_
+
+Full text search prototype
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: feature, wishlist
+- task: `T2204 <https://forge.softwareheritage.org/T2204>`_
+- lead: anlambert
+- effort: 3 PM
+
+Includes work:
+
+- requires integration with swh-graph and/or provenance index
+
+Organize
+--------
+
+Collect extrinsic metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: compliance
+- task: `T2202 <https://forge.softwareheritage.org/T2202>`_
+- lead: vlorentz
+- effort: 3 PM
+
+Includes work:
+
+- working pipeline
+- at least 1 instance running ClearlyDefined
+- forge metadata (info on the main page, etc.)
+
+Provenance in production
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: contract, feature
+- task: `T3112 <https://forge.softwareheritage.org/T3112>`_
+- lead: zack
+- effort: 6 PM
+
+Prior art
+^^^^^^^^^
+
+- tags: compliance
+- depends: provenance \| swh-graph in production
+- task: `T3136 <https://forge.softwareheritage.org/T3136>`_
+- lead: zack
+- effort: 3 PM
+
+Includes work:
+
+- pinpoint origin of selected source code artifacts
+- possibly integrated with swh-scanner
+
+Measurement
+-----------
+
+Efficient archive counters (HyperLogLog)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: measure, comm
+- task: `T2912 <https://forge.softwareheritage.org/T2912>`_
+- lead: vsellier
+- effort: 1 PM
+
+Distribution of origins by forge
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: measure, comm
+- task: `T3127 <https://forge.softwareheritage.org/T3127>`_
+- lead: anlambert
+- effort: 1 PM
+
+Stats on regular crawling by forge
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: measure, comm
+- task: `T1363 <https://forge.softwareheritage.org/T1363>`_
+- lead: olasd
+- effort: 1 PM
+
+Includes work:
+
+- lag, periodicity, # of changes since last visit, etc.
+
+View deposits per user (admin and user)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: measure, support
+- task: `T3128 <https://forge.softwareheritage.org/T3128>`_
+- lead: ardumont
+- effort: 1 PM
+
+Reliable user-level monitoring of services
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: stability
+- task: `T3129 <https://forge.softwareheritage.org/T3129>`_
+- lead: vsellier
+- effort: 2 PM
+
+Includes work:
+
+- status.softwareheritage.org
+
+Documentation
+-------------
+
+Write use case-specific documentation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: comm, web, doc
+- task: `T2234 <https://forge.softwareheritage.org/T2234>`_
+- lead: moranegg
+- effort: 2 PM
+
+Includes FAQs for users and ambassadors.
+
+Improve quality of code documentation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: doc, *externalized*
+- task: TODO
+- lead: TBD
+- effort: 2 PM
+
+Includes work:
+
+- doc(string) audit and team training on documentation writing
+
+Documentation strategy
+^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: doc
+- task: `T2624 <https://forge.softwareheritage.org/T2624>`_
+- lead: moranegg
+- effort: 1 PM
+
+Includes work:
+
+- respective role of docs.s.o, wiki, www.s.o, etc.
+
+Community
+---------
+
+Tooling for fundraising campaigns
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: web
+- task: `T3077 <https://forge.softwareheritage.org/T3077>`_
+- lead: anlambert
+- effort: 1 PM
+
+Dedicated page to list status of supported listers/loaders
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: web, doc
+- task: `T3117 <https://forge.softwareheritage.org/T3117>`_
+- lead: anlambert
+- effort: 1 PM
+
+Includes work:
+
+- `T1870 <https://forge.softwareheritage.org/T1870>`_
+- design web page
+- process to keep it up to date
+- make clearly visible and link to Sloan subgrants
+
+Tooling
+-------
+
+Migration to GitLab
+^^^^^^^^^^^^^^^^^^^
+
+- tags: forge, development
+- task: `T2225 <https://forge.softwareheritage.org/T2225>`_
+- lead: olasd
+- effort: 1 PM
+
