diff --git a/docs/index.rst b/docs/index.rst
index 9cd0c72..165ac49 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,211 +1,217 @@
 .. _swh-docs:

 Software Heritage - Development Documentation
 =============================================

 Getting started
 ---------------

 * :ref:`getting-started` → deploy a local copy of the Software Heritage software stack in less than 5 minutes, or
 * :ref:`developer-setup` → get a working development setup that allows to hack on the Software Heritage software stack

 Contributing
 ------------

 * :ref:`patch-submission` → learn how to submit your patches to the Software Heritage codebase
 * :ref:`code-review` → rules and guidelines to review code in Software Heritage
 * :ref:`python-style-guide` → how to format the Python code you write

 Architecture
 ------------

 * :ref:`architecture` → get a glimpse of the Software Heritage software architecture
 * :ref:`mirror` → learn what a Software Heritage mirror is and how to set up one
 * :ref:`keycloak ` → learn what keycloak is and how to register and configure users

 Data Model and Specifications
 -----------------------------

 * :ref:`persistent-identifiers` Specifications of the SoftWare Heritage persistent IDentifiers (SWHID).
 * :ref:`data-model` Documentation of the main |swh| archive data model.
 * :ref:`journal-specs` Documentation of the Kafka journal of the |swh| archive.

 Tutorials
 ---------

 * :ref:`testing-guide`
 * :doc:`/tutorials/issue-debugging-monitoring`
 * :ref:`Listing the content of your favorite forge ` and :ref:`running a lister in Docker `

+Roadmap
+-------
+
+* :ref:`roadmap-2021`
+
 Components
 ----------

 Here is brief overview of the most relevant software components in the Software Heritage stack. Each component name is linked to the development documentation of the corresponding Python module.
 :ref:`swh.auth `
     low-level library used by modules needing keycloak authentication

 :ref:`swh.core `
     low-level utilities and helpers used by almost all other modules in the stack

 :ref:`swh.counters `
     service providing efficient estimates of the number of objects in the SWH archive, using Redis's Hyperloglog

 :ref:`swh.dataset `
     public datasets and periodic data dumps of the archive released by Software Heritage

 :ref:`swh.deposit `
     push-based deposit of software artifacts to the archive

 swh.docs
     developer documentation (used to generate this doc you are reading)

 :ref:`swh.fuse `
     Virtual file system to browse the Software Heritage archive, based on `FUSE `_

 :ref:`swh.graph `
     Fast, compressed, in-memory representation of the archive, with tooling to generate and query it.

 :ref:`swh.indexer `
     tools and workers used to crawl the content of the archive and extract derived information from any artifact stored in it

 :ref:`swh.journal `
     persistent logger of changes to the archive, with publish-subscribe support

 :ref:`swh.lister `
     collection of listers for all sorts of source code hosting and distribution places (forges, distributions, package managers, etc.)

 :ref:`swh.loader-core `
     low-level loading utilities and helpers used by all other loaders

 :ref:`swh.loader-git `
     loader for `Git `_ repositories

 :ref:`swh.loader-mercurial `
     loader for `Mercurial `_ repositories

 :ref:`swh.loader-svn `
     loader for `Subversion `_ repositories

 :ref:`swh.model `
     implementation of the :ref:`data-model` to archive source code artifacts

 :ref:`swh.objstorage `
     content-addressable object storage

 :ref:`swh.objstorage.replayer `
     Object storage replication tool

 :ref:`swh.scanner `
     source code scanner to analyze code bases and compare them with source code artifacts archived by Software Heritage

 :ref:`swh.scheduler `
     task manager for asynchronous/delayed tasks, used for recurrent (e.g., listing a forge, loading new stuff from a Git repository) and one-off activities (e.g., loading a specific version of a source package)

 :ref:`swh.search `
     search engine for the archive

 :ref:`swh.storage `
     abstraction layer over the archive, allowing to access all stored source code artifacts as well as their metadata

 :ref:`swh.vault `
     implementation of the vault service, allowing to retrieve parts of the archive as self-contained bundles (e.g., individual releases, entire repository snapshots, etc.)

 :ref:`swh.web `
     Web application(s) to browse the archive, for both interactive (HTML UI) and mechanized (REST API) use

 :ref:`swh.web.client `
     Python client for :ref:`swh.web `

 Dependencies
 ------------

 The dependency relationships among the various modules are depicted below.

 .. _py-deps-swh:
 .. figure:: images/py-deps-swh.svg
    :width: 1024px
    :align: center

    Dependencies among top-level Python modules (click to zoom).

 Archive
 -------

 * :ref:`Archive ChangeLog `: notable changes to the archive over time

 Indices and tables
 ==================

 * :ref:`genindex`
 * :ref:`modindex`
 * `URLs index `_
 * :ref:`search`
 * :ref:`glossary`

 .. ensure sphinx does not complain about index files not being included

 .. toctree::
    :maxdepth: 2
    :caption: Contents:
    :titlesonly:
    :hidden:

    getting-started-index
    architecture/index
    contributing/index
    tutorials/index
    API documentation
+   roadmap/roadmap-2021.rst
    swh.auth
    swh.core
    swh.counters
    swh.dataset
    swh.deposit
    swh.fuse
    swh.graph
    swh.indexer
    swh.journal
    swh.lister
    swh.loader
    swh.model
    swh.objstorage
    swh.objstorage.replayer
    swh.scanner
    swh.scheduler
    swh.search
    swh.storage
    swh.vault
    swh.web
    swh.web.client
    archive-changelog
    journal
diff --git a/docs/roadmap/roadmap-2021.rst b/docs/roadmap/roadmap-2021.rst
new file mode 100644
index 0000000..d7c5a67
--- /dev/null
+++ b/docs/roadmap/roadmap-2021.rst
@@ -0,0 +1,384 @@
+.. _roadmap-2021:
+
+Roadmap 2021
+============
+
+(Version 1.0, last modified 5/5/2021)
+
+This document provides an overview of the technical roadmap of Software Heritage for
+2021.
+
+The `Kanban board `_
+can be viewed on our forge.
+
+
+.. contents::
+   :depth: 3
+..
+
+
+Collect
+-------
+
+Faster and more reliable save code now
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: openscience
+- task: `T3082 `_
+- lead: ardumont
+- effort: 1 PM
+
+Includes work:
+
+- set up dedicated fast track pipeline for save code now
+- improve save code now monitoring (user and admin)
+
+Improve deposit integration, management and display
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: openscience
+- task: `T3128 `_
+- lead: moranegg
+- effort: 3 PM
+
+Includes work:
+
+- full invenioRDM integration `T2344 `_
+- metadata only deposit `T2540 `_
+
+Save forge now
+^^^^^^^^^^^^^^
+
+- tags: expand
+- task: `T1538 `_
+- lead: ardumont
+- effort: 1 PM - tooling & process
+
+Admin tooling for takedown notices (URLs)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: contract, compliance
+- task: `T3087 `_
+- lead: anlambert
+- effort: 2 PM
+
+Includes work:
+
+- admin interface
+- journal of operations
+- web page with list of accepted TDN
+
+Preserve
+--------
+
+Complete and up-to-date archive copy on S3
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: stability
+- task: `T3085 `_
+- lead: douardda
+- effort: 1 PM
+
+Includes work:
+
+- live update of the objects
+- regular dumps of the (anonymized) Merkle graph
+
+Scale-out graph storage in production
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: scalability
+- task: `T2214 `_
+- lead: vlorentz
+- effort: 3 PM
+
+Includes work:
+
+
+- Cassandra: `T1892 `_ (*maybe with external help*)
+
+Scale-out object storage prototype
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: stability, scalability, *externalized*
+- task: `T3054 `_
+- lead: dachary
+- effort: 3 PM
+
+Cold storage archive in Vitam instance at CINES
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: contract
+- task: `T3113 `_
+- lead: douardda
+- effort: 4 PM
+
+Mirrors
+^^^^^^^
+
+- tags: stability, scalability
+- depends: scale-out object storage
+- task: `T3116 `_
+- lead: douardda
+- effort: 3 PM
+
+Includes work:
+
+- get at least one mirror up and running
+
+SWHID v2
+^^^^^^^^
+
+- tags: stability, evolution, datamodel
+- task: `T3134 `_
+- lead: zack
+- effort: 6 PM
+
+Includes work:
+
+- complete on-paper spec
+- align with new git hashes
+- including migration plan from v1
+- understand impact on internal microservice architecture
+- keep correspondence with v1 (there may be multiple v2 for one v1!)
+- reviewed by crypto experts
+
+Integrity
+^^^^^^^^^
+
+- tags: stability, reliability
+- task: `T3135 `_
+- lead: olasd
+- effort: 2 PM
+
+Includes work:
+
+- making sure objects aren’t corrupted before insertion `T399 `_
+- ... and that existing ones are not part of `T75 `_
+- run corruption checks periodically
+
+
+Share
+-----
+
+swh-graph in production
+^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: scalability
+- task: `T2220 `_
+- lead: zack
+- effort: 2 PM
+
+Efficient and reliable Vault download
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: stability
+- task: `T3096 `_
+- lead: vlorentz
+- effort: 3 PM
+
+Includes work:
+
+- swh-graph may speed up operations considerably
+
+Web API 2.0
+^^^^^^^^^^^
+
+- tags: reliability, interoperability
+- task: `T2194 `_
+- lead: anlambert
+- effort: 4 PM
+
+Includes work:
+
+- OpenAPI specification - implementation
+
+Expose metadata and make them searchable
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: openscience
+- task: `T3097 `_
+- lead: vlorentz
+- effort: 3 PM
+
+Includes work:
+
+- index extrinsic metadata in swh-search/Elasticsearch from the journal `T2073 `_
+- create API endpoint to access raw_extrinsic_metadata `T2938 `_
+- show metadata in the web UI `T2088 `_
+
+Full text search prototype
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: feature, wishlist
+- task: `T2204 `_
+- lead: anlambert
+- effort: 3 PM
+
+Includes work:
+
+- requires integration with swh-graph and/or provenance index
+
+Organize
+--------
+
+Collect extrinsic metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: compliance
+- task: `T2202 `_
+- lead: vlorentz
+- effort: 3 PM
+
+Includes work:
+- working pipeline
+- at least 1 instance running ClearlyDefined
+- forge metadata (info on the main page, etc.)
+
+Provenance in production
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: contract, feature
+- task: `T3112 `_
+- lead: zack
+- effort: 6 PM
+
+Prior art
+^^^^^^^^^
+
+- tags: compliance
+- depends: provenance \| swh-graph in production
+- task: `T3136 `_
+- lead: zack
+- effort: 3 PM
+
+Includes work:
+
+- pinpoint origin of selected source code artifacts
+- possibly integrated with swh-scanner
+
+Measurement
+-----------
+
+Efficient archive counters (HyperLogLog)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: measure, comm
+- task: `T2912 `_
+- lead: vsellier
+- effort: 1 PM
+
+Distribution of origins by forge
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: measure, comm
+- task: `T3127 `_
+- lead: anlambert
+- effort: 1 PM
+
+Stats on regular crawling by forge
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: measure, comm
+- task: `T1363 `_
+- lead: olasd
+- effort: 1 PM
+
+Includes work:
+
+- lag, periodicity, # of changes since last visit, etc.
+
+View deposits per user (admin and user)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: measure, support
+- task: `T3128 `_
+- lead: ardumont
+- effort: 1 PM
+
+Reliable user-level monitoring of services
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: stability
+- task: `T3129 `_
+- lead: vsellier
+- effort: 2 PM
+
+Includes work:
+
+- status.softwareheritage.org
+
+Documentation
+-------------
+
+Write use case-specific documentation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: comm, web, doc
+- task: `T2234 `_
+- lead: moranegg
+- effort: 2 PM
+
+Includes FAQ for: - users - ambassadors
+
+Improve quality of code documentation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: doc, *externalized*
+- task: TODO
+- lead: TBD
+- effort: 2 PM
+
+Includes work:
+
+- doc(string) audit - team training about doc writing
+
+Documentation strategy
+^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: doc
+- task: `T2624 `_
+- lead: moranegg
+- effort: 1 PM
+
+Includes work:
+
+- respective role of docs.s.o, wiki, www.s.o, etc.
+
+Community
+---------
+
+Tooling for fundraising campaigns
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: web
+- task: `T3077 `_
+- lead: anlambert
+- effort: 1 PM
+
+Dedicated page to list status of supported listers/loaders
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: web, doc
+- task: `T3117 `_
+- lead: anlambert
+- effort: 1 PM
+
+Includes work:
+
+- `T1870 `_
+- design web page
+- process to keep it up to date
+- make clearly visible and link to Sloan subgrants
+
+Tooling
+-------
+
+Migration to GitLab
+^^^^^^^^^^^^^^^^^^^
+
+- tags: forge, development
+- task: `T2225 `_
+- lead: olasd
+- effort: 1 PM
+