diff --git a/docs/index.rst b/docs/index.rst
index 9cd0c72..165ac49 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,211 +1,217 @@
.. _swh-docs:
Software Heritage - Development Documentation
=============================================
Getting started
---------------
* :ref:`getting-started` → deploy a local copy of the Software Heritage
software stack in less than 5 minutes, or
* :ref:`developer-setup` → get a working development setup that allows to hack
on the Software Heritage software stack
Contributing
------------
* :ref:`patch-submission` → learn how to submit your patches to the
Software Heritage codebase
* :ref:`code-review` → rules and guidelines to review code in
Software Heritage
* :ref:`python-style-guide` → how to format the Python code you write
Architecture
------------
* :ref:`architecture` → get a glimpse of the Software Heritage software
architecture
* :ref:`mirror` → learn what a Software Heritage mirror is and how to set up
one
* :ref:`keycloak <keycloak>` → learn what keycloak is and how to register and configure users
Data Model and Specifications
-----------------------------
* :ref:`persistent-identifiers` Specifications of the SoftWare Heritage persistent IDentifiers (SWHID).
* :ref:`data-model` Documentation of the main |swh| archive data model.
* :ref:`journal-specs` Documentation of the Kafka journal of the |swh| archive.
Tutorials
---------
* :ref:`testing-guide`
* :doc:`/tutorials/issue-debugging-monitoring`
* :ref:`Listing the content of your favorite forge <lister-tutorial>`
and :ref:`running a lister in Docker <run-lister-tutorial>`
+Roadmap
+-------
+
+* :ref:`roadmap-2021`
+
Components
----------
Here is brief overview of the most relevant software components in the Software
Heritage stack. Each component name is linked to the development documentation
of the corresponding Python module.
:ref:`swh.auth <swh-auth>`
low-level library used by modules needing keycloak authentication
:ref:`swh.core <swh-core>`
low-level utilities and helpers used by almost all other modules in the
stack
:ref:`swh.counters <swh-counters>`
service providing efficient estimates of the number of objects in the SWH archive,
using Redis's Hyperloglog
:ref:`swh.dataset <swh-dataset>`
public datasets and periodic data dumps of the archive released by Software
Heritage
:ref:`swh.deposit <swh-deposit>`
push-based deposit of software artifacts to the archive
swh.docs
developer documentation (used to generate this doc you are reading)
:ref:`swh.fuse <swh-fuse>`
Virtual file system to browse the Software Heritage archive, based on
`FUSE <https://github.com/libfuse/libfuse>`_
:ref:`swh.graph <swh-graph>`
Fast, compressed, in-memory representation of the archive, with tooling to
generate and query it.
:ref:`swh.indexer <swh-indexer>`
tools and workers used to crawl the content of the archive and extract
derived information from any artifact stored in it
:ref:`swh.journal <swh-journal>`
persistent logger of changes to the archive, with publish-subscribe support
:ref:`swh.lister <swh-lister>`
collection of listers for all sorts of source code hosting and distribution
places (forges, distributions, package managers, etc.)
:ref:`swh.loader-core <swh-loader-core>`
low-level loading utilities and helpers used by all other loaders
:ref:`swh.loader-git <swh-loader-git>`
loader for `Git <https://git-scm.com/>`_ repositories
:ref:`swh.loader-mercurial <swh-loader-mercurial>`
loader for `Mercurial <https://www.mercurial-scm.org/>`_ repositories
:ref:`swh.loader-svn <swh-loader-svn>`
loader for `Subversion <https://subversion.apache.org/>`_ repositories
:ref:`swh.model <swh-model>`
implementation of the :ref:`data-model` to archive source code artifacts
:ref:`swh.objstorage <swh-objstorage>`
content-addressable object storage
:ref:`swh.objstorage.replayer <swh-objstorage-replayer>`
Object storage replication tool
:ref:`swh.scanner <swh-scanner>`
source code scanner to analyze code bases and compare them with source code
artifacts archived by Software Heritage
:ref:`swh.scheduler <swh-scheduler>`
task manager for asynchronous/delayed tasks, used for recurrent (e.g.,
listing a forge, loading new stuff from a Git repository) and one-off
activities (e.g., loading a specific version of a source package)
:ref:`swh.search <swh-search>`
search engine for the archive
:ref:`swh.storage <swh-storage>`
abstraction layer over the archive, allowing to access all stored source
code artifacts as well as their metadata
:ref:`swh.vault <swh-vault>`
implementation of the vault service, allowing to retrieve parts of the
archive as self-contained bundles (e.g., individual releases, entire
repository snapshots, etc.)
:ref:`swh.web <swh-web>`
Web application(s) to browse the archive, for both interactive (HTML UI)
and mechanized (REST API) use
:ref:`swh.web.client <swh-web-client>`
Python client for :ref:`swh.web <swh-web>`
Dependencies
------------
The dependency relationships among the various modules are depicted below.
.. _py-deps-swh:
.. figure:: images/py-deps-swh.svg
:width: 1024px
:align: center
Dependencies among top-level Python modules (click to zoom).
Archive
-------
* :ref:`Archive ChangeLog <archive-changelog>`: notable changes to the archive
over time
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* `URLs index <http-routingtable.html>`_
* :ref:`search`
* :ref:`glossary`
.. ensure sphinx does not complain about index files not being included
.. toctree::
:maxdepth: 2
:caption: Contents:
:titlesonly:
:hidden:
getting-started-index
architecture/index
contributing/index
tutorials/index
API documentation <apidoc/modules>
+ roadmap/roadmap-2021.rst
swh.auth <swh-auth/index>
swh.core <swh-core/index>
swh.counters <swh-counters/index>
swh.dataset <swh-dataset/index>
swh.deposit <swh-deposit/index>
swh.fuse <swh-fuse/index>
swh.graph <swh-graph/index>
swh.indexer <swh-indexer/index>
swh.journal <swh-journal/index>
swh.lister <swh-lister/index>
swh.loader <swh-loader>
swh.model <swh-model/index>
swh.objstorage <swh-objstorage/index>
swh.objstorage.replayer <swh-objstorage-replayer/index>
swh.scanner <swh-scanner/index>
swh.scheduler <swh-scheduler/index>
swh.search <swh-search/index>
swh.storage <swh-storage/index>
swh.vault <swh-vault/index>
swh.web <swh-web/index>
swh.web.client <swh-web-client/index>
archive-changelog
journal
diff --git a/docs/roadmap/roadmap-2021.rst b/docs/roadmap/roadmap-2021.rst
new file mode 100644
index 0000000..d7c5a67
--- /dev/null
+++ b/docs/roadmap/roadmap-2021.rst
@@ -0,0 +1,384 @@
+.. _roadmap-2021:
+
+Roadmap 2021
+============
+
+(Version 1.0, last modified 5/5/2021)
+
+This document provides an overview of the technical roadmap of Software Heritage for
+2021.
+
+The related `Kanban board <https://forge.softwareheritage.org/project/board/160/query/all/>`_
+can be viewed on our forge.
+
+
+.. contents::
+ :depth: 3
+..
+
+
+Collect
+-------
+
+Faster and more reliable save code now
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: openscience
+- task: `T3082 <https://forge.softwareheritage.org/T3082>`_
+- lead: ardumont
+- effort: 1 PM
+
+Includes work:
+
+- set up dedicated fast track pipeline for save code now
+- improve save code now monitoring (user and admin)
+
+Improve deposit integration, management and display
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: openscience
+- task: `T3128 <https://forge.softwareheritage.org/T3128>`_
+- lead: moranegg
+- effort: 3 PM
+
+Includes work:
+
+- full invenioRDM integration `T2344 <https://forge.softwareheritage.org/T2344>`_
+- metadata only deposit `T2540 <https://forge.softwareheritage.org/T2540>`_
+
+Save forge now
+^^^^^^^^^^^^^^
+
+- tags: expand
+- task: `T1538 <https://forge.softwareheritage.org/T1538>`_
+- lead: ardumont
+- effort: 1 PM - tooling & process
+
+Admin tooling for takedown notices (URLs)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: contract, compliance
+- task: `T3087 <https://forge.softwareheritage.org/T3087>`_
+- lead: anlambert
+- effort: 2 PM
+
+Includes work:
+
+- admin interface
+- journal of operations
+- web page listing accepted takedown notices (TDN)
+
+Preserve
+--------
+
+Complete and up-to-date archive copy on S3
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: stability
+- task: `T3085 <https://forge.softwareheritage.org/T3085>`_
+- lead: douardda
+- effort: 1 PM
+
+Includes work:
+
+- live update of the objects
+- regular dumps of the (anonymized) Merkle graph
+
+Scale-out graph storage in production
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: scalability
+- task: `T2214 <https://forge.softwareheritage.org/T2214>`_
+- lead: vlorentz
+- effort: 3 PM
+
+Includes work:
+
+- Cassandra: `T1892 <https://forge.softwareheritage.org/T1892>`_ (*maybe with external help*)
+
+Scale-out object storage prototype
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: stability, scalability, *externalized*
+- task: `T3054 <https://forge.softwareheritage.org/T3054>`_
+- lead: dachary
+- effort: 3 PM
+
+Cold storage archive in Vitam instance at CINES
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: contract
+- task: `T3113 <https://forge.softwareheritage.org/T3113>`_
+- lead: douardda
+- effort: 4 PM
+
+Mirrors
+^^^^^^^
+
+- tags: stability, scalability
+- depends: scale-out object storage
+- task: `T3116 <https://forge.softwareheritage.org/T3116>`_
+- lead: douardda
+- effort: 3 PM
+
+Includes work:
+
+- get at least one mirror up and running
+
+SWHID v2
+^^^^^^^^
+
+- tags: stability, evolution, datamodel
+- task: `T3134 <https://forge.softwareheritage.org/T3134>`_
+- lead: zack
+- effort: 6 PM
+
+Includes work:
+
+- complete on-paper spec
+- align with new git hashes
+- include a migration plan from v1
+- understand impact on internal microservice architecture
+- keep correspondence with v1 (there may be multiple v2 for one v1!)
+- get the spec reviewed by crypto experts
+
+Integrity
+^^^^^^^^^
+
+- tags: stability, reliability
+- task: `T3135 <https://forge.softwareheritage.org/T3135>`_
+- lead: olasd
+- effort: 2 PM
+
+Includes work:
+
+- make sure objects aren’t corrupted before insertion `T399 <https://forge.softwareheritage.org/T399>`_
+- ... and that existing ones are not corrupted either (part of `T75 <https://forge.softwareheritage.org/T75>`_)
+- run corruption checks periodically
+
+
+Share
+-----
+
+swh-graph in production
+^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: scalability
+- task: `T2220 <https://forge.softwareheritage.org/T2220>`_
+- lead: zack
+- effort: 2 PM
+
+Efficient and reliable Vault download
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: stability
+- task: `T3096 <https://forge.softwareheritage.org/T3096>`_
+- lead: vlorentz
+- effort: 3 PM
+
+Includes work:
+
+- swh-graph may speed up operations considerably
+
+Web API 2.0
+^^^^^^^^^^^
+
+- tags: reliability, interoperability
+- task: `T2194 <https://forge.softwareheritage.org/T2194>`_
+- lead: anlambert
+- effort: 4 PM
+
+Includes work:
+
+- OpenAPI specification
+- implementation
+
+Expose metadata and make them searchable
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: openscience
+- task: `T3097 <https://forge.softwareheritage.org/T3097>`_
+- lead: vlorentz
+- effort: 3 PM
+
+Includes work:
+
+- index extrinsic metadata in swh-search/Elasticsearch from the journal `T2073 <https://forge.softwareheritage.org/T2073>`_
+- create API endpoint to access raw_extrinsic_metadata `T2938 <https://forge.softwareheritage.org/T2938>`_
+- show metadata in the web UI `T2088 <https://forge.softwareheritage.org/T2088>`_
+
+Full text search prototype
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: feature, wishlist
+- task: `T2204 <https://forge.softwareheritage.org/T2204>`_
+- lead: anlambert
+- effort: 3 PM
+
+Includes work:
+
+- requires integration with swh-graph and/or provenance index
+
+Organize
+--------
+
+Collect extrinsic metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: compliance
+- task: `T2202 <https://forge.softwareheritage.org/T2202>`_
+- lead: vlorentz
+- effort: 3 PM
+
+Includes work:
+
+- working pipeline
+- at least 1 instance running ClearlyDefined
+- forge metadata (info on the main page, etc.)
+
+Provenance in production
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: contract, feature
+- task: `T3112 <https://forge.softwareheritage.org/T3112>`_
+- lead: zack
+- effort: 6 PM
+
+Prior art
+^^^^^^^^^
+
+- tags: compliance
+- depends: provenance \| swh-graph in production
+- task: `T3136 <https://forge.softwareheritage.org/T3136>`_
+- lead: zack
+- effort: 3 PM
+
+Includes work:
+
+- pinpoint origin of selected source code artifacts
+- possibly integrated with swh-scanner
+
+Measurement
+-----------
+
+Efficient archive counters (HyperLogLog)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: measure, comm
+- task: `T2912 <https://forge.softwareheritage.org/T2912>`_
+- lead: vsellier
+- effort: 1 PM
+
+Distribution of origins by forge
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: measure, comm
+- task: `T3127 <https://forge.softwareheritage.org/T3127>`_
+- lead: anlambert
+- effort: 1 PM
+
+Stats on regular crawling by forge
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: measure, comm
+- task: `T1363 <https://forge.softwareheritage.org/T1363>`_
+- lead: olasd
+- effort: 1 PM
+
+Includes work:
+
+- lag, periodicity, # of changes since last visit, etc.
+
+View deposits per user (admin and user)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: measure, support
+- task: `T3128 <https://forge.softwareheritage.org/T3128>`_
+- lead: ardumont
+- effort: 1 PM
+
+Reliable user-level monitoring of services
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: stability
+- task: `T3129 <https://forge.softwareheritage.org/T3129>`_
+- lead: vsellier
+- effort: 2 PM
+
+Includes work:
+
+- status.softwareheritage.org
+
+Documentation
+-------------
+
+Write use case-specific documentation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: comm, web, doc
+- task: `T2234 <https://forge.softwareheritage.org/T2234>`_
+- lead: moranegg
+- effort: 2 PM
+
+Includes FAQ for:
+
+- users
+- ambassadors
+
+Improve quality of code documentation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: doc, *externalized*
+- task: TODO
+- lead: TBD
+- effort: 2 PM
+
+Includes work:
+
+- doc(string) audit
+- team training about doc writing
+
+Documentation strategy
+^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: doc
+- task: `T2624 <https://forge.softwareheritage.org/T2624>`_
+- lead: moranegg
+- effort: 1 PM
+
+Includes work:
+
+- respective role of docs.s.o, wiki, www.s.o, etc.
+
+Community
+---------
+
+Tooling for fundraising campaigns
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: web
+- task: `T3077 <https://forge.softwareheritage.org/T3077>`_
+- lead: anlambert
+- effort: 1 PM
+
+Dedicated page to list status of supported listers/loaders
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- tags: web, doc
+- task: `T3117 <https://forge.softwareheritage.org/T3117>`_
+- lead: anlambert
+- effort: 1 PM
+
+Includes work:
+
+- `T1870 <https://forge.softwareheritage.org/T1870>`_
+- design web page
+- process to keep it up to date
+- make clearly visible and link to Sloan subgrants
+
+Tooling
+-------
+
+Migration to GitLab
+^^^^^^^^^^^^^^^^^^^
+
+- tags: forge, development
+- task: `T2225 <https://forge.softwareheritage.org/T2225>`_
+- lead: olasd
+- effort: 1 PM
+