diff --git a/docs/api-reference.rst b/docs/api-reference.rst index c6b9695..3bbe87a 100644 --- a/docs/api-reference.rst +++ b/docs/api-reference.rst @@ -1,29 +1,30 @@ .. _api-reference: API reference ============= .. toctree:: :maxdepth: 2 swh.auth swh.core swh.counters swh.dataset swh.deposit swh.fuse swh.graph swh.indexer swh.journal swh.lister swh.loader swh.model swh.objstorage swh.objstorage.replayer + swh.perfecthash swh.scanner swh.scheduler swh.search swh.storage swh.vault swh.web swh.web.client diff --git a/docs/index.rst b/docs/index.rst index 7866e57..c76b9d5 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,214 +1,218 @@ .. _swh-docs: Software Heritage - Development Documentation ============================================= Getting started --------------- * :ref:`getting-started` → deploy a local copy of the Software Heritage software stack in less than 5 minutes, or * :ref:`developer-setup` → get a working development setup that allows to hack on the Software Heritage software stack * :ref:`faq` Contributing ------------ * :ref:`patch-submission` → learn how to submit your patches to the Software Heritage codebase * :ref:`code-review` → rules and guidelines to review code in Software Heritage * :ref:`python-style-guide` → how to format the Python code you write Architecture ------------ * :ref:`architecture-overview` → get a glimpse of the Software Heritage software architecture * :ref:`mirror` → learn what a Software Heritage mirror is and how to set up one * :ref:`Metadata workflow ` → learn how Software Heritage stores and handles metadata * :ref:`Keycloak ` → learn how to use Keycloak, the authentication system used by |swh|'s web interface and public APIs Data Model and Specifications ----------------------------- * :ref:`persistent-identifiers` Specifications of the SoftWare Heritage persistent IDentifiers (SWHID). * :ref:`data-model` Documentation of the main |swh| archive data model. * :ref:`journal-specs` Documentation of the Kafka journal of the |swh| archive. Tutorials --------- * :ref:`testing-guide` * :doc:`/tutorials/issue-debugging-monitoring` * :ref:`Listing the content of your favorite forge ` and :ref:`running a lister in Docker ` * :ref:`Add a new swh package ` * :ref:`doc-contribution` Roadmap ------- * :ref:`roadmap-2021` Engineering ----------- * :ref:`Infrastructure ` Components ---------- Here is brief overview of the most relevant software components in the Software Heritage stack, in alphabetical order. For a better introduction to the architecture, see the :ref:`architecture-overview`, which presents each of them in a didactical order. Each component name is linked to the development documentation of the corresponding Python module. :ref:`swh.auth ` low-level library used by modules needing keycloak authentication :ref:`swh.core ` low-level utilities and helpers used by almost all other modules in the stack :ref:`swh.counters ` service providing efficient estimates of the number of objects in the SWH archive, using Redis's Hyperloglog :ref:`swh.dataset ` public datasets and periodic data dumps of the archive released by Software Heritage :ref:`swh.deposit ` push-based deposit of software artifacts to the archive swh.docs developer documentation (used to generate this doc you are reading) :ref:`swh.fuse ` Virtual file system to browse the Software Heritage archive, based on `FUSE `_ :ref:`swh.graph ` Fast, compressed, in-memory representation of the archive, with tooling to generate and query it. :ref:`swh.indexer ` tools and workers used to crawl the content of the archive and extract derived information from any artifact stored in it :ref:`swh.journal ` persistent logger of changes to the archive, with publish-subscribe support :ref:`swh.lister ` collection of listers for all sorts of source code hosting and distribution places (forges, distributions, package managers, etc.) :ref:`swh.loader-core ` low-level loading utilities and helpers used by all other loaders :ref:`swh.loader-git ` loader for `Git `_ repositories :ref:`swh.loader-mercurial ` loader for `Mercurial `_ repositories :ref:`swh.loader-svn ` loader for `Subversion `_ repositories :ref:`swh.loader-cvs ` loader for `CVS `_ repositories :ref:`swh.model ` implementation of the :ref:`data-model` to archive source code artifacts :ref:`swh.objstorage ` content-addressable object storage :ref:`swh.objstorage.replayer ` Object storage replication tool +:ref:`swh.perfecthash ` + Low level management for read-only content-addressable object storage + indexed with a perfect hash table + :ref:`swh.scanner ` source code scanner to analyze code bases and compare them with source code artifacts archived by Software Heritage :ref:`swh.scheduler ` task manager for asynchronous/delayed tasks, used for recurrent (e.g., listing a forge, loading new stuff from a Git repository) and one-off activities (e.g., loading a specific version of a source package) :ref:`swh.search ` search engine for the archive :ref:`swh.storage ` abstraction layer over the archive, allowing to access all stored source code artifacts as well as their metadata :ref:`swh.vault ` implementation of the vault service, allowing to retrieve parts of the archive as self-contained bundles (e.g., individual releases, entire repository snapshots, etc.) :ref:`swh.web ` Web application(s) to browse the archive, for both interactive (HTML UI) and mechanized (REST API) use :ref:`swh.web.client ` Python client for :ref:`swh.web ` Dependencies ------------ The dependency relationships among the various modules are depicted below. .. _py-deps-swh: .. figure:: images/py-deps-swh.svg :width: 1024px :align: center Dependencies among top-level Python modules (click to zoom). Archive ------- * :ref:`Archive ChangeLog `: notable changes to the archive over time Indices and tables ================== * :ref:`genindex` * :ref:`modindex` * `URLs index `_ * :ref:`search` * :ref:`glossary` .. ensure sphinx does not complain about index files not being included .. toctree:: :maxdepth: 2 :caption: Contents: :titlesonly: :hidden: getting-started/index architecture/index contributing/index tutorials/index faq/index roadmap/roadmap-2021 api-reference archive-changelog journal Python modules autodocumentation diff --git a/requirements-swh-dev.txt b/requirements-swh-dev.txt index 6a70cf8..6a8a873 100644 --- a/requirements-swh-dev.txt +++ b/requirements-swh-dev.txt @@ -1,31 +1,32 @@ # Add here internal Software Heritage dependencies, one per line. # Dependencies need to be ordered in a way that ensure only # development versions will be used (not the release ones hosted on PyPI). # # This is NOT in alphabetical order ../swh-core[http,db,logging] ../swh-auth[django] ../swh-model ../swh-journal ../swh-counters ../swh-objstorage[testing] ../swh-storage ../swh-objstorage-replayer ../swh-scheduler[simulator] ../swh-deposit ../swh-graph ../swh-icinga-plugins ../swh-indexer ../swh-lister ../swh-loader-core ../swh-loader-git ../swh-loader-mercurial ../swh-loader-svn ../swh-loader-cvs +../swh-perfecthash ../swh-search ../swh-vault ../swh-web ../swh-web-client ../swh-scanner ../swh-fuse diff --git a/requirements-swh.txt b/requirements-swh.txt index 29c8727..c46767d 100644 --- a/requirements-swh.txt +++ b/requirements-swh.txt @@ -1,25 +1,26 @@ # Add here internal Software Heritage dependencies, one per line. swh.auth[django] swh.core[db,http,logging] swh.counters swh.deposit[server] swh.fuse swh.graph swh.indexer swh.journal swh.lister swh.loader.core swh.loader.git swh.loader.mercurial swh.loader.svn swh.loader.cvs swh.model swh.objstorage[testing] swh.objstorage.replayer +swh.perfecthash swh.scanner swh.scheduler[simulator] swh.search swh.storage swh.vault swh.web swh.web.client diff --git a/swh/docs/sphinx/conf.py b/swh/docs/sphinx/conf.py index add7d84..1c44f88 100755 --- a/swh/docs/sphinx/conf.py +++ b/swh/docs/sphinx/conf.py @@ -1,287 +1,285 @@ #!/usr/bin/env python3 # -*- coding: utf-8 -*- # import logging import os from typing import Dict from sphinx.ext import autodoc from swh.docs.django_settings import force_django_settings # General information about the project. project = "Software Heritage - Development Documentation" copyright = "2015-2021 The Software Heritage developers" author = "The Software Heritage developers" # -- General configuration ------------------------------------------------ # Add any Sphinx extension module names here, as strings. They can be # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # ones. extensions = [ "sphinx.ext.autodoc", "sphinx.ext.napoleon", "sphinx.ext.intersphinx", "sphinxcontrib.httpdomain", "sphinx.ext.extlinks", "sphinxcontrib.images", "sphinxcontrib.programoutput", "sphinx.ext.viewcode", "sphinx_tabs.tabs", "sphinx_rtd_theme", "sphinx.ext.graphviz", "sphinx_click.ext", "myst_parser", "sphinx.ext.todo", "sphinx_reredirects", "swh.docs.sphinx.view_in_phabricator", # swh.scheduler inherits some attribute descriptions from celery that use # custom crossrefs (eg. :setting:`task_ignore_result`) "sphinx_celery.setting_crossref", ] # Add any paths that contain templates here, relative to this directory. templates_path = ["_templates"] # The suffix(es) of source filenames. # You can specify multiple suffix as a list of string: # source_suffix = ".rst" # The master toctree document. master_doc = "index" # A string of reStructuredText that will be included at the beginning of every # source file that is read. # A bit hackish but should work both for each swh package and the whole swh-doc rst_prolog = """ .. include:: /../../swh-docs/docs/swh_substitutions """ # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the # built documents. # # The short X.Y version. version = "" # The full version, including alpha/beta/rc tags. release = "" # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. # # This is also used if you do content translation via gettext catalogs. # Usually you set "language" from the command line for these cases. language = "en" # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. # This patterns also effect to html_static_path and html_extra_path exclude_patterns = [ "_build", "swh-icinga-plugins/index.rst", - "swh-perfecthash/index.rst", - "swh-perfecthash/README.rst", "swh.loader.cvs.rcsparse.setup.rst", "apidoc/swh.loader.cvs.rcsparse.setup.rst", ] # The name of the Pygments (syntax highlighting) style to use. pygments_style = "sphinx" # If true, `todo` and `todoList` produce output, else they produce nothing. todo_include_todos = True # -- Options for HTML output ---------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. # html_theme = "sphinx_rtd_theme" html_favicon = "_static/favicon.ico" # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. # html_theme_options = { "collapse_navigation": True, "sticky_navigation": True, } html_logo = "_static/software-heritage-logo-title-motto-vertical-white.png" # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ["_static"] # make logo actually appear, avoiding gotcha due to alabaster default conf. # https://github.com/bitprophet/alabaster/issues/97#issuecomment-303722935 html_sidebars = { "**": [ "about.html", "globaltoc.html", "relations.html", "sourcelink.html", "searchbox.html", ] } # If not None, a 'Last updated on:' timestamp is inserted at every page # bottom, using the given strftime format. # The empty string is equivalent to '%b %d, %Y'. html_last_updated_fmt = "%Y-%m-%d %H:%M:%S %Z" # refer to the Python standard library. intersphinx_mapping = { "python": ("https://docs.python.org/3", None), "swh-devel": ("https://docs.softwareheritage.org/devel", None), "swh-sysadm": ("https://docs.softwareheritage.org/sysadm", None), } # Redirects for pages that were moved, so we don't break external links. # Uses sphinx-reredirects redirects = { "swh-deposit/spec-api": "api/api-documentation.html", "swh-deposit/metadata": "api/metadata.html", "swh-deposit/specs/blueprint": "../api/use-cases.html", "swh-deposit/user-manual": "api/user-manual.html", "infrastructure/index.html": "../../sysadm/network-architecture/index.html", "infrastructure/network.html": "../../sysadm/network-architecture/index.html", "infrastructure/service-urls.html": "../../sysadm/network-architecture/service-urls.html", # noqa "architecture": "architecture/overview.html", "keycloak": "../../sysadm/user-management/keycloak/index.html", "mirror": "architecture/mirror.html", "users": "user", } # -- autodoc configuration ---------------------------------------------- autodoc_default_flags = [ "members", "undoc-members", "private-members", "special-members", ] autodoc_member_order = "bysource" autodoc_mock_imports = [ "rados", ] autoclass_content = "both" modindex_common_prefix = ["swh."] # For the todo extension. Todo and todolist produce output only if this is True todo_include_todos = True _swh_web_base_url = "https://archive.softwareheritage.org" # for the extlinks extension, sub-projects should fill that dict extlinks: Dict = { "swh_web": (f"{_swh_web_base_url}/%s", None), "swh_web_api": (f"{_swh_web_base_url}/api/1/%s", None), "swh_web_browse": (f"{_swh_web_base_url}/browse/%s", None), } # SWH_PACKAGE_DOC_TOX_BUILD environment variable is set in a tox environment # named sphinx for each swh package (except the swh-docs package itself). swh_package_doc_tox_build = os.environ.get("SWH_PACKAGE_DOC_TOX_BUILD", False) # override some configuration when building a swh package # documentation with tox to remove warnings and suppress # those related to unresolved references if swh_package_doc_tox_build: swh_substitutions = os.path.join( os.path.dirname(__file__), "../../../docs/swh_substitutions" ) rst_prolog = f".. include:: /{swh_substitutions}" suppress_warnings = ["ref.ref"] html_favicon = "" html_logo = "" class SimpleDocumenter(autodoc.FunctionDocumenter): """ Custom autodoc directive to inline the docstring of a function in a document without the signature header and with no indentation. Example of use:: .. autosimple:: swh.web.api.views.directory.api_directory """ objtype = "simple" # ensure the priority is lesser than the base FunctionDocumenter # to avoid side effects with autodoc processing priority = -1 # do not indent the content content_indent = "" # do not add a header to the docstring def add_directive_header(self, sig): pass # sphinx event handler to set adequate django settings prior reading # apidoc generated rst files when building doc to avoid autodoc errors def set_django_settings(app, env, docname): if any([pattern in app.srcdir for pattern in ("swh-web-client", "DWCLI")]): # swh-web-client is detected as swh-web by the code below but # django is not installed when building standalone swh-web-client doc return package_settings = { "auth": "swh.auth.tests.django.app.apptest.settings", "deposit": "swh.deposit.settings.development", "web": "swh.web.settings.development", } for package, settings in package_settings.items(): if any( [pattern in docname for pattern in (f"swh.{package}", f"swh-{package}")] ): force_django_settings(settings) # when building local package documentation with tox, insert glossary # content at the end of the index file in order to resolve references # to the terms it contains def add_glossary_to_index(app, docname, source): if docname == "index": glossary_path = os.path.join( os.path.dirname(__file__), "../../../docs/glossary.rst" ) with open(glossary_path, "r") as glossary: source[0] += "\n" + glossary.read() def setup(app): # env-purge-doc event is fired before source-read app.connect("env-purge-doc", set_django_settings) # add autosimple directive (used in swh-web) app.add_autodocumenter(SimpleDocumenter) # set an environment variable indicating we are currently building # the documentation os.environ["SWH_DOC_BUILD"] = "1" logger = logging.getLogger("sphinx") if swh_package_doc_tox_build: # ensure glossary will be available in package doc scope app.connect("source-read", add_glossary_to_index) # suppress some httpdomain warnings in non web packages if not any([pattern in app.srcdir for pattern in ("swh-web", "DWAPPS")]): # filter out httpdomain unresolved reference warnings # to not consider them as errors when using -W option of sphinx-build class HttpDomainRefWarningFilter(logging.Filter): def filter(self, record: logging.LogRecord) -> bool: return not record.msg.startswith("Cannot resolve reference to") # insert a custom filter in the warning log handler of sphinx logger.handlers[1].filters.insert(0, HttpDomainRefWarningFilter())