diff --git a/docs/_templates/layout.html b/docs/_templates/layout.html
index 4aafbcf..932a41a 100644
--- a/docs/_templates/layout.html
+++ b/docs/_templates/layout.html
@@ -1,40 +1,61 @@
{% extends "!layout.html" %}
{% block extrahead %}
{% endblock %}
+
+{# copied from sphinx_rtd_theme/layout.html, with the extra System Administration link #}
+{%- block menu %}
+ {%- set toctree = toctree(maxdepth=theme_navigation_depth|int,
+ collapse=theme_collapse_navigation|tobool,
+ includehidden=theme_includehidden|tobool,
+ titles_only=theme_titles_only|tobool) %}
+ {%- if toctree %}
+ {{ toctree }}
+ {%- else %}
+
+
{{ toc }}
+ {%- endif %}
+
+
+
+
+{%- endblock %}
+
{% block content %}
{{ super() }}
{% endblock %}
diff --git a/docs/index.rst b/docs/index.rst
index 40326a1..b4f96da 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,214 +1,218 @@
.. _swh-docs:
Software Heritage - Development Documentation
=============================================
Getting started
---------------
* :ref:`getting-started` → deploy a local copy of the Software Heritage
software stack in less than 5 minutes, or
* :ref:`developer-setup` → get a working development setup that allows you to hack
on the Software Heritage software stack
* :ref:`faq`
Contributing
------------
* :ref:`patch-submission` → learn how to submit your patches to the
Software Heritage codebase
* :ref:`code-review` → rules and guidelines to review code in
Software Heritage
* :ref:`python-style-guide` → how to format the Python code you write
Architecture
------------
* :ref:`architecture-overview` → get a glimpse of the Software Heritage software
architecture
* :ref:`Metadata workflow ` → learn how Software Heritage
stores and handles metadata
Data Model and Specifications
-----------------------------
* :ref:`persistent-identifiers` Specifications of the SoftWare Heritage persistent IDentifiers (SWHID).
* :ref:`data-model` Documentation of the main |swh| archive data model.
* :ref:`journal-specs` Documentation of the Kafka journal of the |swh| archive.
Tutorials
---------
* :ref:`testing-guide`
* :doc:`/tutorials/issue-debugging-monitoring`
* :ref:`Listing the content of your favorite forge `
and :ref:`running a lister in Docker `
* :ref:`Add a new swh package `
* :ref:`doc-contribution`
Roadmap
-------
* :ref:`roadmap-2021`
-Engineering
------------
-
-* :ref:`Infrastructure `
+System Administration
+---------------------
+
+* :ref:`Network Infrastructure `
+* :ref:`swh-sysadm:mirror` → learn what a Software Heritage mirror is and how to set
+ one up
+* :ref:`Keycloak ` → learn how to use Keycloak,
+ the authentication system used by |swh|'s web interface and public APIs
Components
----------
Here is a brief overview of the most relevant software components in the Software
Heritage stack, in alphabetical order.
For a better introduction to the architecture, see the :ref:`architecture-overview`,
which presents each of them in a didactic order.
Each component name is linked to the development documentation
of the corresponding Python module.
:ref:`swh.auth `
low-level library used by modules needing keycloak authentication
:ref:`swh.core `
low-level utilities and helpers used by almost all other modules in the
stack
:ref:`swh.counters `
service providing efficient estimates of the number of objects in the SWH archive,
using Redis's HyperLogLog
:ref:`swh.dataset `
public datasets and periodic data dumps of the archive released by Software
Heritage
:ref:`swh.deposit `
push-based deposit of software artifacts to the archive
swh.docs
developer documentation (used to generate this doc you are reading)
:ref:`swh.fuse `
Virtual file system to browse the Software Heritage archive, based on
`FUSE `_
:ref:`swh.graph `
Fast, compressed, in-memory representation of the archive, with tooling to
generate and query it.
:ref:`swh.indexer `
tools and workers used to crawl the content of the archive and extract
derived information from any artifact stored in it
:ref:`swh.journal `
persistent logger of changes to the archive, with publish-subscribe support
:ref:`swh.lister `
collection of listers for all sorts of source code hosting and distribution
places (forges, distributions, package managers, etc.)
:ref:`swh.loader-core `
low-level loading utilities and helpers used by all other loaders
:ref:`swh.loader-git `
loader for `Git `_ repositories
:ref:`swh.loader-mercurial `
loader for `Mercurial `_ repositories
:ref:`swh.loader-svn `
loader for `Subversion `_ repositories
:ref:`swh.loader-cvs `
loader for `CVS `_ repositories
:ref:`swh.model `
implementation of the :ref:`data-model` to archive source code artifacts
:ref:`swh.objstorage `
content-addressable object storage
:ref:`swh.objstorage.replayer `
Object storage replication tool
:ref:`swh.perfecthash `
Low-level management for read-only content-addressable object storage
indexed with a perfect hash table
:ref:`swh.scanner `
source code scanner to analyze code bases and compare them with source code
artifacts archived by Software Heritage
:ref:`swh.scheduler `
task manager for asynchronous/delayed tasks, used for recurrent (e.g.,
listing a forge, loading new stuff from a Git repository) and one-off
activities (e.g., loading a specific version of a source package)
:ref:`swh.search `
search engine for the archive
:ref:`swh.storage `
abstraction layer over the archive, allowing access to all stored source
code artifacts as well as their metadata
:ref:`swh.vault `
implementation of the vault service, allowing users to retrieve parts of the
archive as self-contained bundles (e.g., individual releases, entire
repository snapshots, etc.)
:ref:`swh.web `
Web application(s) to browse the archive, for both interactive (HTML UI)
and mechanized (REST API) use
:ref:`swh.web.client `
Python client for :ref:`swh.web `
Dependencies
------------
The dependency relationships among the various modules are depicted below.
.. _py-deps-swh:
.. figure:: images/py-deps-swh.svg
:width: 1024px
:align: center
Dependencies among top-level Python modules (click to zoom).
Archive
-------
* :ref:`Archive ChangeLog `: notable changes to the archive
over time
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* `URLs index `_
* :ref:`search`
* :ref:`glossary`
.. ensure sphinx does not complain about index files not being included
.. toctree::
:maxdepth: 2
:caption: Contents:
:titlesonly:
:hidden:
getting-started/index
architecture/index
contributing/index
tutorials/index
faq/index
roadmap/roadmap-2021
api-reference
archive-changelog
journal
Python modules autodocumentation
diff --git a/swh/docs/sphinx/conf.py b/swh/docs/sphinx/conf.py
index 1c44f88..8a1f874 100755
--- a/swh/docs/sphinx/conf.py
+++ b/swh/docs/sphinx/conf.py
@@ -1,285 +1,286 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
import logging
import os
from typing import Dict
from sphinx.ext import autodoc
from swh.docs.django_settings import force_django_settings
# General information about the project.
project = "Software Heritage - Development Documentation"
copyright = "2015-2021 The Software Heritage developers"
author = "The Software Heritage developers"
# -- General configuration ------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.napoleon",
"sphinx.ext.intersphinx",
"sphinxcontrib.httpdomain",
"sphinx.ext.extlinks",
"sphinxcontrib.images",
"sphinxcontrib.programoutput",
"sphinx.ext.viewcode",
"sphinx_tabs.tabs",
"sphinx_rtd_theme",
"sphinx.ext.graphviz",
"sphinx_click.ext",
"myst_parser",
"sphinx.ext.todo",
"sphinx_reredirects",
"swh.docs.sphinx.view_in_phabricator",
# swh.scheduler inherits some attribute descriptions from celery that use
# custom crossrefs (eg. :setting:`task_ignore_result`)
"sphinx_celery.setting_crossref",
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
# The suffix(es) of source filenames.
# You can specify multiple suffixes as a list of strings:
#
source_suffix = ".rst"
# The master toctree document.
master_doc = "index"
# A string of reStructuredText that will be included at the beginning of every
# source file that is read.
# A bit hackish but should work both for each swh package and the whole swh-doc
rst_prolog = """
.. include:: /../../swh-docs/docs/swh_substitutions
"""
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = ""
# The full version, including alpha/beta/rc tags.
release = ""
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = "en"
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# These patterns also affect html_static_path and html_extra_path
exclude_patterns = [
"_build",
"swh-icinga-plugins/index.rst",
"swh.loader.cvs.rcsparse.setup.rst",
"apidoc/swh.loader.cvs.rcsparse.setup.rst",
]
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "sphinx"
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = True
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"
html_favicon = "_static/favicon.ico"
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
html_theme_options = {
"collapse_navigation": True,
"sticky_navigation": True,
}
html_logo = "_static/software-heritage-logo-title-motto-vertical-white.png"
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]
# make logo actually appear, avoiding gotcha due to alabaster default conf.
# https://github.com/bitprophet/alabaster/issues/97#issuecomment-303722935
html_sidebars = {
"**": [
"about.html",
"globaltoc.html",
"relations.html",
"sourcelink.html",
"searchbox.html",
]
}
# If not None, a 'Last updated on:' timestamp is inserted at every page
# bottom, using the given strftime format.
# The empty string is equivalent to '%b %d, %Y'.
html_last_updated_fmt = "%Y-%m-%d %H:%M:%S %Z"
# intersphinx: refer to the Python standard library and to the swh devel/sysadm docs.
intersphinx_mapping = {
"python": ("https://docs.python.org/3", None),
"swh-devel": ("https://docs.softwareheritage.org/devel", None),
"swh-sysadm": ("https://docs.softwareheritage.org/sysadm", None),
}
# Redirects for pages that were moved, so we don't break external links.
# Uses sphinx-reredirects
redirects = {
"swh-deposit/spec-api": "api/api-documentation.html",
"swh-deposit/metadata": "api/metadata.html",
"swh-deposit/specs/blueprint": "../api/use-cases.html",
"swh-deposit/user-manual": "api/user-manual.html",
"infrastructure/index.html": "../../sysadm/network-architecture/index.html",
"infrastructure/network.html": "../../sysadm/network-architecture/index.html",
"infrastructure/service-urls.html": "../../sysadm/network-architecture/service-urls.html", # noqa
"architecture": "architecture/overview.html",
+ "architecture/mirror": "../../sysadm/mirror-operations/index.html",
"keycloak": "../../sysadm/user-management/keycloak/index.html",
"mirror": "architecture/mirror.html",
"users": "user",
}
# -- autodoc configuration ----------------------------------------------
autodoc_default_flags = [
"members",
"undoc-members",
"private-members",
"special-members",
]
autodoc_member_order = "bysource"
autodoc_mock_imports = [
"rados",
]
autoclass_content = "both"
modindex_common_prefix = ["swh."]
# For the todo extension. Todo and todolist produce output only if this is True
todo_include_todos = True
_swh_web_base_url = "https://archive.softwareheritage.org"
# for the extlinks extension, sub-projects should fill that dict
extlinks: Dict = {
"swh_web": (f"{_swh_web_base_url}/%s", None),
"swh_web_api": (f"{_swh_web_base_url}/api/1/%s", None),
"swh_web_browse": (f"{_swh_web_base_url}/browse/%s", None),
}
# SWH_PACKAGE_DOC_TOX_BUILD environment variable is set in a tox environment
# named sphinx for each swh package (except the swh-docs package itself).
swh_package_doc_tox_build = os.environ.get("SWH_PACKAGE_DOC_TOX_BUILD", False)
# override some configuration when building a swh package
# documentation with tox to remove warnings and suppress
# those related to unresolved references
if swh_package_doc_tox_build:
swh_substitutions = os.path.join(
os.path.dirname(__file__), "../../../docs/swh_substitutions"
)
rst_prolog = f".. include:: /{swh_substitutions}"
suppress_warnings = ["ref.ref"]
html_favicon = ""
html_logo = ""
class SimpleDocumenter(autodoc.FunctionDocumenter):
"""
Custom autodoc directive to inline the docstring of a function
in a document without the signature header and with no indentation.
Example of use::
.. autosimple:: swh.web.api.views.directory.api_directory
"""
objtype = "simple"
# ensure the priority is lower than the base FunctionDocumenter
# to avoid side effects with autodoc processing
priority = -1
# do not indent the content
content_indent = ""
# do not add a header to the docstring
def add_directive_header(self, sig):
pass
# sphinx event handler to set adequate django settings prior to reading
# apidoc generated rst files when building doc to avoid autodoc errors
def set_django_settings(app, env, docname):
if any([pattern in app.srcdir for pattern in ("swh-web-client", "DWCLI")]):
# swh-web-client is detected as swh-web by the code below but
# django is not installed when building standalone swh-web-client doc
return
package_settings = {
"auth": "swh.auth.tests.django.app.apptest.settings",
"deposit": "swh.deposit.settings.development",
"web": "swh.web.settings.development",
}
for package, settings in package_settings.items():
if any(
[pattern in docname for pattern in (f"swh.{package}", f"swh-{package}")]
):
force_django_settings(settings)
# when building local package documentation with tox, insert glossary
# content at the end of the index file in order to resolve references
# to the terms it contains
def add_glossary_to_index(app, docname, source):
if docname == "index":
glossary_path = os.path.join(
os.path.dirname(__file__), "../../../docs/glossary.rst"
)
with open(glossary_path, "r") as glossary:
source[0] += "\n" + glossary.read()
def setup(app):
# env-purge-doc event is fired before source-read
app.connect("env-purge-doc", set_django_settings)
# add autosimple directive (used in swh-web)
app.add_autodocumenter(SimpleDocumenter)
# set an environment variable indicating we are currently building
# the documentation
os.environ["SWH_DOC_BUILD"] = "1"
logger = logging.getLogger("sphinx")
if swh_package_doc_tox_build:
# ensure glossary will be available in package doc scope
app.connect("source-read", add_glossary_to_index)
# suppress some httpdomain warnings in non web packages
if not any([pattern in app.srcdir for pattern in ("swh-web", "DWAPPS")]):
# filter out httpdomain unresolved reference warnings
# to not consider them as errors when using -W option of sphinx-build
class HttpDomainRefWarningFilter(logging.Filter):
def filter(self, record: logging.LogRecord) -> bool:
return not record.msg.startswith("Cannot resolve reference to")
# insert a custom filter in the warning log handler of sphinx
logger.handlers[1].filters.insert(0, HttpDomainRefWarningFilter())
diff --git a/sysadm/_templates/layout.html b/sysadm/_templates/layout.html
new file mode 100644
index 0000000..7b7eca6
--- /dev/null
+++ b/sysadm/_templates/layout.html
@@ -0,0 +1,19 @@
+{# copied from sphinx_rtd_theme/layout.html, with the extra "Development Documentation" link #}
+{%- block menu %}
+ {%- set toctree = toctree(maxdepth=theme_navigation_depth|int,
+ collapse=theme_collapse_navigation|tobool,
+ includehidden=theme_includehidden|tobool,
+ titles_only=theme_titles_only|tobool) %}
+ {%- if toctree %}
+ {{ toctree }}
+ {%- else %}
+
+ {{ toc }}
+ {%- endif %}
+
+
+
+
+{%- endblock %}
diff --git a/sysadm/mirror-operations/index.rst b/sysadm/mirror-operations/index.rst
index 62ee5a4..2baa555 100644
--- a/sysadm/mirror-operations/index.rst
+++ b/sysadm/mirror-operations/index.rst
@@ -1,130 +1,132 @@
.. _mirror_operations:
Mirror Operations
=================
+.. _mirror:
+
Description
-----------
A mirror is a full copy of the |swh| archive, operated independently from the
Software Heritage initiative. A minimal mirror consists of two parts:
- the graph storage (typically an instance of :ref:`swh.storage `),
which contains the Merkle DAG structure of the archive, *except* the
actual content of source code files (AKA blobs),
- the object storage (typically an instance of :ref:`swh.objstorage `),
which contains all the blobs corresponding to archived source code files.
However, a usable mirror also needs to be accessible by others. As such, a
proper mirror should also allow users to:
- navigate the archive copy using a Web browser and/or the Web API (typically
using :ref:`the web application `),
- retrieve data from the copy of the archive (typically using :ref:`the
vault service `)
A mirror is initially populated and kept up to date by consuming data
from the |swh| Kafka-based :ref:`journal ` and retrieving the
blob objects (file content) from the |swh| :ref:`object storage `.
.. note:: It is not required that a mirror be deployed using the |swh| software
stack. Other technologies, including different storage methods, can be
used, but this documentation focuses on the case of mirror
deployment using the |swh| software stack.
.. thumbnail:: ../images/mirror-architecture.svg
General view of the |swh| mirroring architecture.
Mirroring the Graph Storage
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The replication of the graph is based on a journal using Kafka_ as event
streaming platform.
On the Software Heritage side, every addition made to the archive consists of
the addition of a :ref:`data-model` object. The new object is also serialized
as a msgpack_ bytestring which is used as the value of a message added to a
Kafka topic dedicated to the object type.
The main Kafka topics for the |swh| :ref:`data-model` are:
- `swh.journal.objects.content`
- `swh.journal.objects.directory`
- `swh.journal.objects.extid`
- `swh.journal.objects.metadata_authority`
- `swh.journal.objects.metadata_fetcher`
- `swh.journal.objects.origin_visit_status`
- `swh.journal.objects.origin_visit`
- `swh.journal.objects.origin`
- `swh.journal.objects.raw_extrinsic_metadata`
- `swh.journal.objects.release`
- `swh.journal.objects.revision`
- `swh.journal.objects.skipped_content`
- `swh.journal.objects.snapshot`
In order to set up a mirror of the graph, one needs to deploy a stack capable
of retrieving all these topics and storing their content reliably. For example, a
Kafka cluster configured as a replica of the main Kafka broker hosted by |swh|
would do the job (albeit not in a very useful manner by itself).
A more useful mirror can be set up using the :ref:`storage `
component with the help of the special service named `replayer` provided by the
:mod:`swh.storage.replay` module.
.. TODO: replace this previous link by a link to the 'swh storage replay'
command once available, and ideally once
https://github.com/sphinx-doc/sphinx/issues/880 is fixed
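
The replay principle can be illustrated with a minimal Python sketch. It is not the :mod:`swh.storage.replay` implementation: it assumes messages have already been decoded from msgpack into `(topic, object)` pairs, derives the object type from the `swh.journal.objects.` topic prefix listed above, and writes into a hypothetical `InMemoryStorage` class whose `<object_type>_add` methods stand in for a real storage backend.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# prefix shared by the journal topics listed above
TOPIC_PREFIX = "swh.journal.objects."


class InMemoryStorage:
    """Hypothetical stand-in for the mirror's graph storage."""

    def __init__(self) -> None:
        self.objects: Dict[str, List[dict]] = defaultdict(list)

    def _add(self, object_type: str, objs: List[dict]) -> None:
        self.objects[object_type].extend(objs)

    # only two object types are sketched here
    def directory_add(self, objs: List[dict]) -> None:
        self._add("directory", objs)

    def revision_add(self, objs: List[dict]) -> None:
        self._add("revision", objs)


def replay(messages: Iterable[Tuple[str, dict]], storage: InMemoryStorage) -> None:
    """Group decoded journal messages by object type, then insert each group."""
    batches: Dict[str, List[dict]] = defaultdict(list)
    for topic, obj in messages:
        if not topic.startswith(TOPIC_PREFIX):
            raise ValueError(f"unexpected topic: {topic}")
        # e.g. "swh.journal.objects.revision" -> "revision"
        batches[topic[len(TOPIC_PREFIX):]].append(obj)
    for object_type, objs in batches.items():
        # dispatch to the matching <object_type>_add method
        # (an unsupported type raises AttributeError)
        getattr(storage, f"{object_type}_add")(objs)
```

Grouping messages by type before inserting them reflects the one-topic-per-object-type layout of the journal; a real replayer would also have to deal with batching limits, retries and error reporting, which this sketch ignores.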
Mirroring the Object Storage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
File contents (blobs) are *not* directly stored in messages of the
`swh.journal.objects.content` Kafka topic, which only contains metadata about
them, such as various kinds of cryptographic hashes. A separate component is in
charge of replicating blob objects from the archive and storing them in the
local object storage instance.
A separate `swh-journal` client should subscribe to the
`swh.journal.objects.content` topic to get the stream of blob object
identifiers, then retrieve corresponding blobs from the main Software Heritage
object storage, and store them in the local object storage.
A reference implementation for this component is available in
:ref:`content replayer `.
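
The following is a minimal Python sketch of this loop, not the content replayer itself. It assumes each `swh.journal.objects.content` message has already been decoded into a dict, and that blobs are keyed by their `sha1` hash only (a simplification), and it uses plain dicts as hypothetical stand-ins for the main and local object storages.

```python
import hashlib
from typing import Dict, Iterable, Mapping, MutableMapping


def replicate_contents(
    messages: Iterable[Mapping[str, bytes]],
    src: Mapping[bytes, bytes],
    dst: MutableMapping[bytes, bytes],
) -> Dict[str, int]:
    """Copy the blobs referenced by content messages from src to dst."""
    stats = {"copied": 0, "skipped": 0, "missing": 0}
    for msg in messages:
        obj_id = msg["sha1"]
        if obj_id in dst:
            # already replicated, e.g. when a message is consumed again
            stats["skipped"] += 1
            continue
        blob = src.get(obj_id)
        if blob is None:
            # the message was received but the blob is not retrievable
            stats["missing"] += 1
            continue
        # sanity check: the blob must match the identifier it is stored under
        if hashlib.sha1(blob).digest() != obj_id:
            raise ValueError(f"hash mismatch for {obj_id.hex()}")
        dst[obj_id] = blob
        stats["copied"] += 1
    return stats
```

Skipping blobs that are already present keeps the loop idempotent, which matters because a Kafka consumer may receive the same message again, for example after a restart. The hash check is an extra safeguard added for this sketch.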
Installation
------------
When using the |swh| software stack to deploy a mirror, a number of |swh|
software components must be installed (cf. architecture diagram above).
A `docker-swarm `_ based deployment
solution is provided as a working example of the mirror stack,
see :ref:`mirror_deploy`.
It is strongly recommended to start from there before planning a
production-like deployment.
.. _Kafka: https://kafka.apache.org/
.. _msgpack: https://msgpack.org
You may also want to read:
- :ref:`mirror_monitor` to learn how to monitor your mirror and how to report
its health back to |swh|.
- :ref:`mirror_onboard` for the |swh| side view of adding a new mirror.
.. toctree::
:hidden:
deploy
onboard
monitor