diff --git a/docs/index.rst b/docs/index.rst
index 4e0c013..3382c24 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,158 +1,145 @@
 .. _swh-docs:

 Software Heritage - Development Documentation
 =============================================

 Getting started
 ---------------

 * :ref:`getting-started` ← start here to get your own Software Heritage
   platform running in less than 5 minutes, or
 * :ref:`developer-setup` ← here to hack on the Software Heritage software
   stack

 Architecture
 ------------

 * :ref:`architecture` ← go there to have a glimpse on the Software Heritage
   software architecture

 Components
 ----------

 Here is brief overview of the most relevant software components in the
 Software Heritage stack. Each component name is linked to the development
 documentation of the corresponding Python module.

 :ref:`swh.core `
     low-level utilities and helpers used by almost all other modules in the
     stack

 :ref:`swh.dataset `
     public datasets and periodic data dumps of the archive released by
     Software Heritage

 :ref:`swh.deposit `
     push-based deposit of software artifacts to the archive

 swh.docs
     developer documentation (used to generate this doc you are reading)

 :ref:`swh.graph `
     Fast, compressed, in-memory representation of the archive, with tooling
     to generate and query it.

 :ref:`swh.indexer `
     tools and workers used to crawl the content of the archive and extract
     derived information from any artifact stored in it

 :ref:`swh.journal `
     persistent logger of changes to the archive, with publish-subscribe
     support

 :ref:`swh.lister `
     collection of listers for all sorts of source code hosting and
     distribution places (forges, distributions, package managers, etc.)

 :ref:`swh.loader-core `
     low-level loading utilities and helpers used by all other loaders

-:ref:`swh.loader-debian `
-    loader for `Debian `_ source packages
-
-:ref:`swh.loader-dir `
-    loader for source directories (e.g., expanded tarballs)
-
 :ref:`swh.loader-git `
     loader for `Git `_ repositories

 :ref:`swh.loader-mercurial `
     loader for `Mercurial `_ repositories

-:ref:`swh.loader-pypi `
-    loader for `PyPI `_ source code releases
-
 :ref:`swh.loader-svn `
     loader for `Subversion `_ repositories

-:ref:`swh.loader-tar `
-    loader for source tarballs (including Tar, ZIP and other archive formats)
-
 :ref:`swh.model `
     implementation of the :ref:`data-model` to archive source code artifacts

 :ref:`swh.objstorage `
     content-addressable object storage

 :ref:`swh.scheduler `
     task manager for asynchronous/delayed tasks, used for recurrent (e.g.,
     listing a forge, loading new stuff from a Git repository) and one-off
     activities (e.g., loading a specific version of a source package)

 :ref:`swh.storage `
     abstraction layer over the archive, allowing to access all stored source
     code artifacts as well as their metadata

 :ref:`swh.vault `
     implementation of the vault service, allowing to retrieve parts of the
     archive as self-contained bundles (e.g., individual releases, entire
     repository snapshots, etc.)

 :ref:`swh.web `
     Web application(s) to browse the archive, for both interactive (HTML UI)
     and mechanized (REST API) use

 Dependencies
 ------------

 The dependency relationships among the various modules are depicted below.

 .. _py-deps-swh:
 .. figure:: images/py-deps-swh.svg
    :width: 1024px
    :align: center

    Dependencies among top-level Python modules (click to zoom).

 Indices and tables
 ==================

 * :ref:`genindex`
 * :ref:`modindex`
 * `URLs index `_
 * :ref:`search`
 * :ref:`glossary`

 .. ensure sphinx does not complain about index files not being included

 .. toctree::
    :maxdepth: 2
    :caption: Contents:
    :titlesonly:
    :hidden:

    architecture
    getting-started
    developer-setup
-   manual-setup
    Infrastructure
    API documentation
    swh.core
    swh.dataset
    swh.deposit
    swh.graph
    swh.indexer
    swh.journal
    swh.lister
    swh.loader
    swh.model
    swh.objstorage
    swh.scheduler
    swh.storage
    swh.vault
    swh.web
diff --git a/docs/infrastructure/object_storage.rst b/docs/infrastructure/object_storage.rst
index 413fdec..ee06da6 100644
--- a/docs/infrastructure/object_storage.rst
+++ b/docs/infrastructure/object_storage.rst
@@ -1,75 +1,76 @@
 ==============
 Object storage
 ==============

 There is not one but at least 4 different object stores directly managed
 by the Software Heritage group:

 - Main archive
 - Rocquencourt replica archive
 - Azure archive
 - AWS archive

 The Main archive
 ================

 Uffizi
 Located in Rocquencourt

 Replica archive
 ===============

 Banco
 Located in Rocquencourt, in a different building than the main one

 Azure archive
 =============

 The Azure archive uses an Azure Block Storage backend, implemented in the
 *swh.objstorage_backends.azure.AzureCloudObjStorage* Python class.

 Internally, that class uses the *block_blob_service* Azure API.

 AWS archive
 ===========

 The AWS archive is stored in the *softwareheritage* Amazon S3 bucket, in the
 US-East (N. Virginia) region. That bucket is public.

 It is being continously populated by the :ref:`content_replayer` program.

 Softwareheritage Python programs access it using a libcloud backend.

 URL
 ---

 ``s3://softwareheritage/content``

 .. _content_replayer:

 content_replayer
 ----------------

 A Python program which reads new objects from Kafka and then copies them from
 the object storages on Banco and Uffizi.

 Implementation details
 ----------------------

 * Uses *swh.objstorage.backends.libcloud*
 * Uses *libcloud.storage.drivers.s3*

 Architecture diagram
 ====================

 .. graph:: swh_archives
+
     "Main archive" -- "Replica archive";
     "Azure archive";
     "AWS archive";
     "Main archive" [shape=rectangle];
     "Replica archive" [shape=rectangle];
     "Azure archive" [shape=rectangle];
     "AWS archive" [shape=rectangle];
diff --git a/docs/infrastructure/storage_site_rocquencourt_physical.rst b/docs/infrastructure/storage_site_rocquencourt_physical.rst
index 5e9693c..1c4bbc8 100644
--- a/docs/infrastructure/storage_site_rocquencourt_physical.rst
+++ b/docs/infrastructure/storage_site_rocquencourt_physical.rst
@@ -1,64 +1,64 @@
 Physical machines at Rocquencourt
 =================================

 hypervisors
 -----------

 The :doc:`hypervisors ` mostly use local storage on the form of internal
-SSDS but also have access to a :ref:`Ceph cluster`.
+SSDS but also have access to a :ref:`ceph_cluster`.

 NFS server
 ----------

 There is only one NFS server managed by Software Heritage,
 *uffizi.internal.softwareheritage.org*. That machine is located at
 Rocquencourt and is directly attached to two SAS storage bays.

 NFS-exported data is present under these local filesystem paths::

     /srv/storage/space
     /srv/softwareheritage/objects

 belvedere
 ---------

 This server is used for at least two separate PostgreSQL instances:

 - *softwareheritage* database (port 5433)
 - *swh-lister* and *softwareheritage-scheduler* databases (port 5434)

 Data is stored on local SSDs. The operating system lies on a LSI hardware
 RAID 1 volume and each PostgreSQL instance uses a dedicated set of drives in
 mdadm RAID10 volume(s).

 It also uses a single NFS volume::

     uffizi:/srv/storage/space/postgres-backups/prado

 banco
 -----

 This machine is located in its own building in Rocquencourt, along with a
 SAS storage bay. It is intended to serve as a backup for the main site on
 building 30.

 Elasticsearch cluster
 ---------------------

 The :doc:`Elasticsearch cluster ` only uses local storage on its nodes.

 Test / staging server
 ---------------------

 There is also *orsay*, a refurbished machine only used for testing / staging
 new software versions.

 .. _ceph_cluster:

 Ceph cluster
 ------------

 The Software Heritage Ceph cluster contains three nodes:

 - ceph-mon1
 - ceph-osd1
 - ceph-osd2
diff --git a/docs/swh-loader.rst b/docs/swh-loader.rst
index 0733fd0..8a1e527 100644
--- a/docs/swh-loader.rst
+++ b/docs/swh-loader.rst
@@ -1,16 +1,12 @@
 Loaders
 =======

 .. toctree::
    :maxdepth: 2
    :caption: Contents:
    :titlesonly:

    swh.loader.core
-   swh.loader.debian
-   swh.loader.dir
    swh.loader.git
    swh.loader.mercurial
-   swh.loader.pypi
    swh.loader.svn
-   swh.loader.tar
diff --git a/swh/docs/sphinx/conf.py b/swh/docs/sphinx/conf.py
index aa853d3..48a7e1b 100755
--- a/swh/docs/sphinx/conf.py
+++ b/swh/docs/sphinx/conf.py
@@ -1,160 +1,160 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-
 #

 import django
 import os

 # General information about the project.
 project = 'Software Heritage - Development Documentation'
 copyright = '2015-2019 The Software Heritage developers'
 author = 'The Software Heritage developers'

 # -- General configuration ------------------------------------------------

 # Add any Sphinx extension module names here, as strings. They can be
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
 extensions = ['sphinx.ext.autodoc',
               'sphinx.ext.napoleon',
               'sphinxcontrib.httpdomain',
               'sphinx.ext.extlinks',
               'sphinxcontrib.images',
               'sphinxcontrib.programoutput',
               'sphinx.ext.viewcode',
               'sphinx_tabs.tabs',
               'sphinx_rtd_theme',
               'sphinx.ext.graphviz',
               ]

 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']

 # The suffix(es) of source filenames.
 # You can specify multiple suffix as a list of string:
 # source_suffix = ['.rst', '.md']
 # source_suffix = '.rst'

 source_parsers = {
     '.md': 'recommonmark.parser.CommonMarkParser',
 }

 # The master toctree document.
 master_doc = 'index'

 # A string of reStructuredText that will be included at the beginning of every
 # source file that is read.
 # A bit hackish but should work both for each swh package and the whole swh-doc
 rst_prolog = '''
 .. include:: /../../swh-docs/docs/swh_substitutions
 '''

 # The version info for the project you're documenting, acts as replacement for
 # |version| and |release|, also used in various other places throughout the
 # built documents.
 #
 # The short X.Y version.
 version = ''
 # The full version, including alpha/beta/rc tags.
 release = ''

 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
 #
 # This is also used if you do content translation via gettext catalogs.
 # Usually you set "language" from the command line for these cases.
 language = 'en'

 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
 # This patterns also effect to html_static_path and html_extra_path
-exclude_patterns = ['_build']
+exclude_patterns = ['_build', 'swh-icinga-plugins/index.rst', 'swh-search/index.rst']

 # The name of the Pygments (syntax highlighting) style to use.
 pygments_style = 'sphinx'

 # If true, `todo` and `todoList` produce output, else they produce nothing.
 todo_include_todos = True

 # -- Options for HTML output ----------------------------------------------

 # The theme to use for HTML and HTML Help pages. See the documentation for
 # a list of builtin themes.
 #
 html_theme = 'sphinx_rtd_theme'
 html_favicon = '_static/favicon.ico'

 # Theme options are theme-specific and customize the look and feel of a theme
 # further. For a list of options available for each theme, see the
 # documentation.
 #
 html_theme_options = {
     'collapse_navigation': True,
     'sticky_navigation': True,
 }

 html_logo = '_static/software-heritage-logo-title-motto-vertical-white.png'

 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
 html_static_path = ['_static']

 # make logo actually appear, avoiding gotcha due to alabaster default conf.
 # https://github.com/bitprophet/alabaster/issues/97#issuecomment-303722935
 html_sidebars = {
     '**': [
         'about.html',
         'globaltoc.html',
         'relations.html',
         'sourcelink.html',
         'searchbox.html',
     ]
 }

 # If not None, a 'Last updated on:' timestamp is inserted at every page
 # bottom, using the given strftime format.
 # The empty string is equivalent to '%b %d, %Y'.
 html_last_updated_fmt = "%Y-%m-%d %H:%M:%S %Z"

 # refer to the Python standard library.
 intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}

 # -- autodoc configuration ----------------------------------------------
 autodoc_default_flags = ['members', 'undoc-members',
                          'private-members', 'special-members']
 autodoc_member_order = 'bysource'
 autodoc_mock_imports = ['rados']
 modindex_common_prefix = ['swh.']

 # for the extlinks extension, sub-projects should fill that dict
 extlinks = {}


 # XXX Kill this ASA this PR is accepted and released
 # https://github.com/sphinx-contrib/httpdomain/pull/19
 def register_routingtable_as_label(app, document):
     from sphinx.locale import _  # noqa
     labels = app.env.domaindata['std']['labels']
     labels['routingtable'] = 'http-routingtable', '', _('HTTP Routing Table')
     anonlabels = app.env.domaindata['std']['anonlabels']
     anonlabels['routingtable'] = 'http-routingtable', ''


 # hack to set the adequate django settings when building global swh doc
 # to avoid autodoc build errors
 def setup(app):
     os.environ.setdefault('DJANGO_SETTINGS_MODULE',
                           'swh.docs.django_settings')
     django.setup()
     import pkg_resources  # noqa
     from distutils.version import StrictVersion  # noqa
     httpdomain = pkg_resources.get_distribution('sphinxcontrib-httpdomain')
     if StrictVersion(httpdomain.version) <= StrictVersion('1.7.0'):
         app.connect('doctree-read', register_routingtable_as_label)