diff --git a/README.md b/README.md
index 2d44d02..e94f256 100644
--- a/README.md
+++ b/README.md
@@ -1,107 +1,107 @@
 swh-docs
 ========

 This module contains (the logics for generating) the Software Heritage
 development documentation.

 Specifically, it contains some general information about Software Heritage
 internals (stuff that would not fit in any other specific software component of
 the Software Heritage stack) and bundle them together component-specific
 documentation coming from other modules of the stack.

 All documentation is written and typeset using [Sphinx][1]. General
 documentation is shipped as part of this module. Module-specific documentation
 is centralized here via symlinks to the `docs/` dirs of individual modules.
 Therefore to build the full documentation you need a working and complete
 [Software Heritage development environment][2].


 How to build the doc
 --------------------

 Install the [Software Heritage development environment][2]

     $ git clone https://forge.softwareheritage.org/source/swh-environment
     $ cd swh-environment
     $ ./bin/update # this will clone needed git repos, inc. swh-docs
     $ cd swh-docs

-Ensure you have the required tools to generate images ([graphviz][3]'s `dot`,
+Ensure you have the required tools to generate images ([graphviz][3]'s `dot`,
 [plantuml][4] and [inkscape][5]). On a Debian system:

     $ sudo apt install plantuml graphviz

 These additional packages are required on Debian 10.x systems:

 - libapr1-dev
 - libaprutil1-dev
 - libsvn-dev
 - postgresql-11
 - dia
 - postgresql-autodoc

 It is also recommended to build the doc using [tox][6], so ensure you have it
 installed, eg. on a Debian system:

     $ sudo apt install tox

 Then (from the `swh-environment/swh-docs/` directory):

     $ tox -e sphinx-dev

 This tox environment will build the documentation from the sources available in
 the parent directory (`swh-environment`).

 Behind the scene, this tox environment will run the sphinx documentation
 building process via [pifpaf][7] to encapsulate the need os Postgresql to
 generate database schemas.

 The documentation building process itself consists mainly in 3 steps:

 ### 1. Generate documentation assets for all modules

     $ cd swh-environment
     $ make docs-assets

 This will *not* build the documentation in each module (there is `make docs`
 for that).

 ### 2. Build the api docs for all swh python packages

     $ cd swh-docs/docs
     $ make apidoc

 ### 3. Build the documentation

     $ cd swh-docs/docs
     $ make

 The HTML documentation is now available starting from
 `_build/html/index.html`.


 Cleaning up
 -----------

     $ cd docs
     $ make distclean

 The former (`make clean`) will only clean the local Sphinx build, without
 touching other modules. The latter (`make distclean`) will also clean Sphinx
 builds in all other modules.


 Publishing the doc
 ------------------

 The publication of the documentation is now managed by the [CI][7].


 [1]: http://www.sphinx-doc.org/
 [2]: https://forge.softwareheritage.org/source/swh-environment/
 [3]: https://graphviz.org
 [4]: http://plantuml.com
 [5]: https://inkscape.org/
 [6]: https://tox.readthedocs.io/
 [7]: https://github.com/jd/pifpaf
diff --git a/docs/conf.py b/docs/conf.py
index 927c6b2..c0f1b90 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -1,9 +1,10 @@
 # flake8: noqa

 import swh.docs.sphinx.conf as sphinx_conf

 # swh-web needs to add some extra sphinx settings
 import swh.web.doc_config as swh_web_doc_config
+
 swh_web_doc_config.customize_sphinx_conf(sphinx_conf)

 from swh.docs.sphinx.conf import *
diff --git a/docs/images/general-architecture.svg b/docs/images/general-architecture.svg
index 635043f..cc4b9b4 100644
--- a/docs/images/general-architecture.svg
+++ b/docs/images/general-architecture.svg
@@ -1,3374 +1,3374 @@
 image/svg+xml
+ id="original" />
 Web App Scheduler API Deposit API Vault API Indexer API Storage API ObjStorage API Object Storage Journal Lister Celery Broker Scheduler
 listener workers Scheduler runner Loader Indexer workers workers
diff --git a/docs/infrastructure/object_storage.rst b/docs/infrastructure/object_storage.rst
index ee06da6..ab82e8e 100644
--- a/docs/infrastructure/object_storage.rst
+++ b/docs/infrastructure/object_storage.rst
@@ -1,76 +1,76 @@
 ==============
 Object storage
 ==============

 There is not one but at least 4 different object stores directly managed
 by the Software Heritage group:

 - Main archive
 - Rocquencourt replica archive
 - Azure archive
 - AWS archive

 The Main archive
 ================

 Uffizi
 Located in Rocquencourt

 Replica archive
 ===============

 Banco
 Located in Rocquencourt, in a different building than the main one

 Azure archive
 =============

 The Azure archive uses an Azure Block Storage backend, implemented in the
 *swh.objstorage_backends.azure.AzureCloudObjStorage* Python class.

 Internally, that class uses the *block_blob_service* Azure API.

 AWS archive
 ===========

 The AWS archive is stored in the *softwareheritage* Amazon S3 bucket, in the
 US-East (N. Virginia) region. That bucket is public.

-It is being continously populated by the :ref:`content_replayer` program.
+It is being continuously populated by the :ref:`content_replayer` program.

 Softwareheritage Python programs access it using a libcloud backend.

 URL
 ---

 ``s3://softwareheritage/content``

 .. _content_replayer:

 content_replayer
 ----------------

 A Python program which reads new objects from Kafka and then copies them from
 the object storages on Banco and Uffizi.


 Implementation details
 ----------------------

 * Uses *swh.objstorage.backends.libcloud*

 * Uses *libcloud.storage.drivers.s3*


 Architecture diagram
 ====================

 .. graph:: swh_archives

     "Main archive" -- "Replica archive";
     "Azure archive";
     "AWS archive";
     "Main archive" [shape=rectangle];
     "Replica archive" [shape=rectangle];
     "Azure archive" [shape=rectangle];
     "AWS archive" [shape=rectangle];