diff --git a/requirements.txt b/requirements.txt
--- a/requirements.txt
+++ b/requirements.txt
@@ -6,6 +6,7 @@
 sphinxcontrib-images
 sphinxcontrib-programoutput
 sphinx-tabs
+sphinx-panels
 sphinx-reredirects
 sphinx_rtd_theme
 sphinx-click
diff --git a/swh/docs/sphinx/conf.py b/swh/docs/sphinx/conf.py
--- a/swh/docs/sphinx/conf.py
+++ b/swh/docs/sphinx/conf.py
@@ -40,6 +40,7 @@
     # swh.scheduler inherits some attribute descriptions from celery that use
     # custom crossrefs (eg. :setting:`task_ignore_result`)
     "sphinx_celery.setting_crossref",
+    "sphinx_panels",
 ]
 
 # Add any paths that contain templates here, relative to this directory.
diff --git a/sysadm/mirror-operations/index.rst b/sysadm/mirror-operations/index.rst
--- a/sysadm/mirror-operations/index.rst
+++ b/sysadm/mirror-operations/index.rst
@@ -41,6 +41,10 @@
 
    General view of the |swh| mirroring architecture.
 
+See :ref:`planning-a-mirror` for a complete description of the requirements
+for hosting a mirror.
+
+
 Mirroring the Graph Storage
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -117,7 +121,7 @@
 
 .. _msgpack: https://msgpack.org
 
-You may also want to read:
+You may want to read:
 
 - :ref:`mirror_monitor` to learn how to monitor your mirror and how to report
   its health back the |swh|.
@@ -127,6 +131,7 @@
 .. toctree::
    :hidden:
 
+   planning
    deploy
    onboard
    monitor
diff --git a/sysadm/mirror-operations/planning.rst b/sysadm/mirror-operations/planning.rst
new file mode 100644
--- /dev/null
+++ b/sysadm/mirror-operations/planning.rst
@@ -0,0 +1,187 @@
+.. _planning-a-mirror:
+
+Hosting a mirror
+================
+
+This section presents and discusses the technical requirements for hosting a
+|swh| mirror.
+
+There are many different options for hosting a mirror, but a common set of
+overall requirements must be fulfilled.
+
+Namely, hosting a mirror requires:
+
+- a dedicated infrastructure with enough computing power and storage,
+- enough network bandwidth (both ingress and egress),
+- good IT tooling (monitoring, alerting).
+
+The mirror operator is not required to run the Software Heritage `full
+software stack `_, though it is possible to do so.
+
+.. Warning::
+
+   Volumes given in this section are estimates based on numbers from **January
+   2022**.
+
+
+
+Overall, the raw hardware requirements are:
+
+- a database system for the main storage of the archive (the graph structure);
+  the current volume is about 17TB, with an increase rate of about
+  280GB/month,
+- an object storage system for the objects (archived software source code
+  files); the current volume is about 800TB, with an increase rate of
+  about 21TB/month,
+- an Elasticsearch engine; the current main index has about 180M entries
+  (origins) for an index size of 360GB, with an increase rate of about 2M
+  entries/month,
+- a web/application server for the main web application and public API,
+- a few compute nodes for the application services.
+
+
+A mirror should provision machines or cloud-based resources with these numbers
+in mind, including the usual robustness margins (RAID-like storage,
+replication, backups, etc.).
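+
+To make these growth rates concrete, the short sketch below projects the two
+main storage volumes over a provisioning horizon, using only the January 2022
+figures quoted above. The three-year horizon and the 30% safety margin are
+illustrative assumptions, not recommendations.
+
+.. code-block:: python
+
+   # Back-of-the-envelope capacity projection from the figures above.
+   HORIZON_MONTHS = 36  # assumed provisioning horizon
+   SAFETY_MARGIN = 1.3  # assumed 30% robustness margin
+
+   # name: (current volume, monthly growth), both in TB
+   volumes = {
+       "database (graph)": (17.0, 0.280),
+       "object storage": (800.0, 21.0),
+   }
+
+   for name, (current_tb, growth_tb) in volumes.items():
+       projected = (current_tb + HORIZON_MONTHS * growth_tb) * SAFETY_MARGIN
+       print(f"{name}: provision ~{projected:.0f}TB")
+
+   # database (graph): provision ~35TB
+   # object storage: provision ~2023TB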
+
+
+General hardware requirements
+-----------------------------
+
+When deploying a mirror based on the Software Heritage software stack, the
+following services are needed:
+
+
+Core services
+^^^^^^^^^^^^^
+
+- a database for the storage; this can be either a
+  `PostgreSQL <https://www.postgresql.org/>`_ database (single machine)
+  or a `Cassandra <https://cassandra.apache.org/>`_ cluster (at least 3 nodes),
+- an object storage system; this can be any
+  :py:mod:`supported backend <swh.objstorage.backends>`
+  -- a public cloud-based object storage (e.g. S3), any supported private
+  object storage, an ad-hoc filesystem storage system, etc.,
+- an `Elasticsearch <https://www.elastic.co/elasticsearch/>`_ instance,
+- a few nodes for backend applications
+  (:py:mod:`swh-storage <swh.storage>`, :py:mod:`swh-objstorage <swh.objstorage>`),
+- the web frontend (:py:mod:`swh-web <swh.web>`)
+  serving the main web app and the `public
+  API <https://archive.softwareheritage.org/api/>`_.
+
+
+Replaying services
+^^^^^^^^^^^^^^^^^^
+
+- `graph
+  replayers `_
+  as mirroring workers (increase parallelism to increase speed),
+- `content
+  replayers `_
+  as mirroring workers (likewise).
+
+
+Vault service
+^^^^^^^^^^^^^
+
+- a node for the :ref:`swh-vault <swh-vault>` backend service,
+- a node for the :ref:`swh-vault <swh-vault>` worker service.
+
+
+Sizing a mirror infrastructure
+------------------------------
+
+.. Note:: Solutions marked with a star (*) in the tables below are still under
+   test or validation.
+
+Common components
+^^^^^^^^^^^^^^^^^
+
+================ ======================= ========= ===== ============== ==============
+SWH Service      Tool                    Instances RAM   Storage Type   Storage Volume
+================ ======================= ========= ===== ============== ==============
+storage          swh-storage             16        16GB  regular        10GB
+search           elasticsearch           3         32GB  fast / zfs     6TB
+web              swh-web                 1         32GB  regular        100GB
+---------------- ----------------------- --------- ----- -------------- --------------
+graph replayer   swh-storage             32        4GB   regular        10GB
+content replayer swh-objstorage-replayer 32        4GB   regular        10GB
+replayer         redis                   1         8GB   regular        100GB
+---------------- ----------------------- --------- ----- -------------- --------------
+vault            swh-vault               1         4GB   regular        10GB
+vault worker     swh-vault               1         16GB  fast           1TB
+vault            rabbitmq                1         8GB   regular        10GB
+================ ======================= ========= ===== ============== ==============
+
+
+Storage backend
+^^^^^^^^^^^^^^^
+
+.. tabbed:: Postgresql
+
+   ================ ====================== ========= ===== ============== ==============
+   SWH Service      Tool                   Instances RAM   Storage Type   Storage Volume
+   ================ ====================== ========= ===== ============== ==============
+   storage          postgresql             1         512GB fast+zfs (lz4) 40TB
+   ================ ====================== ========= ===== ============== ==============
+
+.. tabbed:: Cassandra (min.)*
+
+   ================ ====================== ========= ===== ============== ==============
+   SWH Service      Tool                   Instances RAM   Storage Type   Storage Volume
+   ================ ====================== ========= ===== ============== ==============
+   storage          cassandra              3         32GB  fast           30TB
+   ================ ====================== ========= ===== ============== ==============
+
+.. tabbed:: Cassandra (typ.)*
+
+   ================ ====================== ========= ===== ============== ==============
+   SWH Service      Tool                   Instances RAM   Storage Type   Storage Volume
+   ================ ====================== ========= ===== ============== ==============
+   storage          cassandra              6+        32GB  fast           20TB
+   ================ ====================== ========= ===== ============== ==============
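+
+Whichever storage backend is selected, the other services reach it through the
+same swh-storage API, typically over its RPC layer. The sketch below is a
+minimal illustration of that pattern, not a deployment recipe; the endpoint
+URL is a hypothetical placeholder.
+
+.. code-block:: python
+
+   # Applications talk to the storage backend through the swh-storage RPC
+   # service, whether PostgreSQL or Cassandra sits behind it.
+   from swh.storage import get_storage
+
+   # Hypothetical internal endpoint of the mirror's swh-storage service.
+   storage = get_storage(cls="remote", url="http://storage.mirror.internal:5002/")
+
+   # Read-only smoke test of the connection.
+   assert storage.check_config(check_write=False)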
+
+
+Objstorage backend
+^^^^^^^^^^^^^^^^^^
+
+.. tabbed:: FS
+
+   ================ ====================== ========= ===== ============== ==============
+   SWH Service      Tool                   Instances RAM   Storage Type   Storage Volume
+   ================ ====================== ========= ===== ============== ==============
+   objstorage       swh-objstorage         1 [#f1]_  512GB zfs (with lz4) 1PB
+   ================ ====================== ========= ===== ============== ==============
+
+.. tabbed:: Winery - Ceph*
+
+   ================ ====================== ========= ===== ============== ==============
+   SWH Service      Tool                   Instances RAM   Storage Type   Storage Volume
+   ================ ====================== ========= ===== ============== ==============
+   objstorage       swh-objstorage         2 [#f2]_  32GB  standard       100GB
+   winery-db        postgresql             2 [#f2]_  512GB fast           10TB
+   ceph-mon         ceph                   3         4GB   fast           60GB
+   ceph-osd         ceph                   12+       4GB   mix fast+HDD   1PB (total)
+   ================ ====================== ========= ===== ============== ==============
+
+.. tabbed:: Seaweedfs*
+
+   ================ ====================== ========= ===== ============== ==============
+   SWH Service      Tool                   Instances RAM   Storage Type   Storage Volume
+   ================ ====================== ========= ===== ============== ==============
+   objstorage       swh-objstorage         3         32GB  standard       100GB
+   seaweed LB       nginx                  1         32GB  fast           100GB
+   seaweed-master   seaweedfs              3         8GB   standard       10GB
+   seaweed-filer    seaweedfs              3         32GB  fast           1TB
+   seaweed-volume   seaweedfs              3+        32GB  standard       1PB (total)
+   ================ ====================== ========= ===== ============== ==============
+
+.. rubric:: Notes
+
+.. [#f1] An swh-objstorage using the :py:mod:`simple filesystem
+   <swh.objstorage.backends.pathslicing>` backend can actually be
+   split across several machines using the
+   :py:mod:`swh.objstorage.multiplexer` backend.
+.. [#f2] The swh-objstorage RPC service and the index database can be hosted on
+   the same machine.
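+
+As an illustration of the multiplexer mentioned in the notes above, the sketch
+below spreads the object storage across two machines behind a single
+multiplexer instance. The host names are hypothetical, and the exact
+configuration keys should be double-checked against the swh-objstorage
+documentation for the deployed version.
+
+.. code-block:: python
+
+   # Hedged sketch: a multiplexer objstorage dispatching to two remote
+   # swh-objstorage instances, so a filesystem-backed object storage can
+   # span several machines.
+   from swh.objstorage.factory import get_objstorage
+
+   objstorage = get_objstorage(
+       cls="multiplexer",
+       objstorages=[
+           {"cls": "remote", "url": "http://objstorage1.mirror.internal:5003/"},
+           {"cls": "remote", "url": "http://objstorage2.mirror.internal:5003/"},
+       ],
+   )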