D6980.diff: Add a "Hosting a mirror" page
diff --git a/requirements.txt b/requirements.txt
--- a/requirements.txt
+++ b/requirements.txt
@@ -6,6 +6,7 @@
sphinxcontrib-images
sphinxcontrib-programoutput
sphinx-tabs
+sphinx-panels
sphinx-reredirects
sphinx_rtd_theme
sphinx-click
diff --git a/swh/docs/sphinx/conf.py b/swh/docs/sphinx/conf.py
--- a/swh/docs/sphinx/conf.py
+++ b/swh/docs/sphinx/conf.py
@@ -40,6 +40,7 @@
# swh.scheduler inherits some attribute descriptions from celery that use
# custom crossrefs (eg. :setting:`task_ignore_result`)
"sphinx_celery.setting_crossref",
+ "sphinx_panels",
]
# Add any paths that contain templates here, relative to this directory.
diff --git a/sysadm/mirror-operations/index.rst b/sysadm/mirror-operations/index.rst
--- a/sysadm/mirror-operations/index.rst
+++ b/sysadm/mirror-operations/index.rst
@@ -41,6 +41,10 @@
General view of the |swh| mirroring architecture.
+See :ref:`planning-a-mirror` for a complete description of the requirements
+for hosting a mirror.
+
+
Mirroring the Graph Storage
~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -117,7 +121,7 @@
.. _msgpack: https://msgpack.org
-You may also want to read:
+You may want to read:
- :ref:`mirror_monitor` to learn how to monitor your mirror and how to report
its health back the |swh|.
@@ -127,6 +131,7 @@
.. toctree::
:hidden:
+ planning
deploy
onboard
monitor
diff --git a/sysadm/mirror-operations/planning.rst b/sysadm/mirror-operations/planning.rst
new file mode 100644
--- /dev/null
+++ b/sysadm/mirror-operations/planning.rst
@@ -0,0 +1,187 @@
+.. _planning-a-mirror:
+
+Hosting a mirror
+================
+
+This section presents and discusses the technical requirements for hosting a
+|SWH| mirror.
+
+There are many different options for hosting a mirror, but there are common
+overall requirements that need to be fulfilled.
+
+Namely, hosting a mirror requires:
+
+- a dedicated infrastructure with enough computing power and storage
+- enough network bandwidth (both ingress and egress)
+- good IT tooling (monitoring, alerting).
+
+The mirror operator is not required to run the full Software Heritage
+`software stack <https://docs.softwareheritage.org/devel>`_; however, it is
+possible to do so.
+
+.. Warning::
+
+   Volumes given in this section are estimates based on numbers from **January
+   2022**.
+
+
+
+The global raw hardware requirements are:
+
+- a database system for the main storage of the archive (the graph structure);
+  the current volume is about 17TB, with an increase rate of about
+  280GB/month,
+- an object storage system for the objects (archived software source code
+  files); the current volume is about 800TB with an increase rate of
+  about 21TB/month,
+- an elasticsearch engine; the current main index is about 180M entries
+  (origins) for an index size of 360GB; the increase rate is about 2M
+  entries/month,
+- a web/application server for the main web application and public API,
+- a few compute nodes for the application services.
+
+
+A mirror should provision machines or cloud-based resources with these numbers
+in mind. This should include the usual robustness margins (RAID-like storage,
+replication, backups, etc.).
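+
+As a rough, illustrative calculation using the January 2022 figures above, one
+year of growth adds about 12 x 280GB = ~3.4TB to the database and about
+12 x 21TB = 252TB to the object storage, i.e. roughly 20.4TB and 1.05PB in
+total, which is in line with the volumes used in the sizing tables below.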
+
+General hardware requirements
+-----------------------------
+
+When deploying a mirror based on the Software Heritage software stack, one will
+need:
+
+
+Core services
+^^^^^^^^^^^^^
+
+- a database for the storage; this can be either a
+  `Postgresql <https://postgresql.org>`_ database (single machine)
+  or a `Cassandra <https://cassandra.apache.org>`_ cluster (at least 3 nodes),
+- an object storage system; this can be any
+  :py:mod:`supported backend <swh.objstorage.backends>`
+  -- a public cloud-based object storage (e.g. S3), any supported private
+  object storage, an ad-hoc filesystem storage system, etc.,
+- an `elasticsearch <https://www.elastic.co>`_ instance,
+- a few nodes for backend applications
+  (:py:mod:`swh-storage <swh.storage>`, :py:mod:`swh-objstorage <swh.objstorage>`),
+- the web frontend (:py:mod:`swh-web <swh.web>`)
+  serving the main web app and the `public
+  API <https://docs.softwareheritage.org/devel/swh-web/uri-scheme-api.html>`_.
+
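+As an illustration, a minimal configuration for the ``swh-storage`` backend
+could look like the following sketch (hostnames and connection parameters are
+placeholders; refer to the :py:mod:`swh.storage` documentation for the
+authoritative format):
+
+.. code-block:: yaml
+
+   # Illustrative swh-storage configuration; all hosts and credentials are
+   # placeholders, not a definitive setup.
+   storage:
+     cls: postgresql       # or a cassandra-based setup
+     db: host=db.internal dbname=swh user=swh
+     objstorage:
+       cls: remote         # RPC proxy to the swh-objstorage service
+       url: http://objstorage.internal:5003/
+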
+
+Replaying services
+^^^^^^^^^^^^^^^^^^
+
+- `graph
+  replayers <https://docs.softwareheritage.org/devel/swh-storage/cli.html#swh-storage-replay>`_
+  as mirroring workers (increase parallelism to increase speed),
+- `content
+  replayers <https://docs.softwareheritage.org/devel/swh-objstorage-replayer/cli.html>`_
+  as mirroring workers (likewise).
+
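+Each graph replayer runs the ``swh storage replay`` command (and each content
+replayer ``swh objstorage replay``), consuming the |swh| journal. As a sketch,
+the journal client part of a replayer configuration could look like this (the
+broker list, group id and credentials are placeholders obtained during mirror
+onboarding):
+
+.. code-block:: yaml
+
+   # Illustrative journal client settings for a replayer; all values are
+   # placeholders to be replaced with the ones provided during onboarding.
+   journal_client:
+     cls: kafka
+     brokers:
+       - broker1.journal.softwareheritage.org:9093
+     group_id: <assigned-mirror-prefix>-graph-replayer
+     sasl.username: <mirror-username>
+     sasl.password: <mirror-password>
+     security.protocol: sasl_ssl
+     sasl.mechanism: SCRAM-SHA-512
+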
+
+Vault service
+^^^^^^^^^^^^^
+
+- a node for the :ref:`swh-vault <swh-vault>` backend service,
+- a node for the :ref:`swh-vault <swh-vault>` worker service.
+
+
+Sizing a mirror infrastructure
+------------------------------
+
+.. Note:: Solutions marked with a star (*) in the tables below are still under
+   test or validation.
+
+Common components
+^^^^^^^^^^^^^^^^^
+
+================ ======================= ========= ===== ============== ==============
+SWH Service      Tool                    Instances RAM   Storage Type   Storage Volume
+================ ======================= ========= ===== ============== ==============
+storage          swh-storage             16        16GB  regular        10GB
+search           elasticsearch           3         32GB  fast / zfs     6TB
+web              swh-web                 1         32GB  regular        100GB
+---------------- ----------------------- --------- ----- -------------- --------------
+graph replayer   swh-storage             32        4GB   regular        10GB
+content replayer swh-objstorage-replayer 32        4GB   regular        10GB
+replayer         redis                   1         8GB   regular        100GB
+---------------- ----------------------- --------- ----- -------------- --------------
+vault            swh-vault               1         4GB   regular        10GB
+vault worker     swh-vault               1         16GB  fast           1TB
+vault            rabbitmq                1         8GB   regular        10GB
+================ ======================= ========= ===== ============== ==============
+
+
+Storage backend
+^^^^^^^^^^^^^^^
+
+.. tabbed:: Postgresql
+
+   ================ ====================== ========= ===== ============== ==============
+   SWH Service      Tool                   Instances RAM   Storage Type   Storage Volume
+   ================ ====================== ========= ===== ============== ==============
+   storage          postgresql             1         512GB fast+zfs (lz4) 40TB
+   ================ ====================== ========= ===== ============== ==============
+
+.. tabbed:: Cassandra (min.)*
+
+   ================ ====================== ========= ===== ============== ==============
+   SWH Service      Tool                   Instances RAM   Storage Type   Storage Volume
+   ================ ====================== ========= ===== ============== ==============
+   storage          cassandra              3         32GB  fast           30TB
+   ================ ====================== ========= ===== ============== ==============
+
+.. tabbed:: Cassandra (typ.)*
+
+   ================ ====================== ========= ===== ============== ==============
+   SWH Service      Tool                   Instances RAM   Storage Type   Storage Volume
+   ================ ====================== ========= ===== ============== ==============
+   storage          cassandra              6+        32GB  fast           20TB
+   ================ ====================== ========= ===== ============== ==============
+
+
+Objstorage backend
+^^^^^^^^^^^^^^^^^^
+
+
+.. tabbed:: FS
+
+   ================ ====================== ========= ===== ============== ==============
+   SWH Service      Tool                   Instances RAM   Storage Type   Storage Volume
+   ================ ====================== ========= ===== ============== ==============
+   objstorage       swh-objstorage         1 [#f1]_  512GB zfs (with lz4) 1PB
+   ================ ====================== ========= ===== ============== ==============
+
+.. tabbed:: Winery - Ceph*
+
+   ================ ====================== ========= ===== ============== ==============
+   SWH Service      Tool                   Instances RAM   Storage Type   Storage Volume
+   ================ ====================== ========= ===== ============== ==============
+   objstorage       swh-objstorage         2 [#f2]_  32GB  standard       100GB
+   winery-db        postgresql             2 [#f2]_  512GB fast           10TB
+   ceph-mon         ceph                   3         4GB   fast           60GB
+   ceph-osd         ceph                   12+       4GB   mix fast+HDD   1PB (total)
+   ================ ====================== ========= ===== ============== ==============
+
+.. tabbed:: Seaweedfs*
+
+   ================ ====================== ========= ===== ============== ==============
+   SWH Service      Tool                   Instances RAM   Storage Type   Storage Volume
+   ================ ====================== ========= ===== ============== ==============
+   objstorage       swh-objstorage         3         32GB  standard       100GB
+   seaweed LB       nginx                  1         32GB  fast           100GB
+   seaweed-master   seaweedfs              3         8GB   standard       10GB
+   seaweed-filer    seaweedfs              3         32GB  fast           1TB
+   seaweed-volume   seaweedfs              3+        32GB  standard       1PB (total)
+   ================ ====================== ========= ===== ============== ==============
+
+.. rubric:: Notes
+
+.. [#f1] An swh-objstorage using the :py:mod:`simple filesystem
+   <swh.objstorage.backends.pathslicing>` backend can actually be
+   split across several machines using the
+   :py:mod:`swh.objstorage.multiplexer` backend.
+.. [#f2] The swh-objstorage RPC service and the index database can be hosted on
+   the same machine.
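+
+For instance, splitting the object storage across two machines with the
+multiplexer could be configured along these lines (the URLs are illustrative
+placeholders):
+
+.. code-block:: yaml
+
+   # Sketch of a multiplexer objstorage spreading over two RPC-served
+   # instances; adapt the backend list to your actual layout.
+   objstorage:
+     cls: multiplexer
+     objstorages:
+       - cls: remote
+         url: http://objstorage1.internal:5003/
+       - cls: remote
+         url: http://objstorage2.internal:5003/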