diff --git a/sysadm/mirror-operations/docker.rst b/sysadm/mirror-operations/docker.rst --- a/sysadm/mirror-operations/docker.rst +++ b/sysadm/mirror-operations/docker.rst @@ -71,10 +71,17 @@ Please read the documentation of docker volumes to learn how to use such a device/driver as volume provider for docker. -Note that the provided :file:`base-services.yaml` file has placement constraints for the -``db-storage``, ``db-web`` and ``objstorage`` containers, that depend on the availability of -specific volumes (respectively ``_storage-db``, ``_web-db`` and -``_objstorage``). These services are pinned to specific nodes using labels named + +Node labels +----------- + +Note that the provided :file:`base-services.yaml` file has label-based +placement constraints for several services. + +The ``db-storage``, ``db-web``, ``objstorage`` and ``redis`` containers, which +depend on the availability of specific volumes (respectively +``_storage-db``, ``_web-db`` and ``_objstorage``) are +pinned to specific nodes using labels named ``org.softwareheritage.mirror.volumes.`` (e.g. ``org.softwareheritage.mirror.volumes.objstorage``). @@ -90,6 +97,44 @@ You have to set the node labels, or to adapt the placement constraints to your local requirements, for the services to start. +The monitoring services, ``prometheus``, ``prometheus-statsd-exporter`` and +``grafana`` also have placement constraints based on the label +``org.softwareheritage.mirror.monitoring``. So make sure to add this label to +one (and only one) node of the cluster: + +.. code-block:: bash + + docker node update \ + --label-add org.softwareheritage.mirror.monitoring=true \ + + +To check labels defined on a specific node, one can use the ``docker node +inspect`` command: + +.. code-block:: bash + + docker node inspect \ + -f '{{ .ID }} [{{ .Description.Hostname}}]: \ + {{ range $k, $v := .Spec.Labels }}{{ $k }}={{ $v }} + {{end}}' + +Labels that need to be defined are: + +- ``org.softwareheritage.mirror.volumes.objstorage=true``: node that will host + the objstorage service, on which the ``swh_objstorage`` volume must be + defined. + +- ``org.softwareheritage.mirror.volumes.redis=true``: node that will host the + redis service on which the ``swh_redis`` volume must be defined. + +- ``org.softwareheritage.mirror.volumes.storage-db=true``: node that will host + the swh-storage Postgresql database, on which the ``swh_storage-db`` volume must + be defined. + +- ``org.softwareheritage.mirror.volumes.web-db=true``: node that will host the + swh-web Postgresql database, on which the ``swh_web-db`` must be defined. + + Managing secrets ---------------- @@ -136,40 +181,57 @@ ~/swh-docker$ export SWH_IMAGE_TAG=20211022-121751 -You can then spawn the base services using the following command: +Make sure you have node labels attributed properly. Then you can spawn the +base services using the following command: .. code-block:: bash - ~/swh-docker$ docker stack deploy -c base-services.yml swh + ~/swh-docker$ docker stack deploy -c mirror.yml swh Creating network swh_default + Creating config swh_content-replayer + Creating config swh_grafana-provisioning-datasources-prometheus + Creating config swh_graph-replayer + Creating config swh_grafana-provisioning-dashboards-all + Creating config swh_grafana-dashboards-content-replayer + Creating config swh_grafana-dashboards-backend-stats + Creating config swh_prometheus + Creating config swh_prometheus-statsd-exporter Creating config swh_storage - Creating config swh_objstorage Creating config swh_nginx Creating config swh_web - Creating service swh_grafana - Creating service swh_prometheus-statsd-exporter + Creating config swh_grafana-dashboards-graph-replayer + Creating config swh_objstorage + Creating service swh_storage + Creating service swh_redis + Creating service swh_content-replayer + Creating service swh_nginx + Creating service swh_prometheus Creating service swh_web + Creating service swh_prometheus-statsd-exporter + Creating service swh_db-web Creating service swh_objstorage Creating service swh_db-storage + Creating service swh_graph-replayer Creating service swh_memcache - Creating service swh_storage - Creating service swh_nginx - Creating service swh_prometheus + Creating service swh_grafana ~/swh-docker$ docker service ls - ID NAME MODE REPLICAS IMAGE PORTS - tc93talbe2tg swh_db-storage global 1/1 postgres:13 - 42q5jtxsh029 swh_db-web global 1/1 postgres:13 - rtlz62ok6s96 swh_grafana replicated 1/1 grafana/grafana:latest - jao3rt0et17n swh_memcache replicated 1/1 memcached:latest - rulxakqgu2ko swh_nginx replicated 1/1 nginx:latest *:5081->5081/tcp - q560pvw3q3ls swh_objstorage replicated 2/2 softwareheritage/base:20211022-121751 - a2h3ltaqdt56 swh_prometheus global 1/1 prom/prometheus:latest - lm24et9gjn2k swh_prometheus-statsd-exporter replicated 1/1 prom/statsd-exporter:latest - gwqinrao5win swh_storage replicated 2/2 softwareheritage/base:20211022-121751 - 7g46blmphfb4 swh_web replicated 1/1 softwareheritage/web:20211022-121751 + ID NAME MODE REPLICAS IMAGE PORTS + ptlhzue025zm swh_content-replayer replicated 0/0 softwareheritage/replayer:20220225-101454 + ycyanvhh0jnt swh_db-storage replicated 1/1 (max 1 per node) postgres:13 + qlaf9tcyimz7 swh_db-web replicated 1/1 (max 1 per node) postgres:13 + aouw9j8uovr2 swh_grafana replicated 1/1 (max 1 per node) grafana/grafana:latest + uwqe13udgyqt swh_graph-replayer replicated 0/0 softwareheritage/replayer:20220225-101454 + mepbxllcxctu swh_memcache replicated 1/1 memcached:latest + kfzirv0h298h swh_nginx global 3/3 nginx:latest *:5081->5081/tcp + t7med8frg9pr swh_objstorage replicated 2/2 softwareheritage/base:20220225-101454 + 5s34wzo29ukl swh_prometheus replicated 1/1 (max 1 per node) prom/prometheus:latest + rwom7r3yv5ql swh_prometheus-statsd-exporter replicated 1/1 (max 1 per node) prom/statsd-exporter:latest + wuwydthechea swh_redis replicated 1/1 (max 1 per node) redis:6.2.6 + jztolbmjp1vi swh_storage replicated 2/2 softwareheritage/base:20220225-101454 + xxc4c66x0uj1 swh_web replicated 1/1 softwareheritage/web:20220225-101454 This will start a series of containers with: @@ -182,6 +244,9 @@ - a prometeus-statsd exporter, - a grafana server, - an nginx server serving as reverse proxy for grafana and swh-web. +- a swh_content-replayer service (initially set to 0 replica, see below) +- a swh_graph-replayer service (initially set to 0 replica, see below) +- a redis for the replication error logs, using the pinned version of the docker images. @@ -275,7 +340,6 @@ Note that this will reset the replicas config to their default values. - If you want to update only a specific service, you can also use (here for a replayer service): @@ -285,6 +349,20 @@ softwareheritage/replayer:${SWH_IMAGE_TAG} \ swh_graph-replayer +.. warning:: + + Updating the image of a storage service may come with a database migration + script. So we strongly recommend you scale the service back to one before + updating the image: + + .. code-block:: bash + + ~/swh-docker$ docker service scale swh_storage=1 + ~/swh-docker$ docker service update --image \ + softwareheritage/base:${SWH_IMAGE_TAG} \ + swh_storage + ~/swh-docker$ docker service scale swh_storage=16 + Set up the mirroring components =============================== @@ -299,13 +377,18 @@ - the ``content replayer`` which is in charge of filling the object storage. -Examples of docker deploy files and configuration files are provided in -the :file:`graph-replayer.yml` deploy file for replayer services -using configuration from yaml files in :file:`conf/graph-replayer.yml`. +The examples docker deploy file ``mirror.yml`` already define these 2 +services, but they are not deployed by default (their ``replicas`` is set to +0). This allows to first deploy core components and check they are properly +started and running. + +To start the replayers, first their configuration files need to be adjusted to +your setup. -Copy these example files as plain yaml ones then modify them to replace -the XXX markers with proper values (also make sure the kafka server list -is up to date). The parameters to check/update are: +Edit the provided example files ``conf/graph-replayer.yml`` and +``conf/content-replayer.yml`` to modify fields with an XXX markers with proper +values (also make sure the kafka server list is up to date). The parameters to +check/update are: - ``journal_client.brokers``: list of kafka brokers. - ``journal_client.group_id``: unique identifier for this mirroring session; @@ -316,81 +399,44 @@ - ``journal_client.sasl.username``: kafka authentication username. - ``journal_client.sasl.password``: kafka authentication password. -Then you need to merge the compose files "by hand" (due to this still -`unresolved `_ -`bugs `_). For this we will use -`docker compose `_ as helper tool to merge the -compose files. - -To merge 2 (or more) compose files together, typically :file:`base-services.yml` with -a mirror-related file: +Then you need to update the configuration, as described above: .. code-block:: bash - ~/swh-docker$ docker-compose \ - -f base-services.yml \ - -f graph-replayer-override.yml \ - config > mirror.yml - + ~/swh-docker$ docker config create swh_graph-replayer-2 conf/graph-replayer.yml + ~/swh-docker$ docker service update \ + --config-rm swh_graph-replayer \ + --config-add source=swh_graph-replayer-2,target=/etc/softwareheritage/config.yml \ + swh_graph-replayer -Then use this generated file as argument of the :command:`docker stack deploy` -command, e.g.: +and .. code-block:: bash - ~/swh-docker$ docker stack deploy -c mirror.yml swh + ~/swh-docker$ docker config create swh_content-replayer-2 conf/content-replayer.yml + ~/swh-docker$ docker service update \ + --config-rm swh_content-replayer \ + --config-add source=swh_content-replayer-2,target=/etc/softwareheritage/config.yml \ + swh_content-replayer Graph replayer -------------- -To run the graph replayer component of a mirror: +To run the graph replayer component of a mirror is just a matter of scaling its service: .. code-block:: bash - ~/swh-docker$ cd conf - ~/swh-docker/conf$ cp graph-replayer.yml.example graph-replayer.yml - ~/swh-docker/conf$ $EDITOR graph-replayer.yml - ~/swh-docker/conf$ cd .. - - -Once you have properly edited the :file:`conf/graph-replayer.yml` config file, -you can start these services with: - -.. code-block:: bash - - ~/swh-docker$ docker-compose \ - -f base-services.yml \ - -f graph-replayer-override.yml \ - config > stack-with-graph-replayer.yml - ~/swh-docker$ docker stack deploy \ - -c stack-with-graph-replayer.yml \ - swh - [...] + ~/swh-docker$ docker service scale swh_graph-replayer=1 You can check everything is running with: .. code-block:: bash - ~/swh-docker$ docker stack ls + ~/swh-docker$ docker service ps swh_graph-replayer - NAME SERVICES ORCHESTRATOR - swh 11 Swarm - - ~/swh-docker$ docker service ls - - ID NAME MODE REPLICAS IMAGE PORTS - tc93talbe2tg swh_db-storage global 1/1 postgres:13 - 42q5jtxsh029 swh_db-web global 1/1 postgres:13 - rtlz62ok6s96 swh_grafana replicated 1/1 grafana/grafana:latest - 7hvn66um77wr swh_graph-replayer replicated 4/4 softwareheritage/replayer:20211022-121751 - jao3rt0et17n swh_memcache replicated 1/1 memcached:latest - rulxakqgu2ko swh_nginx replicated 1/1 nginx:latest *:5081->5081/tcp - q560pvw3q3ls swh_objstorage replicated 2/2 softwareheritage/base:20211022-121751 - a2h3ltaqdt56 swh_prometheus global 1/1 prom/prometheus:latest - lm24et9gjn2k swh_prometheus-statsd-exporter replicated 1/1 prom/statsd-exporter:latest - gwqinrao5win swh_storage replicated 2/2 softwareheritage/base:20211022-121751 - 7g46blmphfb4 swh_web replicated 1/1 softwareheritage/web:20211022-121751 + ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS + ioyt34ok118a swh_graph-replayer.1 softwareheritage/replayer:20220225-101454 node1 Running Running 17 minutes ago If everything is OK, you should have your mirror filling. Check docker logs: @@ -415,43 +461,7 @@ .. code-block:: bash - ~/swh-docker$ cd conf - ~/swh-docker/conf$ cp content-replayer.yml.example content-replayer.yml - ~/swh-docker/conf$ # edit content-replayer.yml files - ~/swh-docker/conf$ cd .. - - -Once you have properly edited the :file:`conf/content-replayer.yml` config file, you can -start these services with: - -.. code-block:: bash - - ~/swh-docker$ docker-compose \ - -f base-services.yml \ - -f content-replayer-override.yml \ - config > content-replayer.yml - ~/swh-docker$ docker stack deploy \ - -c content-replayer.yml \ - swh - [...] - - -Full mirror ------------ - -Putting all together is just a matter of merging the 3 compose files: - -.. code-block:: bash - - ~/swh-docker$ docker-compose \ - -f base-services.yml \ - -f graph-replayer-override.yml \ - -f content-replayer-override.yml \ - config > mirror.yml - ~/swh-docker$ docker stack deploy \ - -c mirror.yml \ - swh - [...] + ~/swh-docker$ docker service scale swh_content-replayer=1 Getting your deployment production-ready @@ -460,16 +470,27 @@ docker-stack scaling -------------------- -In order to scale up a replayer service, you can use the :command:`docker -scale` command. For example: +Once the replayer services have been checked, started and are working +properly, you can increase the replication to speed up the replication process. .. code-block:: bash - ~/swh-docker$ docker service scale swh_graph-replayer=4 - [...] + ~/swh-docker$ docker service scale swh_graph-replayer=64 + ~/swh-docker$ docker service scale swh_content-replayer=64 +A proper replication factor value will depend on your infrastructure +capabilities and needs to be adjusted watching the load of the core services +(mainly the swh_storage-db and swh_objstorage services). -will start 4 copies of the graph replayer service. +Acceptable range should be between 32 to 64 (for staging) or 256 (for production). + +Note that when you increase the replication of the replayers, you also need to +increase the replication factor for the core services ``swh_storage`` and +``swh_objstorage`` otherwise they will become the limiting factor of the replaying +process. A factor of 4 between the number of replayer of a type (graph, +content) and the backend service (swh_storage, swh_objstorage) is probably a +good starting point (i.e. have at least one core service for 4 replayer +services). You may have to play a bit with these values to find the right balance. Notes on the throughput of the mirroring process ------------------------------------------------ @@ -502,3 +523,19 @@ external database server - override the environment variables of the ``storage`` service to reference the external database server and dbname + + +Operational concerns for the monitoring +--------------------------------------- + +You may want to use a prometheus server running directly on one of the docker +swarm nodes so that it can easily also monitor the swarm cluster itself and the +running docker services. + +See the `prometheus guide `_ on +how to configure a Prometheus server to monitor a docker swarm cluster. + +In this case, the ``prometheus`` service should be removed from the docker +deploy compose file, and the configuration files should be updated accordingly. +You would probably want to move ``grafana`` from the docker swarm, and rework +the ``prometheus-statsd-exporter`` node setup accordingly.