diff --git a/sysadm/mirror-operations/docker.rst b/sysadm/mirror-operations/docker.rst
--- a/sysadm/mirror-operations/docker.rst
+++ b/sysadm/mirror-operations/docker.rst
@@ -11,8 +11,8 @@
 Prerequisities
 --------------
 
-According you have a properly set up docker swarm cluster with support for the
-`docker stack deploy
+We assume that you have a properly set up docker swarm cluster with support for
+the `docker stack deploy
 `_
 command, e.g.:
 
@@ -24,8 +24,9 @@
    n9mfw08gys0dmvg5j2bb4j2m7 *  host1  Ready  Active  Leader  18.09.7
 
-Note: on some systems (centos for example), making docker swarm works require
-some permission tuning regarding the firewall and selinux.
+Note: on some systems (CentOS for example), making docker swarm work requires
+some permission tuning regarding the firewall and SELinux. Please refer to the
+upstream docker-swarm documentation.
 
 In the following how-to, we will assume that the service `STACK` name is `swh`
 (this name is the last argument of the `docker stack deploy` command below).
@@ -33,7 +34,7 @@
 Several preparation steps will depend on this name.
 
 We also use `docker-compose
 `_ to merge compose
-files, so make sure it iavailable on your system.
+files, so make sure it is available on your system.
 
 You also need to clone the git repository:
 
@@ -43,15 +44,15 @@
 Set up volumes
 --------------
 
-Before starting the `swh` service, you may want to specify where the data
-should be stored on your docker hosts.
+Before starting the `swh` service, you will certainly want to specify where the
+data should be stored on your docker hosts.
 
 By default docker will use docker volumes for storing databases and the content
 of the objstorage (thus put them in `/var/lib/docker/volumes`).
 
-**Optional:** if you want to specify a different location to put a storage in,
-create the storage before starting the docker service. For example for the
-`objstorage` service you will need a storage named `_objstorage`:
+**Optional:** if you want to specify a different location to put the data in,
+you should create the docker volumes before starting the docker service. For
+example, the `objstorage` service uses a volume named `_objstorage`:
 
 .. code-block:: bash
 
@@ -65,16 +66,17 @@
 If you want to deploy services like the `swh-objstorage` on several hosts, you
 will need a shared storage area in which blob objects will be stored. Typically
 a NFS storage can be used for this, or any existing docker volume driver like
-`REX-Ray `_. This is not covered in this doc.
+`REX-Ray `_. This is not covered in this
+documentation.
 
 Please read the documentation of docker volumes to learn how to use such a
 device/driver as volume provider for docker.
 
-Note that the provided `base-services.yaml` file have a few placement
-constraints: containers that depends on a volume (db-storage and objstorage)
-are stick to the manager node of the cluster, under the assumption persistent
-volumes have been created on this node. Make sure this fits your needs, or
-amend these placement constraints.
+Note that the provided `base-services.yaml` file has a few placement
+constraints: containers that depend on a volume (db-storage and objstorage) are
+pinned to the manager node of the cluster, under the assumption that persistent
+volumes have been created on this node. You should check that this fits your
+needs, or amend these placement constraints to match your own requirements.
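+
+For reference, such a placement constraint uses the standard compose/swarm
+syntax sketched below; this is only an illustration (the exact service
+definitions may differ), not a verbatim excerpt of the provided
+`base-services.yaml`:
+
+.. code-block:: yaml
+
+   services:
+     objstorage:
+       # [...] rest of the service definition
+       deploy:
+         placement:
+           constraints:
+             # keep this container on the node holding its persistent volume
+             - node.role == manager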
 
 Managing secrets
@@ -85,13 +87,14 @@
 
 Namely, you need to create a `secret` for:
 
-- `postgres-password`
+- `swh-mirror-db-postgres-password`
+- `swh-mirror-web-postgres-password`
 
 For example:
 
 .. code-block:: bash
 
-   ~/swh-docker$ echo 'strong password' | docker secret create postgres-password -
+   ~/swh-docker$ xkcdpass -a post -d- | docker secret create swh-mirror-db-postgres-password -
   [...]
@@ -142,7 +145,7 @@
 - an objstorage service,
 - a storage service using a postgresql database as backend,
-- a web app front end,
+- a web app front end using a postgresql database as backend,
 - a memcache for the web app,
 - a prometheus monitoring app,
 - a prometeus-statsd exporter,
@@ -163,40 +166,33 @@
 the 'latest' docker images work, it is highly recommended to explicitly specify
 the version of the image you want to use.
 
-Docker images for the Software Heritage stack are tagged with their build date:
-
-.. code-block:: bash
-
-   ~$ docker images -f reference='softwareheritage/*:20*'
-   REPOSITORY        TAG                   IMAGE ID      CREATED            SIZE
-   softwareheritage  web-20200819-112604   32ab8340e368  About an hour ago  339MB
-   softwareheritage  base-20200819-112604  19fe3d7326c5  About an hour ago  242MB
-   softwareheritage  web-20200630-115021   65b1869175ab  7 weeks ago        342MB
-   softwareheritage  base-20200630-115021  3694e3fcf530  7 weeks ago        245MB
+Docker images for the Software Heritage stack are tagged with their build date.
+You can check out the list of available tags on the `docker hub page
+`_.
 
 To specify the tag to be used, simply set the SWH_IMAGE_TAG environment
 variable, like:
 
 .. code-block:: bash
 
-   export SWH_IMAGE_TAG=20200819-112604
-   docker deploy -c base-services.yml swh
+   export SWH_IMAGE_TAG=20211022-121751
+   docker stack deploy -c base-services.yml swh
 
 .. warning::
 
-   make sure to have this variable properly set for any later `docker deploy`
-   command you type, otherwise you running containers will be recreated using the
-   ':latest' image (which might **not** be the latest available version, nor
-   consistent amond the docker nodes on you swarm cluster).
+   Make sure this variable is properly set for any later `docker stack deploy` command
+   you type, otherwise your running containers will be recreated using the ':latest'
+   image (which might **not** be the latest available version, nor consistent among the
+   docker nodes on your swarm cluster).
 
 Updating a configuration
 ------------------------
 
-When you modify a configuration file exposed to docker services via the `docker
+Configuration files are exposed to docker services via the `docker
 config` system. Unfortunately, docker does not support updating these config
 objects, so you need to either:
 
 - destroy the old config before being able to recreate them. That also means
-  you need to recreate every docker container using this config, or
+  you need to recreate every docker service using this config, or
 
 - adapt the `name:` field in the compose file.
@@ -220,7 +216,7 @@
 
 Note: since persistent data (databases and objects) are stored in volumes, you
-can safely destoy and recreate any container you want, you will not loose any
+can safely destroy and recreate any container you want; you will not lose any
 data.
 
 Or you can change the compose file like:
@@ -306,16 +302,16 @@
 
 Copy these example files as plain yaml ones then modify them to replace the
 XXX markers with proper values (also make sure the kafka server list
-is up to date.) Parameters to check/update are:
+is up to date). The parameters to check/update are:
 
-- `journal_client/brokers`: list of kafka brokers.
-- `journal_client/group_id`: unique identifier for this mirroring session;
+- `journal_client.brokers`: list of kafka brokers.
+- `journal_client.group_id`: unique identifier for this mirroring session;
   you can choose whatever you want, but changing this value will make kafka
   start consuming messages from the beginning; kafka messages are dispatched
   among consumers with the same `group_id`, so in order to distribute the load
   among workers, they must share the same `group_id`.
-- `journal_client/sasl.username`: kafka authentication username.
-- `journal_client/sasl.password`: kafka authentication password.
+- `journal_client."sasl.username"`: kafka authentication username.
+- `journal_client."sasl.password"`: kafka authentication password.
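+
+For instance, once the markers are filled in, the `journal_client` section of
+these files could look roughly like the sketch below (every value here is a
+placeholder, not a real setting; the provided `.example` files remain the
+reference for the exact structure):
+
+.. code-block:: yaml
+
+   journal_client:
+     # kafka broker list provided along with your mirror credentials
+     brokers:
+       - broker1.journal.example.org:9093
+     # any unique identifier, kept stable across all your workers
+     group_id: my-mirror-group
+     # kafka authentication credentials
+     sasl.username: mirror-username
+     sasl.password: mirror-password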
 
 Then you need to merge the compose files "by hand" (due to this still
 `unresolved
 `_
@@ -344,13 +340,13 @@
 
 Graph replayer
 --------------
 
-To run the graph replayer compoenent of a mirror:
+To run the graph replayer component of a mirror:
 
 .. code-block:: bash
 
    ~/swh-docker$ cd conf
    ~/swh-docker/conf$ cp graph-replayer.yml.example graph-replayer.yml
-   ~/swh-docker/conf$ # edit graph-replayer.yml files
+   ~/swh-docker/conf$ $EDITOR graph-replayer.yml
    ~/swh-docker/conf$ cd ..
@@ -362,9 +358,9 @@
    ~/swh-docker$ docker-compose \
        -f base-services.yml \
        -f graph-replayer-override.yml \
-       config > graph-replayer.yml
+       config > stack-with-graph-replayer.yml
    ~/swh-docker$ docker stack deploy \
-       -c graph-replayer.yml \
+       -c stack-with-graph-replayer.yml \
        swh-mirror
   [...]
@@ -466,14 +462,18 @@
 
 Notes:
 
+- The overall throughput of the graph replayer will depend heavily on the `swh_storage`
+  service, and on the performance of the underlying `swh_db-storage` database. You will
+  need to make sure that your database is `properly tuned
+  `_.
+
 - One graph replayer service requires a steady 500MB to 1GB of RAM to run, so
   make sure you have properly sized machines for running these replayer
   containers, and to monitor these.
 
-- The overall bandwidth of the replayer will depend heavily on the
-  `swh_storage` service, thus on the `swh_db-storage`. It will require some
-  network bandwidth for the ingress kafka payload (this can easily peak to
-  several hundreds of Mb/s). So make sure you have a correctly tuned database
-  and enough network bw.
+- The graph replayer containers will require sufficient network bandwidth for the kafka
+  traffic (this can easily peak at several hundred megabits per second, and the
+  total volume of data fetched will be multiple tens of terabytes).
 
-- Biggest topics are the directory, revision and content.
+- The biggest kafka topics are directory, revision and content, and will take the
+  longest to initially replay.
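+
+If you want swarm to enforce the sizing mentioned above, memory reservations and
+limits can be declared on the replayer service in your override compose file.
+The snippet below is only an illustrative sketch (the service name and the
+figures are examples to adapt, not values taken from the provided files):
+
+.. code-block:: yaml
+
+   services:
+     graph-replayer:
+       # [...] rest of the service definition
+       deploy:
+         resources:
+           reservations:
+             memory: 1000M
+           limits:
+             memory: 1500M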