diff --git a/sysadm/mirror-operations/docker.rst b/sysadm/mirror-operations/docker.rst
--- a/sysadm/mirror-operations/docker.rst
+++ b/sysadm/mirror-operations/docker.rst
@@ -11,8 +11,8 @@
Prerequisites
--------------
-According you have a properly set up docker swarm cluster with support for the
-`docker stack deploy
+We assume that you have a properly set up docker swarm cluster with support for
+the `docker stack deploy
`_ command,
e.g.:
@@ -24,8 +24,9 @@
n9mfw08gys0dmvg5j2bb4j2m7 * host1 Ready Active Leader 18.09.7
-Note: on some systems (centos for example), making docker swarm works require
-some permission tuning regarding the firewall and selinux.
+Note: on some systems (centos for example), making docker swarm work requires
+some permission tuning regarding the firewall and selinux. Please refer to the
+upstream docker-swarm documentation.
In the following how-to, we will assume that the service `STACK` name is `swh`
(this name is the last argument of the `docker stack deploy` command below).
@@ -33,7 +34,7 @@
Several preparation steps will depend on this name.
We also use `docker-compose `_ to merge compose
-files, so make sure it iavailable on your system.
+files, so make sure it is available on your system.
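As a quick sanity check (a sketch; adjust to your installation), you can verify both tools are reachable from your shell:

```shell
# Verify that docker and docker-compose are installed and on the PATH
docker --version
docker-compose --version
```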
You also need to clone the git repository:
@@ -43,15 +44,15 @@
Set up volumes
--------------
-Before starting the `swh` service, you may want to specify where the data
-should be stored on your docker hosts.
+Before starting the `swh` service, you will certainly want to specify where the
+data should be stored on your docker hosts.
By default docker will use docker volumes for storing databases and the content of
the objstorage (thus put them in `/var/lib/docker/volumes`).
-**Optional:** if you want to specify a different location to put a storage in,
-create the storage before starting the docker service. For example for the
-`objstorage` service you will need a storage named `_objstorage`:
+**Optional:** if you want to specify a different location to put the data in,
+you should create the docker volumes before starting the docker service. For
+example, the `objstorage` service uses a volume named `_objstorage`:
.. code-block:: bash
@@ -65,16 +66,17 @@
If you want to deploy services like the `swh-objstorage` on several hosts, you
will need a shared storage area in which blob objects will be stored. Typically
a NFS storage can be used for this, or any existing docker volume driver like
-`REX-Ray `_. This is not covered in this doc.
+`REX-Ray `_. This is not covered in this
+documentation.
Please read the documentation of docker volumes to learn how to use such a
device/driver as volume provider for docker.
-Note that the provided `base-services.yaml` file have a few placement
-constraints: containers that depends on a volume (db-storage and objstorage)
-are stick to the manager node of the cluster, under the assumption persistent
-volumes have been created on this node. Make sure this fits your needs, or
-amend these placement constraints.
+Note that the provided `base-services.yaml` file has a few placement
+constraints: containers that depend on a volume (db-storage and objstorage) are
+pinned to the manager node of the cluster, under the assumption that persistent
+volumes have been created on this node. Check that this fits your needs, or
+amend these placement constraints to match your own requirements.
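For reference, such a placement constraint typically looks like the following compose excerpt (illustrative only; check the actual `base-services.yaml` for the exact layout):

```yaml
services:
  objstorage:
    deploy:
      placement:
        constraints:
          - node.role == manager   # change or remove to match your cluster
```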
Managing secrets
@@ -85,13 +87,14 @@
Namely, you need to create a `secret` for:
-- `postgres-password`
+- `swh-mirror-db-postgres-password`
+- `swh-mirror-web-postgres-password`
For example:
.. code-block:: bash
- ~/swh-docker$ echo 'strong password' | docker secret create postgres-password -
+ ~/swh-docker$ xkcdpass -a post -d- | docker secret create swh-mirror-db-postgres-password -
[...]
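The same has to be done for the web frontend database secret (same command shape; `xkcdpass` is only one possible way of generating a strong password):

```shell
# Create the second required secret, for the web app database
xkcdpass -a post -d- | docker secret create swh-mirror-web-postgres-password -
```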
@@ -142,7 +145,7 @@
- an objstorage service,
- a storage service using a postgresql database as backend,
-- a web app front end,
+- a web app front end using a postgresql database as backend,
- a memcache for the web app,
- a prometheus monitoring app,
- a prometheus-statsd exporter,
@@ -163,40 +166,33 @@
the 'latest' docker images work, it is highly recommended to
explicitly specify the version of the image you want to use.
-Docker images for the Software Heritage stack are tagged with their build date:
-
-.. code-block:: bash
-
- ~$ docker images -f reference='softwareheritage/*:20*'
- REPOSITORY TAG IMAGE ID CREATED SIZE
- softwareheritage web-20200819-112604 32ab8340e368 About an hour ago 339MB
- softwareheritage base-20200819-112604 19fe3d7326c5 About an hour ago 242MB
- softwareheritage web-20200630-115021 65b1869175ab 7 weeks ago 342MB
- softwareheritage base-20200630-115021 3694e3fcf530 7 weeks ago 245MB
+Docker images for the Software Heritage stack are tagged with their build date.
+You can check out the list of available tags on the `docker hub page
+`_.
To specify the tag to be used, simply set the SWH_IMAGE_TAG environment variable, like:
.. code-block:: bash
- export SWH_IMAGE_TAG=20200819-112604
- docker deploy -c base-services.yml swh
+ export SWH_IMAGE_TAG=20211022-121751
+ docker stack deploy -c base-services.yml swh
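Once deployed, you can check that each service of the stack is running the expected image tag (a sketch, assuming the stack is named `swh`):

```shell
# List the image (including its tag) actually used by each service
docker stack services --format '{{.Name}}: {{.Image}}' swh
```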
.. warning::
- make sure to have this variable properly set for any later `docker deploy`
- command you type, otherwise you running containers will be recreated using the
- ':latest' image (which might **not** be the latest available version, nor
- consistent amond the docker nodes on you swarm cluster).
+ Make sure this variable is properly set for any later `docker stack deploy` command
+ you run, otherwise your running containers will be recreated using the ':latest'
+ image (which might **not** be the latest available version, nor consistent among the
+ docker nodes on your swarm cluster).
Updating a configuration
------------------------
-When you modify a configuration file exposed to docker services via the `docker
+Configuration files are exposed to docker services via the `docker
config` system. Unfortunately, docker does not support updating these config
objects, so you need to either:
- destroy the old config before being able to recreate them. That also means
- you need to recreate every docker container using this config, or
+ you need to recreate every docker service using this config, or
- adapt the `name:` field in the compose file.
@@ -220,7 +216,7 @@
Note: since persistent data (databases and objects) are stored in volumes, you
-can safely destoy and recreate any container you want, you will not loose any
+can safely destroy and recreate any container you want; you will not lose any
data.
Or you can change the compose file like:
@@ -306,16 +302,16 @@
Copy these example files as plain yaml ones then modify them to replace
the XXX markers with proper values (also make sure the kafka server list
-is up to date.) Parameters to check/update are:
+is up to date). The parameters to check/update are:
-- `journal_client/brokers`: list of kafka brokers.
-- `journal_client/group_id`: unique identifier for this mirroring session;
+- `journal_client.brokers`: list of kafka brokers.
+- `journal_client.group_id`: unique identifier for this mirroring session;
you can choose whatever you want, but changing this value will make kafka
start consuming messages from the beginning; kafka messages are dispatched
among consumers with the same `group_id`, so in order to distribute the
load among workers, they must share the same `group_id`.
-- `journal_client/sasl.username`: kafka authentication username.
-- `journal_client/sasl.password`: kafka authentication password.
+- `journal_client."sasl.username"`: kafka authentication username.
+- `journal_client."sasl.password"`: kafka authentication password.
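These dotted paths refer to keys in the yaml configuration files; an illustrative (not authoritative) fragment, keeping the XXX markers for the values you must fill in, could look like:

```yaml
journal_client:
  brokers:
    - XXX:9092
  group_id: XXX
  sasl.username: XXX
  sasl.password: XXX
```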
Then you need to merge the compose files "by hand" (due to this still
`unresolved `_
@@ -344,13 +340,13 @@
Graph replayer
--------------
-To run the graph replayer compoenent of a mirror:
+To run the graph replayer component of a mirror:
.. code-block:: bash
~/swh-docker$ cd conf
~/swh-docker/conf$ cp graph-replayer.yml.example graph-replayer.yml
- ~/swh-docker/conf$ # edit graph-replayer.yml files
+ ~/swh-docker/conf$ $EDITOR graph-replayer.yml
~/swh-docker/conf$ cd ..
@@ -362,9 +358,9 @@
~/swh-docker$ docker-compose \
-f base-services.yml \
-f graph-replayer-override.yml \
- config > graph-replayer.yml
+ config > stack-with-graph-replayer.yml
~/swh-docker$ docker stack deploy \
- -c graph-replayer.yml \
+ -c stack-with-graph-replayer.yml \
swh-mirror
[...]
@@ -466,14 +462,18 @@
Notes:
+- The overall throughput of the graph replayer will depend heavily on the `swh_storage`
+ service, and on the performance of the underlying `swh_db-storage` database. You will
+ need to make sure that your database is `properly tuned
+ `_.
+
- One graph replayer service requires a steady 500MB to 1GB of RAM to run, so
make sure you have properly sized machines for running these replayer
containers, and to monitor these.
-- The overall bandwidth of the replayer will depend heavily on the
- `swh_storage` service, thus on the `swh_db-storage`. It will require some
- network bandwidth for the ingress kafka payload (this can easily peak to
- several hundreds of Mb/s). So make sure you have a correctly tuned database
- and enough network bw.
+- The graph replayer containers will require sufficient network bandwidth for the kafka
+ traffic (this can easily peak at several hundred megabits per second, and the
+ total volume of data fetched will be multiple tens of terabytes).
-- Biggest topics are the directory, revision and content.
+- The biggest kafka topics are directory, revision and content, and will take the
+ longest to initially replay.