	# Deploy a Software Heritage stack with docker deploy

	Assuming you have a properly set up docker swarm cluster with support for the
	`docker deploy` command, e.g.:

	```
	~/swh-docker$ docker node ls
	ID                            HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
	py47518uzdb94y2sb5yjurj22     host2      Ready    Active                          18.09.7
	n9mfw08gys0dmvg5j2bb4j2m7 *   host1      Ready    Active         Leader           18.09.7
	```

	Note: this might require you to activate docker's experimental features, as
	described in the [docker deploy](https://docs.docker.com/engine/reference/commandline/deploy/)
	documentation.

	In the following how-to, we assume that the stack name (`STACK`) is `swh`
	(this name is the last argument of the `docker deploy` command below).

	Several preparation steps depend on this name.

	## Set up volumes

	Before starting the `swh` stack, you may want to specify where its data
	should be stored on your docker hosts.

	By default, databases and the content of the objstorage are stored in docker
	volumes (i.e. under `/var/lib/docker/volumes`).

	If you want to store a volume in a different location, create that volume
	before starting the docker service. For example, the `objstorage` service
	needs a volume named `<STACK>_objstorage`:

	```
	~/swh-docker$ docker volume create -d local \
	--opt type=none \
	--opt o=bind \
	--opt device=/data/docker/swh-objstorage \
	swh_objstorage
	```
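
The `local` driver does not create the bind-mount source directory for you, and a docker volume only exists on the node where it was created. As a minimal sketch (the function name `create_bind_volume` is hypothetical), both steps can be wrapped in a POSIX shell function, to be run on the node where the service will be scheduled; `HOST_DIR` should be an absolute path.

```shell
# Hypothetical helper: create the host directory, then a docker volume named
# <STACK>_<SERVICE> that bind-mounts it (same options as the example above).
# Usage: create_bind_volume STACK SERVICE HOST_DIR   (HOST_DIR: absolute path)
create_bind_volume() (
  if [ $# -ne 3 ]; then
    echo "usage: create_bind_volume STACK SERVICE HOST_DIR" >&2
    exit 2
  fi
  stack=$1 service=$2 host_dir=$3
  # The bind source must already exist when a container mounts the volume.
  mkdir -p "$host_dir" || exit
  docker volume create -d local \
    --opt type=none \
    --opt o=bind \
    --opt device="$host_dir" \
    "${stack}_${service}"
)
```

For example, `create_bind_volume swh objstorage /data/docker/swh-objstorage` creates the directory if needed, then the same `swh_objstorage` volume as above.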

	If you want to deploy services like `swh-objstorage` on several hosts, you
	will need a shared storage area in which blob objects are stored. Typically,
	an NFS storage can be used for this. This is not covered in this document.

	Please read the docker volumes documentation to learn how to use such a
	device as a volume provider for docker.

	Note that the provided `docker-compose.yml` file has a few placement
	constraints; for example, the `objstorage` service is forced to be spawned on
	the manager node of the docker swarm cluster. Feel free to remove or amend
	these constraints if needed.

	## Managing secrets

	Shared passwords (between services) are managed via `docker secret`. Before
	being able to start services, you need to define these secrets.

	Namely, you need to create a `secret` for:

	- `postgres-password`

	For example:
	```
	~/swh-docker$ echo 'strong password' | docker secret create postgres-password -
	[...]
	```
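
Typing the password directly on the command line leaves it in your shell history. A minimal alternative, assuming `head -c` and `base64` are available (as with GNU coreutils), is to generate a random value and pipe it straight into `docker secret create`; the helper name `gen_password` is hypothetical. Docker does not let you read a secret back once it is created, so keep a copy elsewhere if you need it.

```shell
# Hypothetical helper: print a random password (32 random bytes,
# base64-encoded, i.e. 44 characters) without a trailing newline.
gen_password() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
}

# Pipe it straight into the secret so it never appears on the command line:
#   gen_password | docker secret create postgres-password -
```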

	## Creating the swh service

	From within this repository, just type:

	```
	~/swh-docker$ docker deploy -c docker-compose.yml swh
	Creating service swh_web
	Creating service swh_objstorage
	Creating service swh_storage
	Creating service swh_nginx
	Creating service swh_memcache
	Creating service swh_db-storage
	~/swh-docker$ docker service ls
	ID NAME MODE REPLICAS IMAGE PORTS
	bkn2bmnapx7w swh_db-storage replicated 1/1 postgres:11
	2ujcw3dg8f9d swh_memcache replicated 1/1 memcached:latest
	l52hxxl61ijj swh_nginx replicated 1/1 nginx:latest *:5080->80/tcp
	3okk2njpbopx swh_objstorage replicated 1/1 softwareheritage/base:latest
	zais9ey62weu swh_storage replicated 1/1 softwareheritage/base:latest
	7sm6g5ecff19 swh_web replicated 1/1 softwareheritage/web:latest
	```

	This will start a series of containers with:

	- an objstorage service,
	- a storage service using a postgresql database as backend,
	- a web app front end,
	- a memcache for the web app,
	- an nginx server serving as reverse proxy for the swh-web instances.
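
Tasks can take a while to reach the running state (images must first be pulled on each node), so the `REPLICAS` column may briefly show `0/1`. The sketch below (hypothetical name `not_converged`) filters the output of `docker service ls` down to the services whose running count differs from the desired count; an empty output means every service is fully up.

```shell
# Hypothetical helper: print the services whose running replica count differs
# from the desired one. Reads "NAME RUNNING/DESIRED" lines on stdin.
not_converged() {
  awk '{ split($2, r, "/"); if (r[1] + 0 != r[2] + 0) print $1, $2 }'
}

# Usage against a live swarm:
#   docker service ls --format '{{.Name}} {{.Replicas}}' | not_converged
```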


	## Updating a configuration

	When you modify a configuration file exposed to docker services via the `docker
	config` system, you need to destroy the old config before you can recreate it
	(docker is currently not capable of updating an existing config).
	Unfortunately, that also means you need to recreate every docker service using
	this config.

	For example, if you edit the file `conf/storage.yml`:

	```
	~/swh-docker$ docker service rm swh_storage
	swh_storage
	~/swh-docker$ docker config rm swh_storage
	swh_storage
	~/swh-docker$ docker deploy -c docker-compose.yml swh
	Creating config swh_storage
	Creating service swh_storage
	Updating service swh_nginx (id: l52hxxl61ijjxnj9wg6ddpaef)
	Updating service swh_memcache (id: 2ujcw3dg8f9dm4r6qmgy0sb1e)
	Updating service swh_db-storage (id: bkn2bmnapx7wgvwxepume71k1)
	Updating service swh_web (id: 7sm6g5ecff1979t0jd3dmsvwz)
	Updating service swh_objstorage (id: 3okk2njpbopxso3n3w44ydyf9)
	```
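
The same three steps can be wrapped in a small function. This is a minimal sketch: the name `recreate_service_config` is hypothetical, and it assumes, as for `swh_storage` here, that the config and the service share the same name. Extra compose files (e.g. `docker-compose-mirror.yml` for a mirror) can be passed so that they are included in the redeploy.

```shell
# Hypothetical helper automating the steps above for one service whose
# docker config has the same name as the service (as swh_storage does here).
# Usage: recreate_service_config STACK NAME [COMPOSE_FILE...]
recreate_service_config() (
  if [ $# -lt 2 ]; then
    echo "usage: recreate_service_config STACK NAME [COMPOSE_FILE...]" >&2
    exit 2
  fi
  stack=$1 name=$2
  shift 2
  [ $# -gt 0 ] || set -- docker-compose.yml
  # Join the compose files with commas, as in "-c a.yml,b.yml".
  files=$(printf '%s,' "$@")
  files=${files%,}
  # The service goes first: docker refuses to remove a config still in use.
  docker service rm "${stack}_${name}" || exit
  docker config rm "${stack}_${name}" || exit
  docker deploy -c "$files" "$stack"
)
```

`recreate_service_config swh storage` runs the same three commands as the session above.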

	## Updating a service

	When a new version of the softwareheritage/base image is published, running
	services must be updated to use it.

	To prevent inconsistencies caused by dependencies between deployed versions,
	we recommend that you shut the tail services off first (especially the
	replayer services in the case of a mirror stack).

	This can be done as follows:

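	```
	# Fetch the new image first: `docker inspect` only sees local images.
	docker pull softwareheritage/base:latest
	# Point the service at the image digest, so every node runs the same version.
	docker service update --image \
	$(docker inspect -f '{{index .RepoDigests 0}}' \
	softwareheritage/base:latest ) \
	swh_graph-replayer-origin
	```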
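
To move several services to the new image at once (for instance the replayer services of a mirror stack), the commands above can be looped over. This is a minimal sketch; the function name `update_to_latest` is hypothetical and the service names are passed explicitly.

```shell
# Hypothetical helper: pull IMAGE, resolve it to its registry digest and point
# each given service at that digest, as in the command above.
# Usage: update_to_latest IMAGE SERVICE...
update_to_latest() (
  if [ $# -lt 2 ]; then
    echo "usage: update_to_latest IMAGE SERVICE..." >&2
    exit 2
  fi
  image=$1
  shift
  docker pull "$image" || exit
  digest=$(docker inspect -f '{{index .RepoDigests 0}}' "$image") || exit
  for service do
    docker service update --image "$digest" "$service" || exit
  done
)
```

For example: `update_to_latest softwareheritage/base:latest swh_storage swh_objstorage`.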

	# Set up a mirror

	A Software Heritage mirror consists of the base Software Heritage services
	described above, without any worker related to web scraping or source code
	repository loading. Instead, filling the local storage and objstorage is the
	responsibility of kafka-based `replayer` services:

	- the `graph replayer` which is in charge of filling the storage (aka the
	graph), and

	- the `content replayer` which is in charge of filling the object storage.

	Ensure the configuration files `conf/graph-replayer.yml` and
	`conf/content-replayer.yml` are properly set up, then start these services with:

	```
	~/swh-docker$ docker deploy -c docker-compose.yml,docker-compose-mirror.yml swh
	[...]
	```
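
Before deploying, a small pre-flight check can catch a missing or empty replayer configuration file. This is a minimal sketch (the function name `deploy_mirror` is hypothetical), to be run from the root of this repository.

```shell
# Hypothetical pre-flight check: deploy the mirror stack only if both replayer
# configuration files exist and are not empty.
# Usage: deploy_mirror [STACK]   (STACK defaults to swh)
deploy_mirror() (
  stack=${1:-swh}
  for conf in conf/graph-replayer.yml conf/content-replayer.yml; do
    if [ ! -s "$conf" ]; then
      echo "missing or empty configuration file: $conf" >&2
      exit 1
    fi
  done
  docker deploy -c docker-compose.yml,docker-compose-mirror.yml "$stack"
)
```
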

	You can check that everything is running with:

	```
	~/swh-docker$ docker service ls
	ID NAME MODE REPLICAS IMAGE PORTS
	88djaq3jezjm swh_db-storage replicated 1/1 postgres:11
	m66q36jb00xm swh_grafana replicated 1/1 grafana/grafana:latest
	qfsxngh4s2sv swh_content-replayer replicated 1/1 softwareheritage/base:latest
	qcl0n3ngr2uv swh_graph-replayer-content replicated 2/2 softwareheritage/base:latest
	f1hop14w6b9h swh_graph-replayer-directory replicated 4/4 softwareheritage/base:latest
	dcpvbf7h4fja swh_graph-replayer-origin replicated 2/2 softwareheritage/base:latest
	1njy5iuugmk2 swh_graph-replayer-release replicated 2/2 softwareheritage/base:latest
	cbe600nl9bdb swh_graph-replayer-revision replicated 4/4 softwareheritage/base:latest
	5hroiithan6c swh_graph-replayer-snapshot replicated 2/2 softwareheritage/base:latest
	zn8dzsron3y7 swh_memcache replicated 1/1 memcached:latest
	wfbvf3yk6t41 swh_nginx replicated 1/1 nginx:latest *:5081->5081/tcp
	thtev7o0n6th swh_objstorage replicated 1/1 softwareheritage/base:latest
	ysgdoqshgd2k swh_prometheus replicated 1/1 prom/prometheus:latest
	u2mjjl91aebz swh_prometheus-statsd-exporter replicated 1/1 prom/statsd-exporter:latest
	xyf2xgt465ob swh_storage replicated 1/1 softwareheritage/base:latest
	su8eka2b5cbf swh_web replicated 1/1 softwareheritage/web:latest
	```
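
A service stuck below its desired replica count often has tasks that failed to start. The sketch below (hypothetical name `task_errors`) prints, for the given services, every task (current or past) whose `Error` field is not empty, using the `--format` placeholders of `docker service ps`.

```shell
# Hypothetical helper: print the tasks (current or past) of the given services
# whose Error field is not empty, as "TASK: ERROR".
# Usage: task_errors SERVICE...
task_errors() {
  for service do
    docker service ps --no-trunc \
      --format '{{.Name}}|{{.CurrentState}}|{{.Error}}' "$service"
  done | awk -F'|' '$3 != "" { print $1 ": " $3 }'
}
```

For example: `task_errors swh_content-replayer swh_graph-replayer-directory`.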


	If everything is OK, your mirror should now be filling up. Check the service logs:

	```
	~/swh-docker$ docker service logs swh_content-replayer
	[...]
	```

	and:

	```
	~/swh-docker$ docker service logs swh_graph-replayer-directory
	[...]
	```
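
To narrow the logs down to recent, error-looking lines, the `--since` and `--timestamps` options of `docker service logs` can be combined with `grep`. This is a minimal sketch (hypothetical name `recent_errors`); the matched patterns (`error`, `exception`, `traceback`) are an assumption about what the replayers log, not a documented format.

```shell
# Hypothetical helper: show recent log lines of a service that look like errors.
# Usage: recent_errors SERVICE [SINCE]   (SINCE defaults to 10m)
recent_errors() {
  # Task stderr is relayed on stderr, so merge both streams before filtering.
  docker service logs --since "${2:-10m}" --timestamps "$1" 2>&1 |
    grep -iE 'error|exception|traceback'
}
```

For example: `recent_errors swh_graph-replayer-directory 1h`.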

	## Scaling up services

	To scale up a replayer service, use the `docker service scale` command. For example:

	```
	~/swh-docker$ docker service scale swh_graph-replayer-directory=4
	[...]
	```

	will run 4 replicas of the directory replayer service.
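
`docker service scale` accepts several `SERVICE=REPLICAS` pairs in one call, which is handy when resizing several graph replayers together. This is a minimal sketch; the name `scale_graph_replayers` is hypothetical and it assumes the `<STACK>_graph-replayer-<TYPE>` service naming visible in the mirror listing above.

```shell
# Hypothetical helper: scale several graph replayer services of a stack to
# the same replica count with a single `docker service scale` call.
# Usage: scale_graph_replayers STACK REPLICAS OBJECT_TYPE...
scale_graph_replayers() (
  if [ $# -lt 3 ]; then
    echo "usage: scale_graph_replayers STACK REPLICAS OBJECT_TYPE..." >&2
    exit 2
  fi
  stack=$1 replicas=$2
  case $replicas in
    '' | *[!0-9]*) echo "REPLICAS must be a number: $replicas" >&2; exit 2 ;;
  esac
  shift 2
  # Rotate the positional parameters: append one SERVICE=REPLICAS pair per
  # object type and drop the type it was built from.
  for object_type do
    set -- "$@" "${stack}_graph-replayer-${object_type}=${replicas}"
    shift
  done
  docker service scale "$@"
)
```

For example, `scale_graph_replayers swh 4 directory revision` runs `docker service scale swh_graph-replayer-directory=4 swh_graph-replayer-revision=4`.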

	Notes:

	- Each graph replayer replica requires a steady 500MB to 1GB of RAM to run,
	so make sure you have properly sized machines for running these replayer
	containers, and monitor them.

	- The overall throughput of the replayer depends heavily on the
	`swh_storage` service, and thus on `swh_db-storage`. It also requires
	network bandwidth for the ingress kafka payload (this can easily peak at
	several hundred Mb/s). So make sure you have a correctly tuned database
	and enough network bandwidth.

	- The biggest kafka topics are directory, content and revision.
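
The RAM figure above adds up quickly once replayers are scaled. The sketch below (hypothetical name `replayer_ram_estimate`) multiplies the desired number of graph replayer replicas by 500MB and by 1GB (counted as 1000MB), assuming the figure applies to each replica; it is a rough budget, not a measurement.

```shell
# Hypothetical helper: rough RAM budget for the graph replayers, assuming
# 500MB to 1GB (counted as 1000MB) per replica.
# Reads "NAME RUNNING/DESIRED" lines on stdin and sums the DESIRED counts.
replayer_ram_estimate() {
  awk '$1 ~ /graph-replayer/ { split($2, r, "/"); n += r[2] }
       END { printf "%d graph replayers: %d MB to %d MB of RAM\n", n, n * 500, n * 1000 }'
}

# Usage against a live swarm:
#   docker service ls --format '{{.Name}} {{.Replicas}}' | replayer_ram_estimate
```

With the mirror listing shown earlier (16 graph replayer replicas), it reports `16 graph replayers: 8000 MB to 16000 MB of RAM`.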
