diff --git a/docker/README.md b/docker/README.md
--- a/docker/README.md
+++ b/docker/README.md
@@ -584,6 +584,61 @@
 (swh) ~/swh-environment$ swh scheduler task respawn 1
 ```
 
+## Data persistence for a development setting
+
+The default docker-compose.yml configuration is geared towards application testing,
+not data persistence.
+
+Volumes defined in the associated images are anonymous and may end up unused or removed.
+The Docker and docker-compose documentation is not clear about how anonymous volumes
+behave.
+
+One way to make sure these volumes persist is to use named external volumes, created beforehand.
+To use vanilla named volumes, fully managed by Docker, create them like this:
+
+```
+for vn in swh_storage_data swh_objstorage_data;
+do
+    docker volume create "${vn}"
+done
+```
+
+We can also create them as named host volumes, so that the data is also accessible
+as it would be for a non-containerized service (this is not portable). The usual considerations
+about file ownership and permissions apply: the data must be accessible to the user
+in the container.
+
+```
+for vn in swh_storage_data swh_objstorage_data;
+do
+    sudo mkdir -p "/data/docker/${vn}";
+    sudo chown 1000:docker "/data/docker/${vn}";  # 1000 is the uid of the "swh" user in the containers
+    docker volume create -d local --opt type=none --opt o=bind --opt device="/data/docker/${vn}" "${vn}"
+done
+```
+
+Then, reference the volumes by the same names in the docker-compose.yml:
+
+```
+services:
+  swh-storage-db:
+    image: postgres:12
+    volumes:
+      - "swh_storage_data:/var/lib/postgresql/data"
+  swh-objstorage:
+    image: swh/stack
+    volumes:
+      - "swh_objstorage_data:/srv/softwareheritage/objects"
+
+volumes:
+  swh_storage_data:
+    external: true
+  swh_objstorage_data:
+    external: true
+```
+
+This way, `docker-compose down -v` will not remove those volumes along with the
+anonymous ones; only `docker volume rm` will.
 
 ## Starting a kafka-powered mirror of the storage