diff --git a/docker/README.rst b/docker/README.rst
--- a/docker/README.rst
+++ b/docker/README.rst
@@ -5,146 +5,17 @@
instance on development machines. The end goal is to smooth the
contributors/developers workflow. Focus on coding, not configuring!
-.. warning::
- Running a Software Heritage instance on your machine can
- consume quite a bit of resources: if you play a bit too hard (e.g., if
- you try to list all GitHub repositories with the corresponding lister),
- you may fill your hard drive, and consume a lot of CPU, memory and
- network bandwidth.
-
Dependencies
------------
-This uses docker with docker-compose, so ensure you have a working
-docker environment and docker-compose is installed.
+This uses Docker with `Compose`_, so ensure you have a working
+Docker environment and that the `docker compose plugin <https://docs.docker.com/compose/install/>`_ is installed.
We recommend using the latest version of docker, so please read
https://docs.docker.com/install/linux/docker-ce/debian/ for more details
-on how to install docker on your machine.
-
-On a debian system, docker-compose can be installed from Debian
-repositories::
-
- ~$ sudo apt install docker-compose
-
-Quick start
------------
-
-First, change to the docker dir if you aren’t there yet::
-
- ~$ cd swh-environment/docker
-
-Then, start containers::
-
- ~/swh-environment/docker$ docker-compose up -d
- [...]
- Creating docker_amqp_1 ... done
- Creating docker_zookeeper_1 ... done
- Creating docker_kafka_1 ... done
- Creating docker_flower_1 ... done
- Creating docker_swh-scheduler-db_1 ... done
- [...]
-
-This will build docker images and run them. Check everything is running
-fine with::
-
- ~/swh-environment/docker$ docker-compose ps
- Name Command State Ports
- -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- docker_amqp_1 docker-entrypoint.sh rabbi ... Up 15671/tcp, 0.0.0.0:5018->15672/tcp, 25672/tcp, 4369/tcp, 5671/tcp, 5672/tcp
- docker_flower_1 flower --broker=amqp://gue ... Up 0.0.0.0:5555->5555/tcp
- docker_kafka_1 start-kafka.sh Up 0.0.0.0:5092->5092/tcp
- docker_swh-deposit-db_1 docker-entrypoint.sh postgres Up 5432/tcp
- docker_swh-deposit_1 /entrypoint.sh Up 0.0.0.0:5006->5006/tcp
- [...]
-
-The startup of some containers may fail the first time for
-dependency-related problems. If some containers failed to start, just
-run the ``docker-compose up -d`` command again.
-
-If a container really refuses to start properly, you can check why using
-the ``docker-compose logs`` command. For example::
-
- ~/swh-environment/docker$ docker-compose logs swh-lister
- Attaching to docker_swh-lister_1
- [...]
- swh-lister_1 | Processing /src/swh-scheduler
- swh-lister_1 | Could not install packages due to an EnvironmentError: [('/src/swh-scheduler/.hypothesis/unicodedata/8.0.0/charmap.json.gz', '/tmp/pip-req-build-pm7nsax3/.hypothesis/unicodedata/8.0.0/charmap.json.gz', "[Errno 13] Permission denied: '/src/swh-scheduler/.hypothesis/unicodedata/8.0.0/charmap.json.gz'")]
- swh-lister_1 |
-
-Once all containers are running, you can use the web interface by
-opening http://localhost:5080/ in your web browser.
-
-At this point, the archive is empty and needs to be filled with some
-content. To do so, you can create tasks that will scrape a forge. For
-example, to inject the code from the https://0xacab.org gitlab forge::
-
- ~/swh-environment/docker$ docker-compose exec swh-scheduler \
- swh scheduler task add list-gitlab-full \
- -p oneshot url=https://0xacab.org/api/v4
-
- Created 1 tasks
-
- Task 1
- Next run: just now (2018-12-19 14:58:49+00:00)
- Interval: 90 days, 0:00:00
- Type: list-gitlab-full
- Policy: oneshot
- Args:
- Keyword args:
- url=https://0xacab.org/api/v4
-
-This task will scrape the forge’s project list and register origins to the scheduler.
-This takes at most a couple of minutes.
-
-Then, you must tell the scheduler to create loading tasks for these origins.
-For example, to create tasks for 100 of these origins::
-
- ~/swh-environment/docker$ docker-compose exec swh-scheduler \
- swh scheduler origin schedule-next git 100
-
-This will take a bit of time to complete.
-
-To increase the speed at which git repositories are imported, you can
-spawn more ``swh-loader-git`` workers::
-
- ~/swh-environment/docker$ docker-compose exec swh-scheduler \
- celery status
- listers@50ac2185c6c9: OK
- loader@b164f9055637: OK
- indexer@33bc6067a5b8: OK
- vault@c9fef1bbfdc1: OK
-
- 4 nodes online.
- ~/swh-environment/docker$ docker-compose exec swh-scheduler \
- celery control pool_grow 3 -d loader@b164f9055637
- -> loader@b164f9055637: OK
- pool will grow
- ~/swh-environment/docker$ docker-compose exec swh-scheduler \
- celery inspect -d loader@b164f9055637 stats | grep prefetch_count
- "prefetch_count": 4
-
-Now there are 4 workers ingesting git repositories. You can also
-increase the number of ``swh-loader-git`` containers::
-
- ~/swh-environment/docker$ docker-compose up -d --scale swh-loader=4
- [...]
- Creating docker_swh-loader_2 ... done
- Creating docker_swh-loader_3 ... done
- Creating docker_swh-loader_4 ... done
-
-Updating the docker image
--------------------------
-
-All containers started by ``docker-compose`` are bound to a docker image
-named ``swh/stack`` including all the software components of Software
-Heritage. When new versions of these components are released, the docker
-image will not be automatically updated. In order to update all Software
-Heritage components to their latest version, the docker image needs to
-be explicitly rebuilt by issuing the following command from within the
-``docker`` directory::
+on how to install Docker on your machine.
- ~/swh-environment/docker$ docker build --no-cache -t swh/stack .
+.. _Compose: https://docs.docker.com/compose/
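+
+As a quick sanity check (version numbers in your output will of
+course differ), you can verify that both Docker and the Compose
+plugin are available with::
+
+ ~$ docker --version
+ ~$ docker compose version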
Details
-------
@@ -210,7 +81,7 @@
To run the same command from within a container::
- ~/swh-environment/docker$ docker-compose exec swh-scheduler celery status
+ ~/swh-environment/docker$ docker compose exec swh-scheduler celery status
loader@61704103668c: OK
[...]
@@ -276,7 +147,7 @@
For example, to add a (one shot) task that will list git repos on the
0xacab.org gitlab instance, one can do (from this git repository)::
- ~/swh-environment/docker$ docker-compose exec swh-scheduler \
+ ~/swh-environment/docker$ docker compose exec swh-scheduler \
swh scheduler task add list-gitlab-full \
-p oneshot url=https://0xacab.org/api/v4
@@ -294,7 +165,7 @@
This will insert a new task in the scheduler. To list existing tasks for
a given task type::
- ~/swh-environment/docker$ docker-compose exec swh-scheduler \
+ ~/swh-environment/docker$ docker compose exec swh-scheduler \
swh scheduler task list-pending list-gitlab-full
Found 1 list-gitlab-full tasks
@@ -310,7 +181,7 @@
To list all existing task types::
- ~/swh-environment/docker$ docker-compose exec swh-scheduler \
+ ~/swh-environment/docker$ docker compose exec swh-scheduler \
swh scheduler task-type list
Known task types:
@@ -371,7 +242,7 @@
debian lister task not being properly registered on the
swh-scheduler-runner service)::
- ~/swh-environment/docker$ docker-compose logs --tail=10 swh-scheduler-runner
+ ~/swh-environment/docker$ docker compose logs --tail=10 swh-scheduler-runner
Attaching to docker_swh-scheduler-runner_1
swh-scheduler-runner_1 | "__main__", mod_spec)
swh-scheduler-runner_1 | File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
@@ -415,7 +286,8 @@
It is possible to run a docker container with some swh packages
installed from sources instead of using the latest published packages
-from pypi. To do this you must write a docker-compose override file
+from PyPI. To do this you must write a
+`Docker Compose override file <https://docs.docker.com/compose/extends/>`_
(``docker-compose.override.yml``). An example is given in the
``docker-compose.override.yml.example`` file:
@@ -429,7 +301,7 @@
- "$HOME/swh-environment/swh-objstorage:/src/swh-objstorage"
The file named ``docker-compose.override.yml`` will automatically be
-loaded by ``docker-compose``.
+loaded by Docker Compose.
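+
+If in doubt about what Compose will actually run after merging the
+override file, ``docker compose config`` prints the fully resolved
+configuration (run from the ``docker`` directory; shown here as a
+quick check, not a required step)::
+
+ ~/swh-environment/docker$ docker compose config --services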
This example shows the simplest case of the ``swh-objstorage`` package:
you just have to mount it in the container in ``/src`` and the
@@ -502,7 +374,7 @@
export BROKER_URL=amqp://127.0.0.1:5072/
export APP=swh.scheduler.celery_backend.config.app
export COMPOSE_FILE=~/swh-environment/docker/docker-compose.yml:~/swh-environment/docker/docker-compose.override.yml
- alias doco=docker-compose
+ alias doco="docker compose"
EOF
@@ -520,9 +392,9 @@
(see the `documentation of the celery command
`_),
- - ``COMPOSE_FILE`` so you can run ``docker-compose`` from everywhere,
+ - ``COMPOSE_FILE`` so you can run ``docker compose`` from everywhere,
-- create an alias ``doco`` for ``docker-compose`` because this is way
+- create an alias ``doco`` for ``docker compose`` because this is way
too long to type,
So now you can easily:
@@ -583,7 +455,7 @@
data persistence, but application testing.
Volumes defined in associated images are anonymous and may get either
-unused or removed on the next ``docker-compose up``.
+unused or removed on the next ``docker compose up``.
One way to make sure these volumes persist is to use named volumes. The
volumes may be defined as follows in a ``docker-compose.override.yml``.
@@ -604,7 +476,7 @@
swh_storage_data:
swh_objstorage_data:
-This way, ``docker-compose down`` without the ``-v`` flag will not
+This way, ``docker compose down`` without the ``-v`` flag will not
remove those volumes and data will persist.
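+
+You can confirm the named volumes survived a ``docker compose down``
+by listing them (Compose prefixes volume names with the project name,
+typically the directory name, so a simple grep is enough here)::
+
+ ~/swh-environment/docker$ docker volume ls | grep swh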
@@ -627,7 +499,7 @@
This can be used like::
- ~/swh-environment/docker$ docker-compose \
+ ~/swh-environment/docker$ docker compose \
-f docker-compose.yml \
-f docker-compose.storage-mirror.yml \
up -d
@@ -661,7 +533,7 @@
::
- ~/swh-environment/docker$ docker-compose \
+ ~/swh-environment/docker$ docker compose \
-f docker-compose.yml \
-f docker-compose.storage-mirror.yml \
-f docker-compose.storage-mirror.override.yml \
@@ -680,7 +552,7 @@
This can be used like::
- ~/swh-environment/docker$ docker-compose \
+ ~/swh-environment/docker$ docker compose \
-f docker-compose.yml \
-f docker-compose.cassandra.yml \
up -d
@@ -699,7 +571,7 @@
Instead, you can enable swh-search, which is based on ElasticSearch
and much more efficient, like this::
- ~/swh-environment/docker$ docker-compose \
+ ~/swh-environment/docker$ docker compose \
-f docker-compose.yml \
-f docker-compose.search.yml \
up -d
@@ -719,7 +591,7 @@
So we have an alternative based on Redis' HyperLogLog feature, which you
can test with::
- ~/swh-environment/docker$ docker-compose \
+ ~/swh-environment/docker$ docker compose \
-f docker-compose.yml \
-f docker-compose.counters.yml \
up -d
@@ -737,7 +609,7 @@
You can use it with::
- ~/swh-environment/docker$ docker-compose \
+ ~/swh-environment/docker$ docker compose \
-f docker-compose.yml \
-f docker-compose.graph.yml up -d
@@ -754,15 +626,15 @@
Then, you need to explicitly request recomputing the graph before restarts
if you want to update it::
- ~/swh-environment/docker$ docker-compose \
+ ~/swh-environment/docker$ docker compose \
-f docker-compose.yml \
-f docker-compose.graph.yml \
run swh-graph update
- ~/swh-environment/docker$ docker-compose \
+ ~/swh-environment/docker$ docker compose \
-f docker-compose.yml \
-f docker-compose.graph.yml \
stop swh-graph
- ~/swh-environment/docker$ docker-compose \
+ ~/swh-environment/docker$ docker compose \
-f docker-compose.yml \
-f docker-compose.graph.yml \
up -d swh-graph
@@ -775,7 +647,7 @@
you will need to enable Keycloak as well, instead of the default
Django-based authentication::
- ~/swh-environment/docker$ docker-compose -f docker-compose.yml -f docker-compose.keycloak.yml up -d
+ ~/swh-environment/docker$ docker compose -f docker-compose.yml -f docker-compose.keycloak.yml up -d
[...]
User registration in Keycloak database is available by following the Register link
@@ -794,14 +666,14 @@
""""""""""""""""""""""""""""""
As mentioned above, it is possible to consume topics from the kafka server available
-in the docker-compose environment from the host using `127.0.0.1:5092` as broker URL.
+in the Docker Compose environment from the host using ``127.0.0.1:5092`` as broker URL.
Resetting offsets
"""""""""""""""""
It is also possible to reset a consumer group offset using the following command::
- ~swh-environment/docker$ docker-compose \
+ ~/swh-environment/docker$ docker compose \
run kafka kafka-consumer-groups.sh \
--bootstrap-server kafka:9092 \
--group \
@@ -816,7 +688,7 @@
You can get information on consumer groups::
- ~swh-environment/docker$ docker-compose \
+ ~/swh-environment/docker$ docker compose \
run kafka kafka-consumer-groups.sh \
--bootstrap-server kafka:9092 \
--describe --members --all-groups
@@ -824,7 +696,7 @@
Or the stored offsets for all (or a given) groups::
- ~swh-environment/docker$ docker-compose \
+ ~/swh-environment/docker$ docker compose \
run kafka kafka-consumer-groups.sh \
--bootstrap-server kafka:9092 \
--describe --offsets --all-groups
@@ -857,7 +729,7 @@
Also note that for the ``swh-objstorage``, since the volume can be
pretty big, the remove operation can be quite long (several minutes is
-not uncommon), which may mess a bit with the ``docker-compose`` command.
+not uncommon), which may mess a bit with the ``docker compose`` command.
If you have an error message like: