docker/README.rst
Docker environment
==================

``swh-environment/docker/`` contains Dockerfiles to run a small Software Heritage
instance on development machines. The end goal is to smooth the
contributors/developers workflow. Focus on coding, not configuring!
.. warning::

   Running a Software Heritage instance on your machine can consume
   quite a bit of resources: if you play a bit too hard (e.g., if you
   try to list all GitHub repositories with the corresponding lister),
   you may fill your hard drive and consume a lot of CPU, memory and
   network bandwidth.
Dependencies
------------

This uses Docker with `Compose`_, so ensure you have a working Docker
environment and that the `Compose plugin is installed
<https://docs.docker.com/compose/install/>`_.

We recommend using the latest version of Docker, so please read
https://docs.docker.com/install/linux/docker-ce/debian/ for more details
on how to install Docker on your machine.

.. _Compose: https://docs.docker.com/compose/
Quick start
-----------

First, change to the docker dir if you aren't there yet::

    ~$ cd swh-environment/docker

Then, start containers::

    ~/swh-environment/docker$ docker compose up -d
    [...]
    Creating docker_amqp_1              ... done
    Creating docker_zookeeper_1         ... done
    Creating docker_kafka_1             ... done
    Creating docker_flower_1            ... done
    Creating docker_swh-scheduler-db_1  ... done
    [...]
This will build Docker images and run them. Check everything is running
fine with::

    ~/swh-environment/docker$ docker compose ps
              Name                        Command                State                                  Ports
    ---------------------------------------------------------------------------------------------------------------------------------------------
    docker_amqp_1             docker-entrypoint.sh rabbi ...    Up      15671/tcp, 0.0.0.0:5018->15672/tcp, 25672/tcp, 4369/tcp, 5671/tcp, 5672/tcp
    docker_flower_1           flower --broker=amqp://gue ...    Up      0.0.0.0:5555->5555/tcp
    docker_kafka_1            start-kafka.sh                    Up      0.0.0.0:5092->5092/tcp
    docker_swh-deposit-db_1   docker-entrypoint.sh postgres     Up      5432/tcp
    docker_swh-deposit_1      /entrypoint.sh                    Up      0.0.0.0:5006->5006/tcp
    [...]
The startup of some containers may fail the first time due to
dependency-related problems. If some containers failed to start, just
run the ``docker compose up -d`` command again.

If a container really refuses to start properly, you can check why using
the ``docker compose logs`` command. For example::

    ~/swh-environment/docker$ docker compose logs swh-lister
    Attaching to docker_swh-lister_1
    [...]
    swh-lister_1 | Processing /src/swh-scheduler
    swh-lister_1 | Could not install packages due to an EnvironmentError: [('/src/swh-scheduler/.hypothesis/unicodedata/8.0.0/charmap.json.gz', '/tmp/pip-req-build-pm7nsax3/.hypothesis/unicodedata/8.0.0/charmap.json.gz', "[Errno 13] Permission denied: '/src/swh-scheduler/.hypothesis/unicodedata/8.0.0/charmap.json.gz'")]
    swh-lister_1 |
Once all containers are running, you can use the web interface by
opening http://localhost:5080/ in your web browser.

At this point, the archive is empty and needs to be filled with some
content. To do so, you can create tasks that will scrape a forge. For
example, to inject the code from the https://0xacab.org gitlab forge::

    ~/swh-environment/docker$ docker compose exec swh-scheduler \
        swh scheduler task add list-gitlab-full \
        -p oneshot url=https://0xacab.org/api/v4

    Created 1 tasks

    Task 1
      Next run: just now (2018-12-19 14:58:49+00:00)
      Interval: 90 days, 0:00:00
      Type: list-gitlab-full
      Policy: oneshot
      Args:
      Keyword args:
        url=https://0xacab.org/api/v4
This task will scrape the forge's project list and register origins to
the scheduler. This takes at most a couple of minutes.

Then, you must tell the scheduler to create loading tasks for these
origins. For example, to create tasks for 100 of these origins::

    ~/swh-environment/docker$ docker compose exec swh-scheduler \
        swh scheduler origin schedule-next git 100

This will take a bit of time to complete.
To increase the speed at which git repositories are imported, you can
spawn more ``swh-loader-git`` workers::

    ~/swh-environment/docker$ docker compose exec swh-scheduler \
        celery status
    listers@50ac2185c6c9: OK
    loader@b164f9055637: OK
    indexer@33bc6067a5b8: OK
    vault@c9fef1bbfdc1: OK

    4 nodes online.

    ~/swh-environment/docker$ docker compose exec swh-scheduler \
        celery control pool_grow 3 -d loader@b164f9055637
    -> loader@b164f9055637: OK
        pool will grow

    ~/swh-environment/docker$ docker compose exec swh-scheduler \
        celery inspect -d loader@b164f9055637 stats | grep prefetch_count
        "prefetch_count": 4

Now there are 4 workers ingesting git repositories. You can also
increase the number of ``swh-loader-git`` containers::

    ~/swh-environment/docker$ docker compose up -d --scale swh-loader=4
    [...]
    Creating docker_swh-loader_2 ... done
    Creating docker_swh-loader_3 ... done
    Creating docker_swh-loader_4 ... done
Updating the Docker image
-------------------------

All containers started by ``docker compose`` are bound to a Docker image
named ``swh/stack``, which includes all the software components of
Software Heritage. When new versions of these components are released,
the Docker image is not automatically updated. In order to update all
Software Heritage components to their latest version, the image needs to
be explicitly rebuilt by issuing the following command from within the
``docker`` directory::

    ~/swh-environment/docker$ docker build --no-cache -t swh/stack .
Details
-------

This runs the following services on their respective standard ports,
all configured to communicate with each other:
may type::

    ~/swh-environment/docker$ celery --broker amqp://:5072// \
        --app swh.scheduler.celery_backend.config.app status
    loader@61704103668c: OK
    [...]

To run the same command from within a container::

    ~/swh-environment/docker$ docker compose exec swh-scheduler celery status
    loader@61704103668c: OK
    [...]

To consume ``kafka`` topics from the host, for example to run the ``swh
dataset graph export`` command, a configuration file could be::

    ~/swh-environment/docker$ cat dataset_config.yml
source packages, etc.)

Then, for each repository, a new task will be created to ingest this
repository and keep it up to date.
For example, to add a (one shot) task that will list git repos on the
0xacab.org gitlab instance, one can do (from this git repository)::

    ~/swh-environment/docker$ docker compose exec swh-scheduler \
        swh scheduler task add list-gitlab-full \
        -p oneshot url=https://0xacab.org/api/v4

    Created 1 tasks

    Task 12
      Next run: just now (2018-12-19 14:58:49+00:00)
      Interval: 90 days, 0:00:00
      Type: list-gitlab-full
      Policy: oneshot
      Args:
      Keyword args:
        url=https://0xacab.org/api/v4
This will insert a new task in the scheduler. To list existing tasks for
a given task type::

    ~/swh-environment/docker$ docker compose exec swh-scheduler \
        swh scheduler task list-pending list-gitlab-full

    Found 1 list-gitlab-full tasks

    Task 12
      Next run: 2 minutes ago (2018-12-19 14:58:49+00:00)
      Interval: 90 days, 0:00:00
      Type: list-gitlab-full
      Policy: oneshot
      Args:
      Keyword args:
        url=https://0xacab.org/api/v4
To list all existing task types::

    ~/swh-environment/docker$ docker compose exec swh-scheduler \
        swh scheduler task-type list

    Known task types:
    load-svn-from-archive:
      Loading svn repositories from svn dump
    load-svn:
      Create dump of a remote svn repository, mount it and load it
    load-deposit:
console on ``http://localhost:5080/rabbitmq`` or the grafana dashboard
on ``http://localhost:5080/grafana``.

If you cannot see any task being executed, check the logs of the
``swh-scheduler-runner`` service (here is a failure example due to the
debian lister task not being properly registered on the
swh-scheduler-runner service)::

    ~/swh-environment/docker$ docker compose logs --tail=10 swh-scheduler-runner
    Attaching to docker_swh-scheduler-runner_1
    swh-scheduler-runner_1 | "__main__", mod_spec)
    swh-scheduler-runner_1 | File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    swh-scheduler-runner_1 | exec(code, run_globals)
    swh-scheduler-runner_1 | File "/usr/local/lib/python3.7/site-packages/swh/scheduler/celery_backend/runner.py", line 107, in <module>
    swh-scheduler-runner_1 | run_ready_tasks(main_backend, main_app)
    swh-scheduler-runner_1 | File "/usr/local/lib/python3.7/site-packages/swh/scheduler/celery_backend/runner.py", line 81, in run_ready_tasks
    swh-scheduler-runner_1 | task_types[task['type']]['backend_name']

From there, we will checkout or update all the swh packages::

    ~/swh-environment$ ./bin/update

Install a swh package from sources in a container
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is possible to run a docker container with some swh packages
installed from sources instead of using the latest published packages
from pypi. To do this you must write a
`Docker Compose override file <https://docs.docker.com/compose/extends>`_
(``docker-compose.override.yml``). An example is given in the
``docker-compose.override.yml.example`` file:
.. code:: yaml

   version: '2'

   services:
     swh-objstorage:
       volumes:
         - "$HOME/swh-environment/swh-objstorage:/src/swh-objstorage"
The file named ``docker-compose.override.yml`` will automatically be
loaded by Docker Compose.

This example shows the simplest case of the ``swh-objstorage`` package:
you just have to mount it in the container in ``/src`` and the
entrypoint will ensure every swh-\* package found in ``/src/`` is
installed (using ``pip install -e`` so you can easily hack your code).
If the application you play with has autoreload support, there is no
need to restart the impacted container.
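
The same pattern repeats for each package you want to hack on. For
instance, a sketch of an override file mounting both ``swh-objstorage``
and ``swh-storage`` from sources (assuming the default
``swh-environment`` checkout layout used above) could be:

.. code:: yaml

   version: '2'

   services:
     swh-objstorage:
       volumes:
         # each swh-* checkout mounted under /src gets pip-installed
         # by the container entrypoint
         - "$HOME/swh-environment/swh-objstorage:/src/swh-objstorage"
     swh-storage:
       volumes:
         - "$HOME/swh-environment/swh-storage:/src/swh-storage"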
    case "$shell" in
    ;;
    esac
    eval "$(_SWH_COMPLETE=$autocomplete_cmd swh)"
    export SWH_SCHEDULER_URL=http://127.0.0.1:5008/
    export BROKER_URL=amqp://127.0.0.1:5072/
    export APP=swh.scheduler.celery_backend.config.app
    export COMPOSE_FILE=~/swh-environment/docker/docker-compose.yml:~/swh-environment/docker/docker-compose.override.yml
    alias doco="docker compose"
    EOF
This postactivate script does:

- install a shell completion handler for the swh-scheduler command,
- preset a bunch of environment variables:

  - ``SWH_SCHEDULER_URL`` so that you can just run ``swh scheduler``
    against the scheduler API instance running in docker, without having
    to specify the endpoint URL,
  - ``BROKER_URL`` and ``APP`` so you can execute the ``celery`` tool
    (without cli options) against the rabbitmq server running in the
    docker environment (see the `documentation of the celery command
    <https://docs.celeryproject.org/en/latest/reference/cli.html>`_),
  - ``COMPOSE_FILE`` so you can run ``docker compose`` from everywhere,

- create an alias ``doco`` for ``docker compose``, because this is way
  too long to type.

So now you can easily:

- Start the SWH platform::

      (swh) ~/swh-environment$ doco up -d
      [...]
Data persistence for a development setting
------------------------------------------

The default ``docker-compose.yml`` configuration is not geared towards
data persistence, but application testing.

Volumes defined in the associated images are anonymous and may get
either unused or removed on the next ``docker compose up``.

One way to make sure these volumes persist is to use named volumes. The
volumes may be defined as follows in a ``docker-compose.override.yml``.
Note that volume definitions are merged with other compose files based
on the destination path.
::

    services:
      swh-storage-db:
        volumes:
          - "swh_storage_data:/var/lib/postgresql/data"
      swh-objstorage:
        volumes:
          - "swh_objstorage_data:/srv/softwareheritage/objects"

    volumes:
      swh_storage_data:
      swh_objstorage_data:

This way, ``docker compose down`` without the ``-v`` flag will not
remove those volumes and data will persist.
Additional components
---------------------

We provide some extra modularity in what components to run through
additional ``docker-compose.*.yml`` files.

They are disabled by default, because they add layers of complexity
and increase resource usage, while not being necessary to operate
a small Software Heritage instance.
Starting a kafka-powered mirror of the storage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This repo comes with an optional ``docker-compose.storage-mirror.yml``
compose file that can be used to test the kafka-powered mirror
mechanism for the main storage.

This can be used like::

    ~/swh-environment/docker$ docker compose \
        -f docker-compose.yml \
        -f docker-compose.storage-mirror.yml \
        up -d
    [...]

Compared to the original compose file, this will:

- override the swh-storage service to activate the kafka direct writer
Starting the backfiller
"""""""""""""""""""""""

The backfiller reads the objects within the range [start-object,
end-object] from the storage and sends them to the kafka topics::

    ~/swh-environment/docker$ docker compose \
        -f docker-compose.yml \
        -f docker-compose.storage-mirror.yml \
        -f docker-compose.storage-mirror.override.yml \
        run \
        swh-journal-backfiller \
        snapshot \
        --start-object 000000 \
        --end-object 000001 \
        --dry-run
Cassandra
^^^^^^^^^

We are working on an alternative backend for swh-storage, based on
Cassandra instead of PostgreSQL.

This can be used like::

    ~/swh-environment/docker$ docker compose \
        -f docker-compose.yml \
        -f docker-compose.cassandra.yml \
        up -d
    [...]

This launches two Cassandra servers, and reconfigures swh-storage to use them.
Efficient origin search
^^^^^^^^^^^^^^^^^^^^^^^

By default, swh-web uses swh-storage and swh-indexer-storage to provide
its search bar. They are both based on PostgreSQL (or Cassandra, which
is even slower) and rather inefficient.

Instead, you can enable swh-search, which is based on ElasticSearch
and much more efficient, like this::

    ~/swh-environment/docker$ docker compose \
        -f docker-compose.yml \
        -f docker-compose.search.yml \
        up -d
    [...]
Efficient counters
^^^^^^^^^^^^^^^^^^

The web interface shows counters of the number of objects in your
archive, by counting objects in the PostgreSQL or Cassandra database.
While this should not be an issue at the scale of your local Docker
instance, counting objects can actually be a bottleneck at Software
Heritage's scale, so swh-storage uses heuristics, which can be either
inefficient or inaccurate.

As an alternative, we provide counters based on Redis' HyperLogLog
feature, which you can test with::

    ~/swh-environment/docker$ docker compose \
        -f docker-compose.yml \
        -f docker-compose.counters.yml \
        up -d
    [...]
Efficient graph traversals
^^^^^^^^^^^^^^^^^^^^^^^^^^

:ref:`swh-graph <swh-graph>` is a work-in-progress alternative to
swh-storage to perform large graph traversals/queries on the merkle DAG.
For example, it can be used by the vault, as it needs to query all
objects in the sub-DAG of a given node.

You can use it with::

    ~/swh-environment/docker$ docker compose \
        -f docker-compose.yml \
        -f docker-compose.graph.yml up -d

On the first start, it will run some precomputation based on all objects
already in your local SWH instance, so it may take a long time if you
loaded many repositories (expect 5 to 10s per repository).

It **does not update automatically** when you load new repositories.
You need to restart it every time you want to update it.

You can :ref:`mount a docker volume <docker-persistence>` on
:file:`/srv/softwareheritage/graph` to avoid recomputing this graph
on every start.
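
If you go the named-volume route, a minimal override fragment could look
like this (the ``swh_graph_data`` volume name is an assumption, chosen
by analogy with the persistence example above):

.. code:: yaml

   services:
     swh-graph:
       volumes:
         # keep the precomputed graph across container restarts
         - "swh_graph_data:/srv/softwareheritage/graph"

   volumes:
     swh_graph_data: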
Then, you need to explicitly request recomputing the graph before
restarts if you want to update it::

    ~/swh-environment/docker$ docker compose \
        -f docker-compose.yml \
        -f docker-compose.graph.yml \
        run swh-graph update
    ~/swh-environment/docker$ docker compose \
        -f docker-compose.yml \
        -f docker-compose.graph.yml \
        stop swh-graph
    ~/swh-environment/docker$ docker compose \
        -f docker-compose.yml \
        -f docker-compose.graph.yml \
        up -d swh-graph

Keycloak
^^^^^^^^

If you really want to hack on swh-web's authentication features,
you will need to enable Keycloak as well, instead of the default
Django-based authentication::

    ~/swh-environment/docker$ docker compose -f docker-compose.yml -f docker-compose.keycloak.yml up -d
    [...]

User registration in the Keycloak database is available by following the
Register link on the page located at http://localhost:5080/oidc/login/.

Please note that email verification is required to properly register an
account. As we are in a testing environment, we use a MailHog instance as a
fake SMTP server. All emails sent by Keycloak can be easily read from the
MailHog Web UI located at http://localhost:8025/.
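
Besides the Web UI, MailHog also exposes an HTTP API on the same port. As a
quick sketch (assuming the stack is up and ``curl`` and ``jq`` are available
on your host), you can count the emails captured so far from the command
line::

    ~$ curl -s http://localhost:8025/api/v2/messages | jq '.total'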

Kafka
^^^^^

Consuming topics from the host
""""""""""""""""""""""""""""""

As mentioned above, it is possible to consume topics from the Kafka server
available in the Docker Compose environment from the host, using
``127.0.0.1:5092`` as the broker URL.
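
For instance, assuming ``kcat`` (formerly ``kafkacat``) is installed on your
host, you can list the topics exposed by the broker, then read one of them
from the beginning (the topic name below is only an example)::

    ~$ kcat -b 127.0.0.1:5092 -L
    ~$ kcat -b 127.0.0.1:5092 -C -t swh.journal.objects.origin -o beginning -e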

Resetting offsets
"""""""""""""""""

It is also possible to reset a consumer group offset using the following
command::

    ~/swh-environment/docker$ docker compose \
        run kafka kafka-consumer-groups.sh \
            --bootstrap-server kafka:9092 \
            --group <group> \
            --all-topics \
            --reset-offsets --to-earliest --execute
    [...]

You can use ``--topic <topic>`` instead of ``--all-topics`` to specify a topic.

Getting information on consumers
""""""""""""""""""""""""""""""""

You can get information on consumer groups::

    ~/swh-environment/docker$ docker compose \
        run kafka kafka-consumer-groups.sh \
            --bootstrap-server kafka:9092 \
            --describe --members --all-groups
    [...]

Or the stored offsets for all groups, or for a given one::

    ~/swh-environment/docker$ docker compose \
        run kafka kafka-consumer-groups.sh \
            --bootstrap-server kafka:9092 \
            --describe --offsets --all-groups
    [...]

Using Sentry
------------

Also, a few containers (``swh-storage``, ``swh-xxx-db``) use a volume
for storing the blobs or the database files. With the default
configuration provided in the ``docker-compose.yml`` file, these volumes
are not persistent, so removing the containers will delete the volumes!

Also note that for ``swh-objstorage``, since the volume can be
pretty big, the remove operation can take quite a long time (several
minutes is not uncommon), which may interfere with the ``docker compose``
command.

If you have an error message like::

    Error response from daemon: removal of container 928de3110381 is already
    in progress

it means that you need to wait for this process to finish before being
able to (re)start your docker stack again.
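
If you want to wait for the removal from a script, one possible sketch is to
poll the container list until the id shown in the error message disappears
(assuming the container is not recreated in the meantime)::

    ~$ while docker ps -a | grep -q 928de3110381; do sleep 5; done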