.. docker/README.rst
Docker environment
==================

``swh-environment/docker/`` contains Dockerfiles to run a small Software Heritage
instance on development machines. The end goal is to smooth the
contributors/developers workflow. Focus on coding, not configuring!

.. warning::
   Running a Software Heritage instance on your machine can
   consume quite a bit of resources: if you play a bit too hard (e.g., if
   you try to list all GitHub repositories with the corresponding lister),
   you may fill your hard drive, and consume a lot of CPU, memory and
   network bandwidth.
Dependencies
------------

This environment uses docker with docker-compose, so make sure you have a
working docker setup and that docker-compose is installed.

We recommend using the latest version of docker, so please read
https://docs.docker.com/install/linux/docker-ce/debian/ for more details
on how to install docker on your machine.

On a Debian system, docker-compose can be installed from the Debian
repositories::

  ~$ sudo apt install docker-compose
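Before going further, you can quickly check that both tools are available on your ``PATH`` (a small sketch; the ``check_tools`` helper name is made up for this illustration, it is not part of swh-environment):

```shell
# Report whether the required tools are on PATH (sketch; `check_tools`
# is a made-up helper name, not part of swh-environment).
check_tools() {
    for tool in docker docker-compose; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: NOT FOUND, please install it"
        fi
    done
}
check_tools
```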
Quick start
-----------

First, change to the docker dir if you aren't there yet::

  ~$ cd swh-environment/docker

Then, start the containers::

  ~/swh-environment/docker$ docker-compose up -d
  [...]
  Creating docker_amqp_1             ... done
  Creating docker_zookeeper_1        ... done
  Creating docker_kafka_1            ... done
  Creating docker_flower_1           ... done
  Creating docker_swh-scheduler-db_1 ... done
  [...]

This will build the docker images and run them. Check that everything is
running fine with::

  ~/swh-environment/docker$ docker-compose ps
            Name                        Command                State                                    Ports
  ---------------------------------------------------------------------------------------------------------------------------------------------
  docker_amqp_1            docker-entrypoint.sh rabbi ...   Up      15671/tcp, 0.0.0.0:5018->15672/tcp, 25672/tcp, 4369/tcp, 5671/tcp, 5672/tcp
  docker_flower_1          flower --broker=amqp://gue ...   Up      0.0.0.0:5555->5555/tcp
  docker_kafka_1           start-kafka.sh                   Up      0.0.0.0:9092->9092/tcp
  docker_swh-deposit-db_1  docker-entrypoint.sh postgres    Up      5432/tcp
  docker_swh-deposit_1     /entrypoint.sh                   Up      0.0.0.0:5006->5006/tcp
  [...]
The startup of some containers may fail the first time due to
dependency-related problems. If some containers failed to start, just
run the ``docker-compose up -d`` command again.

If a container really refuses to start properly, you can check why using
the ``docker-compose logs`` command. For example::

  ~/swh-environment/docker$ docker-compose logs swh-lister
  Attaching to docker_swh-lister_1
  [...]
  swh-lister_1 | Processing /src/swh-scheduler
  swh-lister_1 | Could not install packages due to an EnvironmentError: [('/src/swh-scheduler/.hypothesis/unicodedata/8.0.0/charmap.json.gz', '/tmp/pip-req-build-pm7nsax3/.hypothesis/unicodedata/8.0.0/charmap.json.gz', "[Errno 13] Permission denied: '/src/swh-scheduler/.hypothesis/unicodedata/8.0.0/charmap.json.gz'")]
  swh-lister_1 |
Once all the containers are running, you can use the web interface by
opening http://localhost:5080/ in your web browser.

At this point, the archive is empty and needs to be filled with some
content. To do so, you can create tasks that will scrape a forge. For
example, to inject the code from the https://0xacab.org gitlab forge::

  ~/swh-environment/docker$ docker-compose exec swh-scheduler \
      swh scheduler task add list-gitlab-full \
      -p oneshot url=https://0xacab.org/api/v4

  Created 1 tasks

  Task 1
    Next run: just now (2018-12-19 14:58:49+00:00)
    Interval: 90 days, 0:00:00
    Type: list-gitlab-full
    Policy: oneshot
    Args:
    Keyword args:
      url=https://0xacab.org/api/v4

This task will scrape the forge's project list and create subtasks to
inject each git repository found there. This will take a bit of time to
complete.
To increase the speed at which git repositories are imported, you can
spawn more ``swh-loader-git`` workers::

  ~/swh-environment/docker$ docker-compose exec swh-scheduler \
      celery status
  listers@50ac2185c6c9: OK
  loader@b164f9055637: OK
  indexer@33bc6067a5b8: OK
  vault@c9fef1bbfdc1: OK

  4 nodes online.

  ~/swh-environment/docker$ docker-compose exec swh-scheduler \
      celery control pool_grow 3 -d loader@b164f9055637
  -> loader@b164f9055637: OK
          pool will grow

  ~/swh-environment/docker$ docker-compose exec swh-scheduler \
      celery inspect -d loader@b164f9055637 stats | grep prefetch_count
  "prefetch_count": 4

Now there are 4 workers ingesting git repositories. You can also
increase the number of ``swh-loader-git`` containers::

  ~/swh-environment/docker$ docker-compose up -d --scale swh-loader=4
  [...]
  Creating docker_swh-loader_2 ... done
  Creating docker_swh-loader_3 ... done
  Creating docker_swh-loader_4 ... done
Updating the docker image
-------------------------

All containers started by ``docker-compose`` are bound to a docker image
named ``swh/stack`` including all the software components of Software
Heritage. When new versions of these components are released, the docker
image will not be automatically updated. In order to update all Software
Heritage components to their latest version, the docker image needs to
be explicitly rebuilt by issuing the following command from within the
``docker`` directory::

  ~/swh-environment/docker$ docker build --no-cache -t swh/stack .
Details
-------

This runs the following services on their respective standard ports, all
configured to communicate with each other:

- swh-storage-db: a ``softwareheritage`` instance db that stores the
  Merkle DAG,
- swh-objstorage: content-addressable object storage,
- swh-storage: abstraction layer over the archive, allowing to access
  all stored source code artifacts as well as their metadata,
- swh-web: the Software Heritage web user interface,
- swh-scheduler: the API service as well as 2 utilities, the runner and
  the listener,
- swh-lister: celery workers dedicated to running lister tasks,
- swh-loaders: celery workers dedicated to importing/updating source
  code content (VCS repos, source packages, etc.),
- swh-journal: persistent logger of changes to the archive, with
  publish-subscribe support.

That means you can start ingesting content with those services, following
the same setup described in the getting-started guide, starting directly
at
https://docs.softwareheritage.org/devel/getting-started.html#step-4-ingest-repositories
Exposed Ports
~~~~~~~~~~~~~

Several services have their listening ports exposed on the host:

- amqp: 5072
- kafka: 5092
- nginx: 5080

And for SWH services:

- scheduler API: 5008
- storage API: 5002
- object storage API: 5003
- indexer API: 5007
- web app: 5004
- deposit app: 5006

Beware that these ports are not the same as the ports used from within
the docker network. This means that the same command executed from the
host or from a docker container will not use the same URLs to access
services. For example, to use the ``celery`` utility from the host, you
may type::

  ~/swh-environment/docker$ CELERY_BROKER_URL=amqp://:5072// celery status
  loader@61704103668c: OK
  [...]

To run the same command from within a container::

  ~/swh-environment/docker$ docker-compose exec swh-scheduler celery status
  loader@61704103668c: OK
  [...]
Managing tasks
--------------

One of the main components of the Software Heritage platform is the task
system. It is used to manage everything related to background processes,
like discovering new git repositories to import, ingesting them, checking
that a known repository is up to date, etc.

The task system is based on Celery but uses a custom database-based
scheduler. So the term 'task' may designate either a Celery task or a SWH
one (i.e., the entity in the database). When this documentation simply
says "task", it designates the SWH task.

When a SWH task is ready to be executed, a Celery task is created to
handle the actual SWH task's job. Note that not all Celery tasks are
directly linked to a SWH task (some SWH tasks are implemented using a
Celery task that spawns Celery subtasks).

A (SWH) task can be ``recurring`` or ``oneshot``. ``oneshot`` tasks are
only executed once, whereas ``recurring`` tasks are regularly executed.
The scheduling configuration of these recurring tasks can be set via the
fields ``current_interval`` and ``priority`` (can be 'high', 'normal' or
'low') of the task database entity.
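The effect of the two policies can be sketched as a tiny decision rule (an illustration of the behaviour described above, not SWH code; the ``next_state`` function name is made up for this sketch):

```shell
# Illustration only: what happens to a task after it runs, depending on
# its policy (`next_state` is a made-up name, not an SWH function).
next_state() {
    case "$1" in
        oneshot)   echo "disabled (will not run again)" ;;
        recurring) echo "next_run += current_interval" ;;
        *)         echo "unknown policy: $1" ;;
    esac
}

next_state oneshot
next_state recurring
```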
Inserting a new lister task
~~~~~~~~~~~~~~~~~~~~~~~~~~~

To list the content of a source code provider like GitHub or a Debian
distribution, you may add a new lister task.

This task will (generally) scrape a web page or use a public API to
identify the list of published software artefacts (git repos, debian
source packages, etc.). Then, for each repository, a new task will be
created to ingest this repository and keep it up to date.

For example, to add a (oneshot) task that will list git repos on the
0xacab.org gitlab instance, one can do (from the ``docker`` directory)::

  ~/swh-environment/docker$ docker-compose exec swh-scheduler \
      swh scheduler task add list-gitlab-full \
      -p oneshot url=https://0xacab.org/api/v4

  Created 1 tasks

  Task 12
    Next run: just now (2018-12-19 14:58:49+00:00)
    Interval: 90 days, 0:00:00
    Type: list-gitlab-full
    Policy: oneshot
    Args:
    Keyword args:
      url=https://0xacab.org/api/v4
This will insert a new task in the scheduler. To list existing tasks for
a given task type::

  ~/swh-environment/docker$ docker-compose exec swh-scheduler \
      swh scheduler task list-pending list-gitlab-full

  Found 1 list-gitlab-full tasks

  Task 12
    Next run: 2 minutes ago (2018-12-19 14:58:49+00:00)
    Interval: 90 days, 0:00:00
    Type: list-gitlab-full
    Policy: oneshot
    Args:
    Keyword args:
      url=https://0xacab.org/api/v4
To list all existing task types::

  ~/swh-environment/docker$ docker-compose exec swh-scheduler \
      swh scheduler task-type list

  Known task types:
  load-svn-from-archive:
    Loading svn repositories from svn dump
  load-svn:
    Create dump of a remote svn repository, mount it and load it
  load-deposit:
    Loading deposit archive into swh through swh-loader-tar
  check-deposit:
    Pre-checking deposit step before loading into swh archive
  cook-vault-bundle:
    Cook a Vault bundle
  load-hg:
    Loading mercurial repository swh-loader-mercurial
  load-hg-from-archive:
    Loading archive mercurial repository swh-loader-mercurial
  load-git:
    Update an origin of type git
  list-github-incremental:
    Incrementally list GitHub
  list-github-full:
    Full update of GitHub repos list
  list-debian-distribution:
    List a Debian distribution
  list-gitlab-incremental:
    Incrementally list a Gitlab instance
  list-gitlab-full:
    Full update of a Gitlab instance's repos list
  list-pypi:
    Full pypi lister
  load-pypi:
    Load Pypi origin
  index-mimetype:
    Mimetype indexer task
  index-mimetype-for-range:
    Mimetype Range indexer task
  index-fossology-license:
    Fossology license indexer task
  index-fossology-license-for-range:
    Fossology license range indexer task
  index-origin-head:
    Origin Head indexer task
  index-revision-metadata:
    Revision Metadata indexer task
  index-origin-metadata:
    Origin Metadata indexer task
Monitoring activity
~~~~~~~~~~~~~~~~~~~

You can monitor the workers' activity by connecting to the RabbitMQ
console on ``http://localhost:5080/rabbitmq`` or the grafana dashboard
on ``http://localhost:5080/grafana``.

If you cannot see any task being executed, check the logs of the
``swh-scheduler-runner`` service (here is a failure example due to the
debian lister task not being properly registered on the
swh-scheduler-runner service)::

  ~/swh-environment/docker$ docker-compose logs --tail=10 swh-scheduler-runner
  Attaching to docker_swh-scheduler-runner_1
  swh-scheduler-runner_1 |     "__main__", mod_spec)
  swh-scheduler-runner_1 |   File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
  swh-scheduler-runner_1 |     exec(code, run_globals)
  swh-scheduler-runner_1 |   File "/usr/local/lib/python3.7/site-packages/swh/scheduler/celery_backend/runner.py", line 107, in <module>
  swh-scheduler-runner_1 |     run_ready_tasks(main_backend, main_app)
  swh-scheduler-runner_1 |   File "/usr/local/lib/python3.7/site-packages/swh/scheduler/celery_backend/runner.py", line 81, in run_ready_tasks
  swh-scheduler-runner_1 |     task_types[task['type']]['backend_name']
  swh-scheduler-runner_1 |   File "/usr/local/lib/python3.7/site-packages/celery/app/registry.py", line 21, in __missing__
  swh-scheduler-runner_1 |     raise self.NotRegistered(key)
  swh-scheduler-runner_1 | celery.exceptions.NotRegistered: 'swh.lister.debian.tasks.DebianListerTask'
Using the docker setup for development and integration testing
--------------------------------------------------------------

If you hack the code of one or more archive components with a virtualenv
based setup as described in the `developer setup guide
<https://docs.softwareheritage.org/devel/developer-setup.html>`_, you may
want to test your modifications in a working Software Heritage instance.
The simplest way to achieve this is to use this docker-based environment.

If you haven't followed the `developer setup guide
<https://docs.softwareheritage.org/devel/developer-setup.html>`_, you
must clone the swh-environment repo in your ``swh-environment``
directory::

  ~/swh-environment$ git clone https://forge.softwareheritage.org/source/swh-environment.git .

Note the ``.`` at the end of this command: we want the git repository to
be cloned directly in the ``~/swh-environment`` directory, not in a
subdirectory. Also note that if you haven't done it yet and you want to
hack the source code of one or more Software Heritage packages, you
really should read the `developer setup guide
<https://docs.softwareheritage.org/devel/developer-setup.html>`_.

From there, we will checkout or update all the swh packages::

  ~/swh-environment$ ./bin/update
Install a swh package from sources in a container
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is possible to run a docker container with some swh packages
installed from sources instead of using the latest published packages
from pypi. To do this you must write a docker-compose override file
(``docker-compose.override.yml``). An example is given in the
``docker-compose.override.yml.example`` file:

.. code:: yaml

   version: '2'

   services:
     swh-objstorage:
       volumes:
         - "$HOME/swh-environment/swh-objstorage:/src/swh-objstorage"

The file named ``docker-compose.override.yml`` will automatically be
loaded by ``docker-compose``.

This example shows the simplest case of the ``swh-objstorage`` package:
you just have to mount it in the container in ``/src`` and the
entrypoint will ensure every swh-\* package found in ``/src/`` is
installed (using ``pip install -e``, so you can easily hack your code).
If the application you play with has autoreload support, there is no
need to restart the impacted container.
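The install step performed by the entrypoint can be sketched like this (an illustration of the behaviour described above, not the actual entrypoint script; the ``/src`` layout is simulated under a temporary directory and the ``pip`` command is only echoed):

```shell
# Sketch of the entrypoint behaviour: install every swh-* package found
# in /src. Here /src is simulated with a temporary directory and pip is
# only echoed instead of being run.
src=$(mktemp -d)
mkdir -p "$src/swh-objstorage" "$src/swh-storage"

installed=""
for pkg in "$src"/swh-*; do
    [ -d "$pkg" ] || continue
    # the real entrypoint actually runs: pip install -e "$pkg"
    echo "pip install -e $pkg"
    installed="$installed $(basename "$pkg")"
done
rm -rf "$src"
```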
Using locally installed swh tools with docker
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In all examples above, we have executed swh commands from within a
running container. Now that we also have these swh commands locally
available in our virtualenv, we can use them to interact with swh
services running in docker containers.

For this, we just need to configure a few environment variables. First,
ensure your Software Heritage virtualenv is activated (here, using
virtualenvwrapper)::

  ~$ workon swh
  (swh) ~/swh-environment$ export SWH_SCHEDULER_URL=http://127.0.0.1:5008/
  (swh) ~/swh-environment$ export CELERY_BROKER_URL=amqp://127.0.0.1:5072/

Now we can use the ``celery`` command directly to control the celery
system running in the docker environment::

  (swh) ~/swh-environment$ celery status
  vault@c9fef1bbfdc1: OK
  listers@ba66f18e7d02: OK
  indexer@cb14c33cbbfb: OK
  loader@61704103668c: OK

  4 nodes online.

  (swh) ~/swh-environment$ celery control -d loader@61704103668c pool_grow 3

And we can use the ``swh-scheduler`` command all the same::

  (swh) ~/swh-environment$ swh scheduler task-type list

  Known task types:
  index-fossology-license:
    Fossology license indexer task
  index-mimetype:
    Mimetype indexer task
  [...]
Make your life a bit easier
~~~~~~~~~~~~~~~~~~~~~~~~~~~

When you use virtualenvwrapper, you can add postactivate commands::

  (swh) ~/swh-environment$ cat >>$VIRTUAL_ENV/bin/postactivate <<'EOF'
  # unfortunately, the interface cmd for the click autocompletion
  # depends on the shell
  # https://click.palletsprojects.com/en/7.x/bashcomplete/#activation
  shell=$(basename $SHELL)
  case "$shell" in
      "zsh")
          autocomplete_cmd=source_zsh
          ;;
      *)
          autocomplete_cmd=source
          ;;
  esac
  eval "$(_SWH_COMPLETE=$autocomplete_cmd swh)"

  export SWH_SCHEDULER_URL=http://127.0.0.1:5008/
  export CELERY_BROKER_URL=amqp://127.0.0.1:5072/
  export COMPOSE_FILE=~/swh-environment/docker/docker-compose.yml:~/swh-environment/docker/docker-compose.override.yml
  alias doco=docker-compose
  EOF

This postactivate script:

- installs a shell completion handler for the swh-scheduler command,
- presets a bunch of environment variables:

  - ``SWH_SCHEDULER_URL`` so that you can just run ``swh scheduler``
    against the scheduler API instance running in docker, without
    having to specify the endpoint URL,
  - ``CELERY_BROKER_URL`` so you can execute the ``celery`` tool
    (without cli options) against the rabbitmq server running in the
    docker environment,
  - ``COMPOSE_FILE`` so you can run ``docker-compose`` from everywhere,

- creates an alias ``doco`` for ``docker-compose``, because the full
  command is way too long to type.
So now you can easily:

- Start the SWH platform::

    (swh) ~/swh-environment$ doco up -d
    [...]

- Check celery::

    (swh) ~/swh-environment$ celery status
    listers@50ac2185c6c9: OK
    loader@b164f9055637: OK
    indexer@33bc6067a5b8: OK

- List task-types::

    (swh) ~/swh-environment$ swh scheduler task-type list
    [...]

- Get more info on a task type::

    (swh) ~/swh-environment$ swh scheduler task-type list -v -t load-hg
    Known task types:
    load-hg: swh.loader.mercurial.tasks.LoadMercurial
      Loading mercurial repository swh-loader-mercurial
      interval: 1 day, 0:00:00 [1 day, 0:00:00, 1 day, 0:00:00]
      backoff_factor: 1.0
      max_queue_length: 1000
      num_retries: None
      retry_delay: None

- Add a new task::

    (swh) ~/swh-environment$ swh scheduler task add load-hg \
        origin_url=https://hg.logilab.org/master/cubicweb

    Created 1 tasks

    Task 1
      Next run: just now (2019-02-06 12:36:58+00:00)
      Interval: 1 day, 0:00:00
      Type: load-hg
      Policy: recurring
      Args:
      Keyword args:
        origin_url: https://hg.logilab.org/master/cubicweb

- Respawn a task::

    (swh) ~/swh-environment$ swh scheduler task respawn 1
Data persistence for a development setting
------------------------------------------

The default ``docker-compose.yml`` configuration is not geared towards
data persistence, but application testing. Volumes defined in the
associated images are anonymous and may end up either unused or removed
on the next ``docker-compose up``.

One way to make sure these volumes persist is to use named volumes. The
volumes may be defined as follows in a ``docker-compose.override.yml``.
Note that volume definitions are merged with other compose files based
on the destination path.

::

   services:
     swh-storage-db:
       volumes:
         - "swh_storage_data:/var/lib/postgresql/data"
     swh-objstorage:
       volumes:
         - "swh_objstorage_data:/srv/softwareheritage/objects"
   volumes:
     swh_storage_data:
     swh_objstorage_data:

This way, ``docker-compose down`` without the ``-v`` flag will not
remove those volumes and data will persist.
Starting a kafka-powered mirror of the storage
----------------------------------------------

This repo comes with an optional ``docker-compose.storage-mirror.yml``
compose file that can be used to test the kafka-powered mirror
mechanism for the main storage.

This can be used like::

  ~/swh-environment/docker$ docker-compose -f docker-compose.yml -f docker-compose.storage-mirror.yml up -d
  [...]

Compared to the original compose file, this will:

- override the swh-storage service to activate the kafka direct writer
  on swh.journal.objects prefixed topics using the swh.storage.master
  ID,
- override the swh-web service to make it use the mirror instead of
  the master storage,
- start a db for the mirror,
- start a storage service based on this db,
- start a replayer service that runs the process listening to kafka to
  keep the mirror in sync.

When using it, you will have a setup in which the master storage is used
by workers and most other services, whereas the storage mirror will be
used by the web application and should be kept in sync with the master
storage by kafka.

Note that the object storage is not replicated here, only the graph
storage.
Starting the backfiller
-----------------------

The backfiller reads the objects within the range [start-object,
end-object] from the storage and sends them to the kafka topics::

  (swh)$ docker-compose \
         -f docker-compose.yml \
         -f docker-compose.storage-mirror.yml \
         -f docker-compose.storage-mirror.override.yml \
         run \
         swh-journal-backfiller \
         snapshot \
         --start-object 000000 \
         --end-object 000001 \
         --dry-run
Using Sentry
------------

All entrypoints to SWH code (CLI, gunicorn, celery, ...) are, or should
be, instrumented using Sentry. By default this is disabled, but if you
run your own Sentry instance, you can use it.

To do so, you must get a DSN from your Sentry instance, and set it as
the value of ``SWH_SENTRY_DSN`` in the file ``env/common_python.env``.
You may also set it per-service in the ``environment`` section of each
service in ``docker-compose.override.yml``.
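For example, ``env/common_python.env`` could contain a line like the following (a sketch; the DSN value below is a placeholder to be replaced with the one issued by your own Sentry instance):

```shell
# env/common_python.env -- the DSN below is a placeholder, not a real key
SWH_SENTRY_DSN=https://examplePublicKey@sentry.example.com/2
```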
Caveats
-------

Running a lister task can lead to a lot of loading tasks, which can fill
your hard drive pretty fast. Make sure to monitor your available storage
space regularly when playing with this stack.

Also, a few containers (``swh-storage``, ``swh-xxx-db``) use a volume
for storing the blobs or the database files. With the default
configuration provided in the ``docker-compose.yml`` file, these volumes
are not persistent, so removing the containers will delete the volumes!

Also note that for ``swh-objstorage``, since the volume can be pretty
big, the remove operation can be quite long (several minutes is not
uncommon), which may interfere with the ``docker-compose`` command.

If you have an error message like::

  Error response from daemon: removal of container 928de3110381 is already
  in progress

it means that you need to wait for this process to finish before being
able to (re)start your docker stack again.