docker/README.md
# swh-docker-dev | # Docker environment
This repo contains Dockerfiles to allow developers to run a small | This directory contains Dockerfiles to run a small Software Heritage instance
Software Heritage instance on their development computer. | on development machines. The end goal is to smooth the contributor/developer
| workflow. Focus on coding, not configuring!
The end goal is to smooth the contributor/developer workflow. Focus |
on coding, not configuring! |
WARNING: Running a Software Heritage instance on your machine can consume | WARNING: Running a Software Heritage instance on your machine can consume
quite a bit of resources: if you play a bit too hard (e.g., if you | quite a bit of resources: if you play a bit too hard (e.g., if you
try to list all GitHub repositories with the corresponding lister), | try to list all GitHub repositories with the corresponding lister),
you may fill your hard drive and consume a lot of CPU, memory and | you may fill your hard drive and consume a lot of CPU, memory and
network bandwidth. | network bandwidth.
## Dependencies | ## Dependencies
This uses docker with docker-compose, so ensure you have a working | This uses docker with docker-compose, so ensure you have a working
docker environment and that docker-compose is installed. | docker environment and that docker-compose is installed.
We recommend using the latest version of docker, so please read | We recommend using the latest version of docker, so please read
https://docs.docker.com/install/linux/docker-ce/debian/ for more details on how | https://docs.docker.com/install/linux/docker-ce/debian/ for more details on how
to install docker on your machine. | to install docker on your machine.
On a Debian system, docker-compose can be installed from Debian repositories. | On a Debian system, docker-compose can be installed from Debian repositories:
On a stable (stretch) machine, it is recommended to install the version from |
[backports](https://backports.debian.org/Instructions/): |
``` | ```
~$ sudo apt install -t stretch-backports docker-compose | ~$ sudo apt install docker-compose
``` | ```
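Once docker and docker-compose are installed, you can quickly check that both are reachable from your shell. This is a minimal sketch; the `require_tools` function name is hypothetical and not part of the Software Heritage tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 when all of them are available, 1 otherwise.
require_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For instance, `require_tools docker docker-compose` prints one line on stderr for each tool it cannot find.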
## Quick start | ## Quick start
First, clone this repository. | First, change to the `docker` directory if you aren't there yet:
If you have already followed the |
[[https://docs.softwareheritage.org/devel/developer-setup.html|developer setup guide]], |
then you should already have a copy of the swh-docker-dev git repository. Use |
it: |
``` |
~$ cd swh-environment/swh-docker-dev |
``` |
Otherwise, we suggest creating a `swh-environment` |
directory in which this repo will be cloned, so that you can later run some |
components in docker containers with code overridden from local repositories (see |
[[<#using-docker-setup-development-and-integration-testing>|below]]): |
``` | ```
~$ mkdir swh-environment | ~$ cd swh-environment/docker
~$ cd swh-environment |
~/swh-environment$ git clone https://forge.softwareheritage.org/source/swh-docker-dev.git |
~/swh-environment$ cd swh-docker-dev |
``` | ```
Then, start the containers: | Then, start the containers:
``` | ```
~/swh-environment/swh-docker-dev$ docker-compose up -d | ~/swh-environment/docker$ docker-compose up -d
[...] | [...]
Creating swh-docker-dev_amqp_1 ... done | Creating docker_amqp_1 ... done
Creating swh-docker-dev_zookeeper_1 ... done | Creating docker_zookeeper_1 ... done
Creating swh-docker-dev_kafka_1 ... done | Creating docker_kafka_1 ... done
Creating swh-docker-dev_flower_1 ... done | Creating docker_flower_1 ... done
Creating swh-docker-dev_swh-scheduler-db_1 ... done | Creating docker_swh-scheduler-db_1 ... done
[...] | [...]
``` | ```
This will build the docker images and run them. | This will build the docker images and run them.
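The container names differ between the two versions (`swh-docker-dev_amqp_1` versus `docker_amqp_1`) because docker-compose derives its default project name from the basename of the project directory, and the 1.x `docker-compose` tool names containers `<project>_<service>_<index>` (the newer `docker compose` v2 plugin joins the parts with `-` instead). The function below sketches that convention, assuming the normalization done by recent docker-compose 1.x releases (lowercase, keep only `a-z`, `0-9`, `-` and `_`) and that `COMPOSE_PROJECT_NAME`, when set, takes precedence; `compose_container_name` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper sketching docker-compose 1.x default container naming:
#   <project>_<service>_<index>
# <project> is COMPOSE_PROJECT_NAME if set, otherwise the basename of the given
# directory, lowercased, keeping only a-z, 0-9, '-' and '_'.
compose_container_name() {
    local dir=$1 service=$2 index=${3:-1}
    local project=${COMPOSE_PROJECT_NAME:-$(basename "$dir")}
    project=$(printf '%s' "$project" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9_-')
    printf '%s_%s_%s\n' "$project" "$service" "$index"
}
```

For example, `compose_container_name ~/swh-environment/docker amqp` prints `docker_amqp_1`, matching the new column above.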
Check everything is running fine with: | Check everything is running fine with:
``` | ```
~/swh-environment/swh-docker-dev$ docker-compose ps | ~/swh-environment/docker$ docker-compose ps
Name Command State Ports | Name Command State Ports
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
swh-docker-dev_amqp_1 docker-entrypoint.sh rabbi ... Up 15671/tcp, 0.0.0.0:5018->15672/tcp, 25672/tcp, 4369/tcp, 5671/tcp, 5672/tcp | docker_amqp_1 docker-entrypoint.sh rabbi ... Up 15671/tcp, 0.0.0.0:5018->15672/tcp, 25672/tcp, 4369/tcp, 5671/tcp, 5672/tcp
swh-docker-dev_flower_1 flower --broker=amqp://gue ... Up 0.0.0.0:5555->5555/tcp | docker_flower_1 flower --broker=amqp://gue ... Up 0.0.0.0:5555->5555/tcp
swh-docker-dev_kafka_1 start-kafka.sh Up 0.0.0.0:9092->9092/tcp | docker_kafka_1 start-kafka.sh Up 0.0.0.0:9092->9092/tcp
swh-docker-dev_swh-deposit-db_1 docker-entrypoint.sh postgres Up 5432/tcp | docker_swh-deposit-db_1 docker-entrypoint.sh postgres Up 5432/tcp
swh-docker-dev_swh-deposit_1 /entrypoint.sh Up 0.0.0.0:5006->5006/tcp | docker_swh-deposit_1 /entrypoint.sh Up 0.0.0.0:5006->5006/tcp
[...] | [...]
``` | ```
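In a long `docker-compose ps` listing, failed containers are easy to miss. A rough sketch of a filter that prints the containers whose row does not show the `Up` state; `not_up` is a hypothetical helper, and since it parses the human-readable table shown above (skipping its two header lines), it may need adjusting for other docker-compose versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `docker-compose ps` output on stdin and print the
# name (first column) of every container whose row does not contain " Up".
not_up() {
    awk 'NR > 2 && NF > 0 && $0 !~ / Up( |$)/ { print $1 }'
}
```

Typical use would be `docker-compose ps | not_up`, then inspecting each listed service with `docker-compose logs`.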
At the time of writing this guide, the startup of some containers may fail the | The startup of some containers may fail the first time because of dependency-related
first time because of dependency-related problems. If some containers fail to start, | problems. If some containers fail to start, just run the
just run the `docker-compose up -d` command again. | `docker-compose up -d` command again.
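Rather than re-running `docker-compose up -d` by hand, you could wrap it in a small retry loop. This is a generic sketch, not something shipped with the repository; `retry` and its two leading arguments (number of attempts, delay in seconds) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to <attempts> times, sleeping <delay>
# seconds between failed attempts. Returns 0 on success, else the last status.
retry() {
    local attempts=$1 delay=$2 status=0 i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        "$@" && return 0
        status=$?
        if ((i < attempts)); then
            echo "attempt $i/$attempts failed (status $status), retrying..." >&2
            sleep "$delay"
        fi
    done
    return "$status"
}
```

For example, `retry 3 10 docker-compose up -d` tries up to three times, ten seconds apart.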
If a container really refuses to start properly, you can check why using the | If a container really refuses to start properly, you can check why using the
`docker-compose logs` command. For example: | `docker-compose logs` command. For example:
``` | ```
~/swh-environment/swh-docker-dev$ docker-compose logs swh-lister | ~/swh-environment/docker$ docker-compose logs swh-lister
Attaching to swh-docker-dev_swh-lister_1 | Attaching to docker_swh-lister_1
[...] | [...]
swh-lister_1 | Processing /src/swh-scheduler | swh-lister_1 | Processing /src/swh-scheduler
swh-lister_1 | Could not install packages due to an EnvironmentError: [('/src/swh-scheduler/.hypothesis/unicodedata/8.0.0/charmap.json.gz', '/tmp/pip-req-build-pm7nsax3/.hypothesis/unicodedata/8.0.0/charmap.json.gz', "[Errno 13] Permission denied: '/src/swh-scheduler/.hypothesis/unicodedata/8.0.0/charmap.json.gz'")] | swh-lister_1 | Could not install packages due to an EnvironmentError: [('/src/swh-scheduler/.hypothesis/unicodedata/8.0.0/charmap.json.gz', '/tmp/pip-req-build-pm7nsax3/.hypothesis/unicodedata/8.0.0/charmap.json.gz', "[Errno 13] Permission denied: '/src/swh-scheduler/.hypothesis/unicodedata/8.0.0/charmap.json.gz'")]
swh-lister_1 | | swh-lister_1 |
``` | ```
Once all containers are running, you can use the web interface by opening | Once all containers are running, you can use the web interface by opening
http://localhost:5080/ in your web browser. | http://localhost:5080/ in your web browser.
At this point, the archive is empty and needs to be filled with some content. | At this point, the archive is empty and needs to be filled with some content.
To do so, you can create tasks that will scrape a forge. For example, to inject | To do so, you can create tasks that will scrape a forge. For example, to inject
the code from the https://0xacab.org GitLab forge: | the code from the https://0xacab.org GitLab forge:
``` | ```
~/swh-environment/swh-docker-dev$ docker-compose exec swh-scheduler-api \ | ~/swh-environment/docker$ docker-compose exec swh-scheduler-api \
swh scheduler task add list-gitlab-full \ | swh scheduler task add list-gitlab-full \
-p oneshot url=https://0xacab.org/api/v4 | -p oneshot url=https://0xacab.org/api/v4
Created 1 tasks | Created 1 tasks
Task 1 | Task 1
Next run: just now (2018-12-19 14:58:49+00:00) | Next run: just now (2018-12-19 14:58:49+00:00)
Interval: 90 days, 0:00:00 | Interval: 90 days, 0:00:00
Type: list-gitlab-full | Type: list-gitlab-full
Policy: oneshot | Policy: oneshot
Args: | Args:
Keyword args: | Keyword args:
url=https://0xacab.org/api/v4 | url=https://0xacab.org/api/v4
``` | ```
This task will scrape the forge's project list and create subtasks to inject | This task will scrape the forge's project list and create subtasks to inject
each git repository found there. | each git repository found there.
This will take a bit of time to complete. | This will take a bit of time to complete.
To increase the speed at which git repositories are imported, you can spawn more | To increase the speed at which git repositories are imported, you can spawn more
`swh-loader-git` workers: | `swh-loader-git` workers:
``` | ```
~/swh-environment/swh-docker-dev$ docker-compose exec swh-scheduler-api \ | ~/swh-environment/docker$ docker-compose exec swh-scheduler-api \
celery status | celery status
listers@50ac2185c6c9: OK | listers@50ac2185c6c9: OK
loader@b164f9055637: OK | loader@b164f9055637: OK
indexer@33bc6067a5b8: OK | indexer@33bc6067a5b8: OK
vault@c9fef1bbfdc1: OK | vault@c9fef1bbfdc1: OK
4 nodes online. | 4 nodes online.
~/swh-environment/swh-docker-dev$ docker-compose exec swh-scheduler-api \ | ~/swh-environment/docker$ docker-compose exec swh-scheduler-api \
celery control pool_grow 3 -d loader@b164f9055637 | celery control pool_grow 3 -d loader@b164f9055637
-> loader@b164f9055637: OK | -> loader@b164f9055637: OK
pool will grow | pool will grow
~/swh-environment/swh-docker-dev$ docker-compose exec swh-scheduler-api \ | ~/swh-environment/docker$ docker-compose exec swh-scheduler-api \
celery inspect -d loader@b164f9055637 stats | grep prefetch_count | celery inspect -d loader@b164f9055637 stats | grep prefetch_count
"prefetch_count": 4 | "prefetch_count": 4
``` | ```
Now there are 4 workers ingesting git repositories. | Now there are 4 workers ingesting git repositories.
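The `-d loader@b164f9055637` destination used above is copied from the `celery status` output, and its hostname part changes whenever the container is recreated. A small filter can pick it out for you; this is a sketch, and `first_node` is a hypothetical helper that assumes the `name@host: OK` line format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `celery status` output on stdin and print the first
# node whose name starts with "<prefix>@" (e.g. "loader"), without the colon.
first_node() {
    local prefix=$1
    awk -v p="$prefix@" 'index($1, p) == 1 { sub(/:$/, "", $1); print $1; exit }'
}
```

For example, `node=$(docker-compose exec -T swh-scheduler-api celery status | first_node loader)` could then be passed as `-d "$node"` to `celery control pool_grow`; `-T` disables TTY allocation so the output carries no carriage returns.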
You can also increase the number of `swh-loader-git` containers: | You can also increase the number of `swh-loader-git` containers:
``` | ```
~/swh-environment/swh-docker-dev$ docker-compose up -d --scale swh-loader=4 | ~/swh-environment/docker$ docker-compose up -d --scale swh-loader=4
[...] | [...]
Creating swh-docker-dev_swh-loader_2 ... done | Creating docker_swh-loader_2 ... done
Creating swh-docker-dev_swh-loader_3 ... done | Creating docker_swh-loader_3 ... done
Creating swh-docker-dev_swh-loader_4 ... done | Creating docker_swh-loader_4 ... done
``` | ```
## Updating the docker image | ## Updating the docker image
All containers started by `docker-compose` are bound to a docker image | All containers started by `docker-compose` are bound to a docker image named
named `swh/stack` including all the software components of Software Heritage. | `swh/stack` including all the software components of Software Heritage. When
When new versions of these components are released, the docker image will not | new versions of these components are released, the docker image will not be
be automatically updated. In order to update all Software Heritage components | automatically updated. In order to update all Software Heritage components to
to their latest version, the docker image needs to be explicitly rebuilt by | their latest version, the docker image needs to be explicitly rebuilt by
issuing the following command inside the `swh-docker-dev` directory: | issuing the following command from within the `docker` directory:
``` | ```
~/swh-environment/swh-docker-dev$ docker build --no-cache -t swh/stack . | ~/swh-environment/docker$ docker build --no-cache -t swh/stack .
``` | ```
## Details | ## Details
This runs the following services on their respective standard ports; | This runs the following services on their respective standard ports; all of
all of them are configured to communicate with each other: | them are configured to communicate with each other:
- swh-storage-db: a `softwareheritage` instance db that stores the | - swh-storage-db: a `softwareheritage` instance db that stores the Merkle DAG,
Merkle DAG, |
- swh-objstorage: Content-addressable object storage, | - swh-objstorage: Content-addressable object storage,
- swh-storage: Abstraction layer over the archive, allowing access to | - swh-storage: Abstraction layer over the archive, allowing access to all
all stored source code artifacts as well as their metadata, | stored source code artifacts as well as their metadata,
- swh-web: the swh web interface over the storage, | - swh-web: the Software Heritage web user interface,
- swh-scheduler: the API service as well as 2 utilities, | - swh-scheduler: the API service as well as 2 utilities,
the runner and the listener, | the runner and the listener,
- swh-lister: celery workers dedicated to running lister tasks, | - swh-lister: celery workers dedicated to running lister tasks,
- swh-loaders: celery workers dedicated to importing/updating source code | - swh-loaders: celery workers dedicated to importing/updating source code
content (VCS repos, source packages, etc.), | content (VCS repos, source packages, etc.),
- swh-journal: Persistent logger of changes to the archive, with | - swh-journal: Persistent logger of changes to the archive, with
publish-subscribe support. | publish-subscribe support.
This means you can start ingesting content with these services by following | This means you can start ingesting content with these services by following
the getting-started guide, starting directly at | the getting-started guide, starting directly at
https://docs.softwareheritage.org/devel/getting-started.html#step-4-ingest-repositories | https://docs.softwareheritage.org/devel/getting-started.html#step-4-ingest-repositories
### Exposed Ports | ### Exposed Ports
Several services have their listening ports exposed on the host: | Several services have their listening ports exposed on the host:
- amqp: 5072 | - amqp: 5072
- kafka: 5092 | - kafka: 5092
- nginx: 5080 | - nginx: 5080
And for SWH services: | And for SWH services:
- scheduler API: 5008 | - scheduler API: 5008
- storage API: 5002 | - storage API: 5002
- object storage API: 5003 | - object storage API: 5003
- indexer API: 5007 | - indexer API: 5007
- web app: 5004 | - web app: 5004
- deposit app: 5006 | - deposit app: 5006
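When scripting against the stack from the host, it can help to keep the port list above in a single place. The function below is a sketch built only from that list; `swh_host_port` and its short service keys are hypothetical names, and the values would need updating if the compose file changes.

```shell
#!/usr/bin/env bash
# Hypothetical lookup of the host-side ports listed above.
# Prints the port on stdout, or fails with a message on stderr.
swh_host_port() {
    case "$1" in
        amqp)       echo 5072 ;;
        kafka)      echo 5092 ;;
        nginx)      echo 5080 ;;
        scheduler)  echo 5008 ;;
        storage)    echo 5002 ;;
        objstorage) echo 5003 ;;
        indexer)    echo 5007 ;;
        web)        echo 5004 ;;
        deposit)    echo 5006 ;;
        *) echo "unknown service: $1" >&2; return 1 ;;
    esac
}
```

For example, `export SWH_SCHEDULER_URL=http://127.0.0.1:$(swh_host_port scheduler)/` gives the same value as the `postactivate` snippet later in this README.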
Beware that these ports are not the same as the ports used from within the | Beware that these ports are not the same as the ports used from within the
docker network. This means that the same command executed from the host or from | docker network. This means that the same command executed from the host or from
a docker container will not use the same URLs to access services. For example, | a docker container will not use the same URLs to access services. For example,
to use the `celery` utility from the host, you may type: | to use the `celery` utility from the host, you may type:
``` | ```
~/swh-environment/swh-docker-dev$ CELERY_BROKER_URL=amqp://:5072// celery status | ~/swh-environment/docker$ CELERY_BROKER_URL=amqp://:5072// celery status
loader@61704103668c: OK | loader@61704103668c: OK
[...] | [...]
``` | ```
To run the same command from within a container: | To run the same command from within a container:
``` | ```
~/swh-environment/swh-docker-dev$ docker-compose exec swh-scheduler-api celery status | ~/swh-environment/docker$ docker-compose exec swh-scheduler-api celery status
loader@61704103668c: OK | loader@61704103668c: OK
[...] | [...]
``` | ```
## Managing tasks | ## Managing tasks
One of the main components of the Software Heritage platform is the task system. | One of the main components of the Software Heritage platform is the task system.
It is used to manage everything related to background processes, like | It is used to manage everything related to background processes, like
discovering new git repositories to import, ingesting them, checking that a known | discovering new git repositories to import, ingesting them, checking that a known
repository is up to date, etc. | repository is up to date, etc.
The task system is based on Celery but uses a custom database-based scheduler. | The task system is based on Celery but uses a custom database-based scheduler.
[...]
Then, for each repository, a new task will be created to ingest this repository | Then, for each repository, a new task will be created to ingest this repository
and keep it up to date. | and keep it up to date.
For example, to add a (one-shot) task that will list git repos on the | For example, to add a (one-shot) task that will list git repos on the
0xacab.org GitLab instance, one can do (from this git repository): | 0xacab.org GitLab instance, one can do (from within the `docker` directory):
``` | ```
~/swh-environment/swh-docker-dev$ docker-compose exec swh-scheduler-api \ | ~/swh-environment/docker$ docker-compose exec swh-scheduler-api \
swh scheduler task add list-gitlab-full \ | swh scheduler task add list-gitlab-full \
-p oneshot url=https://0xacab.org/api/v4 | -p oneshot url=https://0xacab.org/api/v4
Created 1 tasks | Created 1 tasks
Task 12 | Task 12
Next run: just now (2018-12-19 14:58:49+00:00) | Next run: just now (2018-12-19 14:58:49+00:00)
Interval: 90 days, 0:00:00 | Interval: 90 days, 0:00:00
Type: list-gitlab-full | Type: list-gitlab-full
Policy: oneshot | Policy: oneshot
Args: | Args:
Keyword args: | Keyword args:
url=https://0xacab.org/api/v4 | url=https://0xacab.org/api/v4
``` | ```
This will insert a new task in the scheduler. To list existing tasks for a | This will insert a new task in the scheduler. To list existing tasks for a
given task type: | given task type:
``` | ```
~/swh-environment/swh-docker-dev$ docker-compose exec swh-scheduler-api \ | ~/swh-environment/docker$ docker-compose exec swh-scheduler-api \
swh scheduler task list-pending list-gitlab-full | swh scheduler task list-pending list-gitlab-full
Found 1 list-gitlab-full tasks | Found 1 list-gitlab-full tasks
Task 12 | Task 12
Next run: 2 minutes ago (2018-12-19 14:58:49+00:00) | Next run: 2 minutes ago (2018-12-19 14:58:49+00:00)
Interval: 90 days, 0:00:00 | Interval: 90 days, 0:00:00
Type: list-gitlab-full | Type: list-gitlab-full
Policy: oneshot | Policy: oneshot
Args: | Args:
Keyword args: | Keyword args:
url=https://0xacab.org/api/v4 | url=https://0xacab.org/api/v4
``` | ```
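If you need the ids of the pending tasks (for instance to feed them to another script), they can be pulled out of a listing like the one above. This sketch relies on the `Task <id>` lines of the human-readable output, which may change between swh-scheduler versions; `task_ids` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `swh scheduler task list-pending` output on stdin
# and print one task id per line, taken from the "Task <id>" lines.
task_ids() {
    awk '$1 == "Task" && $2 ~ /^[0-9]+$/ { print $2 }'
}
```

Used as `docker-compose exec -T swh-scheduler-api swh scheduler task list-pending list-gitlab-full | task_ids`, it would print `12` for the listing above.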
To list all existing task types: | To list all existing task types:
``` | ```
~/swh-environment/swh-docker-dev$ docker-compose exec swh-scheduler-api \ | ~/swh-environment/docker$ docker-compose exec swh-scheduler-api \
swh scheduler task-type list | swh scheduler task-type list
Known task types: | Known task types:
load-svn-from-archive: | load-svn-from-archive:
Loading svn repositories from svn dump | Loading svn repositories from svn dump
load-svn: | load-svn:
Create dump of a remote svn repository, mount it and load it | Create dump of a remote svn repository, mount it and load it
load-deposit: | load-deposit:
[...]
`http://localhost:5080/grafana`. | `http://localhost:5080/grafana`.
If you cannot see any task being executed, check the logs of the | If you cannot see any task being executed, check the logs of the
`swh-scheduler-runner` service (here is a failure example due to the | `swh-scheduler-runner` service (here is a failure example due to the
debian lister task not being properly registered on the | debian lister task not being properly registered on the
swh-scheduler-runner service): | swh-scheduler-runner service):
``` | ```
~/swh-environment/swh-docker-dev$ docker-compose logs --tail=10 swh-scheduler-runner | ~/swh-environment/docker$ docker-compose logs --tail=10 swh-scheduler-runner
Attaching to swh-docker-dev_swh-scheduler-runner_1 | Attaching to docker_swh-scheduler-runner_1
swh-scheduler-runner_1 | "__main__", mod_spec) | swh-scheduler-runner_1 | "__main__", mod_spec)
swh-scheduler-runner_1 | File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code | swh-scheduler-runner_1 | File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
swh-scheduler-runner_1 | exec(code, run_globals) | swh-scheduler-runner_1 | exec(code, run_globals)
swh-scheduler-runner_1 | File "/usr/local/lib/python3.7/site-packages/swh/scheduler/celery_backend/runner.py", line 107, in <module> | swh-scheduler-runner_1 | File "/usr/local/lib/python3.7/site-packages/swh/scheduler/celery_backend/runner.py", line 107, in <module>
swh-scheduler-runner_1 | run_ready_tasks(main_backend, main_app) | swh-scheduler-runner_1 | run_ready_tasks(main_backend, main_app)
swh-scheduler-runner_1 | File "/usr/local/lib/python3.7/site-packages/swh/scheduler/celery_backend/runner.py", line 81, in run_ready_tasks | swh-scheduler-runner_1 | File "/usr/local/lib/python3.7/site-packages/swh/scheduler/celery_backend/runner.py", line 81, in run_ready_tasks
swh-scheduler-runner_1 | task_types[task['type']]['backend_name'] | swh-scheduler-runner_1 | task_types[task['type']]['backend_name']
swh-scheduler-runner_1 | File "/usr/local/lib/python3.7/site-packages/celery/app/registry.py", line 21, in __missing__ | swh-scheduler-runner_1 | File "/usr/local/lib/python3.7/site-packages/celery/app/registry.py", line 21, in __missing__
[...]
[[https://docs.softwareheritage.org/devel/developer-setup.html|developer setup guide]]. | [[https://docs.softwareheritage.org/devel/developer-setup.html|developer setup guide]].
From there, we will check out or update all the swh packages: | From there, we will check out or update all the swh packages:
``` | ```
~/swh-environment$ ./bin/update | ~/swh-environment$ ./bin/update
``` | ```
### Install a swh package from sources in a container | ### Install a swh package from sources in a container
It is possible to run a docker container with some swh packages installed from | It is possible to run a docker container with some swh packages installed from
sources instead of using the latest published packages from PyPI. To do this, | sources instead of using the latest published packages from PyPI. To do this,
you must write a docker-compose override file (`docker-compose.override.yml`). | you must write a docker-compose override file (`docker-compose.override.yml`).
An example is given in the `docker-compose.override.yml.example` file: | An example is given in the `docker-compose.override.yml.example` file:
``` yaml | ``` yaml
[...]
docker. | docker.
``` | ```
~/swh-environment$ find . -type d -name __pycache__ -exec rm -rf {} \; | ~/swh-environment$ find . -type d -name __pycache__ -exec rm -rf {} \;
~/swh-environment$ find . -type d -name .tox -exec rm -rf {} \; | ~/swh-environment$ find . -type d -name .tox -exec rm -rf {} \;
~/swh-environment$ find . -type d -name .hypothesis -exec rm -rf {} \; | ~/swh-environment$ find . -type d -name .hypothesis -exec rm -rf {} \;
``` | ```
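The three `find` calls above can also be combined into a single pass. Adding `-prune` stops `find` from trying to descend into a directory it has just removed, which is what otherwise produces `No such file or directory` warnings. A sketch with a hypothetical `swh_clean_caches` name, taking the root directory as an argument:

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove __pycache__, .tox and .hypothesis directories
# below the given root (default: current directory) in a single find pass.
# -prune keeps find from descending into the directories being deleted.
swh_clean_caches() {
    local root=${1:-.}
    find "$root" -type d \
        \( -name __pycache__ -o -name .tox -o -name .hypothesis \) \
        -prune -exec rm -rf {} +
}
```

Running `swh_clean_caches ~/swh-environment` should then have the same effect as the three commands above.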
### Using locally installed swh tools with docker | ### Using locally installed swh tools with docker
In all the examples above, we have executed swh commands from within a running | In all the examples above, we have executed swh commands from within a running
container. Now that we also have these swh commands locally available in our virtual | container. Now that we also have these swh commands locally available in our virtual
env, we can use them to interact with swh services running in docker | env, we can use them to interact with swh services running in docker
containers. | containers.
For this, we just need to configure a few environment variables. First, ensure | For this, we just need to configure a few environment variables. First, ensure
[...]
Known task types: | Known task types:
index-fossology-license: | index-fossology-license:
Fossology license indexer task | Fossology license indexer task
index-mimetype: | index-mimetype:
Mimetype indexer task | Mimetype indexer task
[...] | [...]
``` | ```
### Make your life a bit easier | ### Make your life a bit easier
When you use virtualenvwrapper, you can add postactivation commands: | When you use virtualenvwrapper, you can add postactivation commands:
``` | ```
(swh) ~/swh-environment$ cat >>$VIRTUAL_ENV/bin/postactivate <<'EOF' | (swh) ~/swh-environment$ cat >>$VIRTUAL_ENV/bin/postactivate <<'EOF'
# unfortunately, the interface cmd for the click autocompletion | # unfortunately, the interface cmd for the click autocompletion
# depends on the shell | # depends on the shell
# https://click.palletsprojects.com/en/7.x/bashcomplete/#activation | # https://click.palletsprojects.com/en/7.x/bashcomplete/#activation
shell=$(basename $SHELL) | shell=$(basename $SHELL)
case "$shell" in | case "$shell" in
"zsh") | "zsh")
autocomplete_cmd=source_zsh | autocomplete_cmd=source_zsh
;; | ;;
*) | *)
autocomplete_cmd=source | autocomplete_cmd=source
;; | ;;
esac | esac
eval "$(_SWH_COMPLETE=$autocomplete_cmd swh)" | eval "$(_SWH_COMPLETE=$autocomplete_cmd swh)"
export SWH_SCHEDULER_URL=http://127.0.0.1:5008/ | export SWH_SCHEDULER_URL=http://127.0.0.1:5008/
export CELERY_BROKER_URL=amqp://127.0.0.1:5072/ | export CELERY_BROKER_URL=amqp://127.0.0.1:5072/
export COMPOSE_FILE=~/swh-environment/swh-docker-dev/docker-compose.yml:~/swh-environment/swh-docker-dev/docker-compose.override.yml | export COMPOSE_FILE=~/swh-environment/docker/docker-compose.yml:~/swh-environment/docker/docker-compose.override.yml
alias doco=docker-compose | alias doco=docker-compose
function swhclean { | function swhclean {
find ~/swh-environment -type d -name __pycache__ -exec rm -rf {} \; | find ~/swh-environment -type d -name __pycache__ -exec rm -rf {} \;
find ~/swh-environment -type d -name .tox -exec rm -rf {} \; | find ~/swh-environment -type d -name .tox -exec rm -rf {} \;
find ~/swh-environment -type d -name .hypothesis -exec rm -rf {} \; | find ~/swh-environment -type d -name .hypothesis -exec rm -rf {} \;
} | }
EOF | EOF
[...]
This repo comes with an optional `docker-compose.storage-replica.yml` | This repo comes with an optional `docker-compose.storage-replica.yml`
docker compose file that can be used to test the kafka-powered replication | docker compose file that can be used to test the kafka-powered replication
mechanism for the main storage. | mechanism for the main storage.
This can be used as follows: | This can be used as follows:
``` | ```
~/swh-environment/swh-docker-dev$ docker-compose -f docker-compose.yml -f docker-compose.storage-replica.yml up -d | ~/swh-environment/docker$ docker-compose -f docker-compose.yml -f docker-compose.storage-replica.yml up -d
[...] | [...]
``` | ```
Compared to the original compose file, this will: | Compared to the original compose file, this will:
- override the swh-storage service to activate the kafka direct writer | - override the swh-storage service to activate the kafka direct writer
on swh.journal.objects prefixed topics using the swh.storage.master ID, | on swh.journal.objects prefixed topics using the swh.storage.master ID,
- override the swh-web service to make it use the replica instead of the | - override the swh-web service to make it use the replica instead of the
master storage, | master storage,
- start a db for the replica, | - start a db for the replica,
- start a storage service based on this db, | - start a storage service based on this db,
- start a replayer service that runs the process that listens to kafka to | - start a replayer service that runs the process that listens to kafka to
keep the replica in sync. | keep the replica in sync.
When using it, you will have a setup in which the master storage is used by | When using it, you will have a setup in which the master storage is used by
workers and most other services, whereas the storage replica will be used | workers and most other services, whereas the storage replica will be used
by the web application and should be kept in sync with the master storage | by the web application and should be kept in sync with the master storage
by kafka. | by kafka.
Note that the object storage is not replicated here, only the graph storage. | Note that the object storage is not replicated here, only the graph storage.
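Instead of repeating `-f` options, docker-compose can also read the list of compose files from the `COMPOSE_FILE` environment variable (as the `postactivate` example above does), with entries separated by `:` on Linux and macOS. A sketch of a helper that builds that value; `compose_files` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join compose file names with ':' so the result can be
# exported as COMPOSE_FILE (the default separator on Linux and macOS).
compose_files() {
    local IFS=:
    printf '%s\n' "$*"
}
```

After `export COMPOSE_FILE=$(compose_files docker-compose.yml docker-compose.storage-replica.yml)`, a plain `docker-compose up -d` should behave like the `-f`-based command above.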
## Starting the backfiller | ## Starting the backfiller
The backfiller reads the objects of type <object-type> within the range | The backfiller reads the objects of type <object-type> within the range
[start-object, end-object] from the storage and writes them to the kafka topics. | [start-object, end-object] from the storage and writes them to the kafka topics.
``` | ```
(swh) $ docker-compose \ | (swh)$ docker-compose \
-f docker-compose.yml \ | -f docker-compose.yml \
-f docker-compose.storage-replica.yml \ | -f docker-compose.storage-replica.yml \
-f docker-compose.storage-replica.override.yml \ | -f docker-compose.storage-replica.override.yml \
run \ | run \
swh-journal-backfiller \ | swh-journal-backfiller \
snapshot \ | snapshot \
--start-object 000000 \ | --start-object 000000 \
--end-object 000001 \ | --end-object 000001 \
--dry-run | --dry-run
``` | ```