docker/README.md
[... 80 lines not shown ...]
Once all containers are running, you can use the web interface by opening | Once all containers are running, you can use the web interface by opening | ||||
http://localhost:5080/ in your web browser. | http://localhost:5080/ in your web browser. | ||||
At this point, the archive is empty and needs to be filled with some content. | At this point, the archive is empty and needs to be filled with some content. | ||||
To do so, you can create tasks that will scrape a forge. For example, to inject | To do so, you can create tasks that will scrape a forge. For example, to inject | ||||
the code from the https://0xacab.org gitlab forge: | the code from the https://0xacab.org gitlab forge: | ||||
``` | ``` | ||||
~/swh-environment/docker$ docker-compose exec swh-scheduler-api \ | ~/swh-environment/docker$ docker-compose exec swh-scheduler \ | ||||
swh scheduler task add list-gitlab-full \ | swh scheduler task add list-gitlab-full \ | ||||
-p oneshot url=https://0xacab.org/api/v4 | -p oneshot url=https://0xacab.org/api/v4 | ||||
Created 1 tasks | Created 1 tasks | ||||
Task 1 | Task 1 | ||||
Next run: just now (2018-12-19 14:58:49+00:00) | Next run: just now (2018-12-19 14:58:49+00:00) | ||||
Interval: 90 days, 0:00:00 | Interval: 90 days, 0:00:00 | ||||
Type: list-gitlab-full | Type: list-gitlab-full | ||||
Policy: oneshot | Policy: oneshot | ||||
Args: | Args: | ||||
Keyword args: | Keyword args: | ||||
url=https://0xacab.org/api/v4 | url=https://0xacab.org/api/v4 | ||||
``` | ``` | ||||
This task will scrape the forge's project list and create subtasks to inject | This task will scrape the forge's project list and create subtasks to inject | ||||
each git repository found there. | each git repository found there. | ||||
This will take a bit of time to complete. | This will take a bit of time to complete. | ||||
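To schedule listings for several GitLab instances, the command above can be wrapped in a small shell function. This is only a sketch: `add_gitlab_lister` and `DRY_RUN` are hypothetical names, not part of the swh tooling, and it assumes the `swh-scheduler` service name and that `docker-compose exec -T` (no pseudo-tty) is acceptable for scripted use.

```shell
# Hypothetical helper: schedule a one-shot full listing of a GitLab instance.
# $1 is the API base URL of the instance, e.g. https://0xacab.org/api/v4.
# Set DRY_RUN=1 to print the command instead of running it.
add_gitlab_lister() {
    # Build the command line from the README example; "url=$1" is expanded
    # before `set --` replaces the positional parameters.
    set -- docker-compose exec -T swh-scheduler \
        swh scheduler task add list-gitlab-full -p oneshot "url=$1"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Running `DRY_RUN=1 add_gitlab_lister https://0xacab.org/api/v4` prints the command without contacting the scheduler, which is a cheap way to check the arguments first.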
To increase the speed at which git repositories are imported, you can spawn more | To increase the speed at which git repositories are imported, you can spawn more | ||||
`swh-loader-git` workers: | `swh-loader-git` workers: | ||||
``` | ``` | ||||
~/swh-environment/docker$ docker-compose exec swh-scheduler-api \ | ~/swh-environment/docker$ docker-compose exec swh-scheduler \ | ||||
celery status | celery status | ||||
listers@50ac2185c6c9: OK | listers@50ac2185c6c9: OK | ||||
loader@b164f9055637: OK | loader@b164f9055637: OK | ||||
indexer@33bc6067a5b8: OK | indexer@33bc6067a5b8: OK | ||||
vault@c9fef1bbfdc1: OK | vault@c9fef1bbfdc1: OK | ||||
4 nodes online. | 4 nodes online. | ||||
~/swh-environment/docker$ docker-compose exec swh-scheduler-api \ | ~/swh-environment/docker$ docker-compose exec swh-scheduler \ | ||||
celery control pool_grow 3 -d loader@b164f9055637 | celery control pool_grow 3 -d loader@b164f9055637 | ||||
-> loader@b164f9055637: OK | -> loader@b164f9055637: OK | ||||
pool will grow | pool will grow | ||||
~/swh-environment/docker$ docker-compose exec swh-scheduler-api \ | ~/swh-environment/docker$ docker-compose exec swh-scheduler \ | ||||
celery inspect -d loader@b164f9055637 stats | grep prefetch_count | celery inspect -d loader@b164f9055637 stats | grep prefetch_count | ||||
"prefetch_count": 4 | "prefetch_count": 4 | ||||
``` | ``` | ||||
Now there are 4 workers ingesting git repositories. | Now there are 4 workers ingesting git repositories. | ||||
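If more than one loader node is online, the same `pool_grow` call can be repeated for each of them. A minimal sketch, assuming `celery status` reports each node as `name@id: OK` as in the output above; `loader_nodes` is a hypothetical helper, not an swh or Celery command.

```shell
# Print the names of all loader nodes found in `celery status` output,
# read from stdin. Matches "loader@<id>" anywhere on a line, so leading
# whitespace in the captured output does not matter.
loader_nodes() {
    grep -o 'loader@[[:alnum:]]*'
}

# Hypothetical usage: grow the pool of every loader node by 3 workers.
# -T disables pseudo-tty allocation so the output can be piped, and
# </dev/null keeps the inner command from consuming the loop's input.
#
# docker-compose exec -T swh-scheduler celery status | loader_nodes |
#     while read -r node; do
#         docker-compose exec -T swh-scheduler \
#             celery control pool_grow 3 -d "$node" </dev/null
#     done
```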
You can also increase the number of `swh-loader-git` containers: | You can also increase the number of `swh-loader-git` containers: | ||||
``` | ``` | ||||
[... 73 lines not shown ...]
~/swh-environment/docker$ CELERY_BROKER_URL=amqp://:5072// celery status | ~/swh-environment/docker$ CELERY_BROKER_URL=amqp://:5072// celery status | ||||
loader@61704103668c: OK | loader@61704103668c: OK | ||||
[...] | [...] | ||||
``` | ``` | ||||
To run the same command from within a container: | To run the same command from within a container: | ||||
``` | ``` | ||||
~/swh-environment/docker$ docker-compose exec swh-scheduler-api celery status | ~/swh-environment/docker$ docker-compose exec swh-scheduler celery status | ||||
loader@61704103668c: OK | loader@61704103668c: OK | ||||
[...] | [...] | ||||
``` | ``` | ||||
## Managing tasks | ## Managing tasks | ||||
One of the main components of the Software Heritage platform is the task system. | One of the main components of the Software Heritage platform is the task system. | ||||
[... 29 lines not shown ...]
Then, for each repository, a new task will be created to ingest this repository | Then, for each repository, a new task will be created to ingest this repository | ||||
and keep it up to date. | and keep it up to date. | ||||
For example, to add a one-shot task that lists the Git repositories on the | For example, to add a one-shot task that lists the Git repositories on the | ||||
0xacab.org GitLab instance, run the following from this git repository: | 0xacab.org GitLab instance, run the following from this git repository: | ||||
``` | ``` | ||||
~/swh-environment/docker$ docker-compose exec swh-scheduler-api \ | ~/swh-environment/docker$ docker-compose exec swh-scheduler \ | ||||
swh scheduler task add list-gitlab-full \ | swh scheduler task add list-gitlab-full \ | ||||
-p oneshot url=https://0xacab.org/api/v4 | -p oneshot url=https://0xacab.org/api/v4 | ||||
Created 1 tasks | Created 1 tasks | ||||
Task 12 | Task 12 | ||||
Next run: just now (2018-12-19 14:58:49+00:00) | Next run: just now (2018-12-19 14:58:49+00:00) | ||||
Interval: 90 days, 0:00:00 | Interval: 90 days, 0:00:00 | ||||
Type: list-gitlab-full | Type: list-gitlab-full | ||||
Policy: oneshot | Policy: oneshot | ||||
Args: | Args: | ||||
Keyword args: | Keyword args: | ||||
url=https://0xacab.org/api/v4 | url=https://0xacab.org/api/v4 | ||||
``` | ``` | ||||
This will insert a new task in the scheduler. To list existing tasks for a | This will insert a new task in the scheduler. To list existing tasks for a | ||||
given task type: | given task type: | ||||
``` | ``` | ||||
~/swh-environment/docker$ docker-compose exec swh-scheduler-api \ | ~/swh-environment/docker$ docker-compose exec swh-scheduler \ | ||||
swh scheduler task list-pending list-gitlab-full | swh scheduler task list-pending list-gitlab-full | ||||
Found 1 list-gitlab-full tasks | Found 1 list-gitlab-full tasks | ||||
Task 12 | Task 12 | ||||
Next run: 2 minutes ago (2018-12-19 14:58:49+00:00) | Next run: 2 minutes ago (2018-12-19 14:58:49+00:00) | ||||
Interval: 90 days, 0:00:00 | Interval: 90 days, 0:00:00 | ||||
Type: list-gitlab-full | Type: list-gitlab-full | ||||
Policy: oneshot | Policy: oneshot | ||||
Args: | Args: | ||||
Keyword args: | Keyword args: | ||||
url=https://0xacab.org/api/v4 | url=https://0xacab.org/api/v4 | ||||
``` | ``` | ||||
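When scripting against the scheduler, the number of pending tasks can be read from the first line of this output. The sketch below assumes the `Found <N> <task-type> tasks` line format shown above; `pending_count` is a hypothetical helper name.

```shell
# Extract the task count from `swh scheduler task list-pending` output,
# read from stdin. Prints nothing if no "Found <N> ..." line is present.
pending_count() {
    sed -n 's/^Found \([0-9][0-9]*\) .*/\1/p' | head -n 1
}

# Hypothetical usage (-T disables pseudo-tty allocation for piping):
# docker-compose exec -T swh-scheduler \
#     swh scheduler task list-pending list-gitlab-full | pending_count
```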
To list all existing task types: | To list all existing task types: | ||||
``` | ``` | ||||
~/swh-environment/docker$ docker-compose exec swh-scheduler-api \ | ~/swh-environment/docker$ docker-compose exec swh-scheduler \ | ||||
swh scheduler task-type list | swh scheduler task-type list | ||||
Known task types: | Known task types: | ||||
load-svn-from-archive: | load-svn-from-archive: | ||||
Loading svn repositories from svn dump | Loading svn repositories from svn dump | ||||
load-svn: | load-svn: | ||||
Create dump of a remote svn repository, mount it and load it | Create dump of a remote svn repository, mount it and load it | ||||
load-deposit: | load-deposit: | ||||
[... 354 lines not shown ...]