README.md
[...]
http://localhost:5080/ in your web browser.

At this point, the archive is empty and needs to be filled with some content.
To do so, you can create tasks that will scrape a forge. For example, to inject
the code from the https://0xacab.org GitLab forge:
```
~/swh-environment/swh-docker-dev$ docker-compose exec swh-scheduler-api \
    swh scheduler task add swh-lister-gitlab-full \
    -p oneshot api_baseurl=https://0xacab.org/api/v4
Created 1 tasks
Task 1
  Next run: just now (2018-12-19 14:58:49+00:00)
  Interval: 90 days, 0:00:00
  Type: swh-lister-gitlab-full
[...]
```

[...]
Then, for each repository, a new task will be created to ingest that repository
and keep it up to date.

For example, to add a one-shot task that lists the Git repositories on the
0xacab.org GitLab instance, run (from this Git repository):
```
~/swh-environment/swh-docker-dev$ docker-compose exec swh-scheduler-api \
    swh scheduler task add swh-lister-gitlab-full \
    -p oneshot api_baseurl=https://0xacab.org/api/v4
Created 1 tasks
Task 12
  Next run: just now (2018-12-19 14:58:49+00:00)
  Interval: 90 days, 0:00:00
  Type: swh-lister-gitlab-full
  Policy: oneshot
  Args:
  Keyword args:
    api_baseurl=https://0xacab.org/api/v4
```
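The `api_baseurl` argument is the GitLab instance URL followed by `/api/v4`, the prefix of GitLab's REST API v4. A minimal sketch of a wrapper that schedules the same one-shot listing for another instance follows; the helper names `run` and `schedule_gitlab_lister` and the `DRY_RUN` switch are hypothetical, while the command being wrapped is the one above.

```shell
# Hypothetical helper: when DRY_RUN is set, print the command instead of running it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the command above: schedule a one-shot
# swh-lister-gitlab-full task for the GitLab instance given as $1.
schedule_gitlab_lister() {
  base="${1%/}"  # strip one trailing slash (global: POSIX sh has no locals)
  run docker-compose exec swh-scheduler-api \
    swh scheduler task add swh-lister-gitlab-full \
    -p oneshot "api_baseurl=${base}/api/v4"
}
```

For instance, `DRY_RUN=1 schedule_gitlab_lister https://0xacab.org/` prints the command without contacting the scheduler; leave `DRY_RUN` unset to actually create the task.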
This inserts a new task into the scheduler. To list the pending tasks of a
given task type:
```
~/swh-environment/swh-docker-dev$ docker-compose exec swh-scheduler-api \
    swh scheduler task list-pending swh-lister-gitlab-full
Found 1 swh-lister-gitlab-full tasks
Task 12
  Next run: 2 minutes ago (2018-12-19 14:58:49+00:00)
  Interval: 90 days, 0:00:00
  Type: swh-lister-gitlab-full
  Policy: oneshot
  Args:
  Keyword args:
    api_baseurl=https://0xacab.org/api/v4
```
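When scripting against the scheduler, the task ids can be extracted from this listing. A minimal sketch, assuming the `Task <id>` header lines keep the format shown above (`task_ids` is a hypothetical helper name):

```shell
# Hypothetical helper: read `swh scheduler task list-pending` output on stdin
# and print one task id per line, taken from the "Task <id>" header lines.
# Trailing whitespace (including a carriage return) is ignored.
task_ids() {
  sed -n 's/^[[:space:]]*Task \([0-9][0-9]*\)[[:space:]]*$/\1/p'
}
```

When piping, pass `-T` to `docker-compose exec` so that no pseudo-TTY is allocated, e.g. `docker-compose exec -T swh-scheduler-api swh scheduler task list-pending swh-lister-gitlab-full | task_ids`.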
To list all existing task types:
```
~/swh-environment/swh-docker-dev$ docker-compose exec swh-scheduler-api \
    swh scheduler task-type list
Known task types:
swh-loader-mount-dump-and-load-svn-repository:
  Loading svn repositories from svn dump
origin-update-svn:
  Create dump of a remote svn repository, mount it and load it
swh-deposit-archive-loading:
  Loading deposit archive into swh through swh-loader-tar
[...]
4 nodes online.
(swh) ~/swh-environment$ celery control -d loader@61704103668c pool_grow 3
```
The `swh scheduler` command can be used the same way:
```
(swh) ~/swh-environment$ swh scheduler task-type list
Known task types:
indexer_fossology_license:
  Fossology license indexer task
indexer_mimetype:
  Mimetype indexer task
[...]
```
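Running `swh scheduler` and `celery` from the host like this relies on the `SWH_SCHEDULER_URL` and `CELERY_BROKER_URL` variables exported by the postactivate script described below. A small guard such as the following (`check_swh_env` is a hypothetical helper, not part of swh-environment) reports which of them is missing before you run any command:

```shell
# Hypothetical guard: return 1 and name every unset variable that the
# host-side `swh scheduler` and `celery` commands rely on.
check_swh_env() {
  status=0
  for var in SWH_SCHEDULER_URL CELERY_BROKER_URL; do
    eval "value=\${$var:-}"  # indirect lookup of the variable named by $var
    if [ -z "$value" ]; then
      echo "missing environment variable: $var" >&2
      status=1
    fi
  done
  return "$status"
}
```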
[...]
```
[...]
case "$shell" in
  "zsh")
    autocomplete_cmd=source_zsh
    ;;
  *)
    autocomplete_cmd=source
    ;;
esac
eval "$(_SWH_COMPLETE=$autocomplete_cmd swh)"
export SWH_SCHEDULER_URL=http://127.0.0.1:5008/
export CELERY_BROKER_URL=amqp://127.0.0.1:5072/
export COMPOSE_FILE=~/swh-environment/swh-docker-dev/docker-compose.yml:~/swh-environment/swh-docker-dev/docker-compose.override.yml
alias doco=docker-compose
function swhclean {
  find ~/swh-environment -type d -name __pycache__ -prune -exec rm -rf {} +
  find ~/swh-environment -type d -name .tox -prune -exec rm -rf {} +
  find ~/swh-environment -type d -name .hypothesis -prune -exec rm -rf {} +
}
EOF
```
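The completion line follows the convention of Click, the library behind the `swh` command line: a program emits its completion script when an environment variable named `_<PROG>_COMPLETE` is set, where the program name is upper-cased and dashes become underscores. Hence `swh` reads `_SWH_COMPLETE`, whereas a standalone `swh-scheduler` executable would read `_SWH_SCHEDULER_COMPLETE`. The sketch below computes that name (the helper name is hypothetical). Note that `source` and `source_zsh` are the values understood by Click 7; Click 8 renamed them to `bash_source` and `zsh_source`.

```shell
# Hypothetical helper: print the completion variable Click looks for,
# i.e. "_" + program name + "_COMPLETE", upper-cased, with "-" mapped to "_".
click_complete_var() {
  printf '_%s_COMPLETE\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}
```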
This postactivate script:

- installs a shell completion handler for the `swh` command,
- presets several environment variables:
  - `SWH_SCHEDULER_URL`, so that you can run `swh scheduler` against the
    scheduler API instance running in Docker without having to specify the
    endpoint URL,
  - `CELERY_BROKER_URL`, so that you can run the `celery` tool (without
    command-line options) against the RabbitMQ server running in the Docker
    environment,
  - `COMPOSE_FILE`, so that you can run `docker-compose` from any directory,
[...]
```
listers@50ac2185c6c9: OK
loader@b164f9055637: OK
indexer@33bc6067a5b8: OK
```
* List task-types:
```
(swh) ~/swh-environment$ swh scheduler task-type list
[...]
```
* Get more info on a task type:
```
(swh) ~/swh-environment$ swh scheduler task-type list -v -t origin-update-hg
Known task types:
origin-update-hg: swh.loader.mercurial.tasks.LoadMercurial
  Loading mercurial repository swh-loader-mercurial
  interval: 1 day, 0:00:00 [1 day, 0:00:00, 1 day, 0:00:00]
  backoff_factor: 1.0
  max_queue_length: 1000
  num_retries: None
  retry_delay: None
```
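The bracketed pair after `interval` appears to be the minimum and maximum interval, and `backoff_factor` the multiplier applied between runs. The sketch below computes a next interval under an explicitly assumed policy (our reading, not checked against the scheduler source): an uneventful run multiplies the current interval by `backoff_factor`, an eventful one divides it, and the result is clamped to that range. `next_interval` is a hypothetical helper; all intervals are in seconds.

```shell
# ASSUMED policy, not taken from swh-scheduler: uneventful run -> interval
# times backoff_factor, eventful run -> interval divided by it, then clamp
# the result to [min, max]. All values are in seconds.
# usage: next_interval CURRENT BACKOFF_FACTOR MIN MAX eventful|uneventful
next_interval() {
  awk -v cur="$1" -v f="$2" -v lo="$3" -v hi="$4" -v res="$5" 'BEGIN {
    n = (res == "eventful") ? cur / f : cur * f
    if (n < lo) n = lo
    if (n > hi) n = hi
    printf "%d\n", n
  }'
}
```

Under this assumption, `origin-update-hg` as listed above (`backoff_factor: 1.0`, range of one day to one day) keeps a one-day interval whatever the outcome of each run.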
* Add a new task:
```
(swh) ~/swh-environment$ swh scheduler task add origin-update-hg \
    origin_url=https://hg.logilab.org/master/cubicweb
Created 1 tasks
Task 1
  Next run: just now (2019-02-06 12:36:58+00:00)
  Interval: 1 day, 0:00:00
  Type: origin-update-hg
  Policy: recurring
  Args:
  Keyword args:
    origin_url: https://hg.logilab.org/master/cubicweb
```
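`task add` takes the task type followed by `key=value` keyword arguments, so scheduling several Mercurial origins amounts to looping over their URLs. A minimal sketch that reads one URL per line on standard input and skips blank lines and `#` comments; `add_hg_origins` and the `DRY_RUN` switch are hypothetical, and, as in the examples above, `SWH_SCHEDULER_URL` is assumed to point at the scheduler.

```shell
# Hypothetical helper: schedule one origin-update-hg task per URL read on
# stdin. With DRY_RUN set, print each command instead of running it.
add_hg_origins() {
  while IFS= read -r url || [ -n "$url" ]; do
    case "$url" in
      ''|'#'*) continue ;;  # skip blank lines and comments
    esac
    if [ -n "${DRY_RUN:-}" ]; then
      printf '%s\n' "swh scheduler task add origin-update-hg origin_url=$url"
    else
      # </dev/null keeps the command from consuming the remaining URLs
      swh scheduler task add origin-update-hg "origin_url=$url" </dev/null
    fi
  done
}
```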
* Respawn a task:
```
(swh) ~/swh-environment$ swh scheduler task respawn 1
```
## Starting a Kafka-powered replica of the storage

This repository comes with an optional `docker-compose.storage-replica.yml`
Docker Compose file that can be used to test the Kafka-powered replication
mechanism of the main storage.
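Since the postactivate script sets `COMPOSE_FILE`, one way to layer the replica file on top of the existing ones is to append it to that variable: Docker Compose splits `COMPOSE_FILE` on `:` on Linux and macOS (`;` on Windows) and merges the files in order. A minimal sketch, assuming the replica file is meant to be combined with the base files; `compose_add_file` is a hypothetical helper name.

```shell
# Hypothetical helper: append a compose file to COMPOSE_FILE unless it is
# already listed (entries are separated by ":" on Linux and macOS).
compose_add_file() {
  case ":${COMPOSE_FILE:-}:" in
    *":$1:"*) ;;  # already present, nothing to do
    *) COMPOSE_FILE="${COMPOSE_FILE:+$COMPOSE_FILE:}$1" ;;
  esac
  export COMPOSE_FILE
}
```

For example, `compose_add_file ~/swh-environment/swh-docker-dev/docker-compose.storage-replica.yml` followed by `docker-compose up -d` would start the stack with the replica services included.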
[...]