README.md
[...]
http://localhost:5080/ in your web browser.

At this point, the archive is empty and needs to be filled with some content.
To do so, you can create tasks that will scrape a forge. For example, to inject
the code from the https://0xacab.org GitLab forge:

```
~/swh-environment/swh-docker-dev$ docker-compose exec swh-scheduler-api \
    swh scheduler task add list-gitlab-full \
      -p oneshot api_baseurl=https://0xacab.org/api/v4
Created 1 tasks
Task 1
  Next run: just now (2018-12-19 14:58:49+00:00)
  Interval: 90 days, 0:00:00
  Type: list-gitlab-full
  Policy: oneshot
  Args:
  Keyword args:
    api_baseurl=https://0xacab.org/api/v4
```

This task will scrape the forge's project list and create subtasks to inject
each git repository found there.
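The same one-shot listing can be scheduled for another GitLab instance by wrapping the command above in a small shell function. This is only a sketch: the name `schedule_gitlab_listing` is hypothetical, and it assumes that the target instance serves its API under `<base URL>/api/v4`, as 0xacab.org does.

```shell
# Hypothetical helper (not part of swh-docker-dev): schedule a one-shot
# full listing of a GitLab instance, given its base URL.
# Assumes the instance's API is served under <base URL>/api/v4.
schedule_gitlab_listing() {
    # ${1%/} drops a trailing slash so the API URL has no "//".
    docker-compose exec swh-scheduler-api \
        swh scheduler task add list-gitlab-full \
        -p oneshot "api_baseurl=${1%/}/api/v4"
}
```

Run from `~/swh-environment/swh-docker-dev`, `schedule_gitlab_listing https://0xacab.org` issues the same command as the example above.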
[...]
Then, for each repository, a new task will be created to ingest this repository
and keep it up to date.

For example, to add a one-shot task that will list git repos on the
0xacab.org GitLab instance, one can do (from this git repository):

```
~/swh-environment/swh-docker-dev$ docker-compose exec swh-scheduler-api \
    swh scheduler task add list-gitlab-full \
      -p oneshot api_baseurl=https://0xacab.org/api/v4
Created 1 tasks
Task 12
  Next run: just now (2018-12-19 14:58:49+00:00)
  Interval: 90 days, 0:00:00
  Type: list-gitlab-full
  Policy: oneshot
  Args:
  Keyword args:
    api_baseurl=https://0xacab.org/api/v4
```
This will insert a new task in the scheduler. To list existing tasks for a
given task type:

```
~/swh-environment/swh-docker-dev$ docker-compose exec swh-scheduler-api \
    swh scheduler task list-pending list-gitlab-full
Found 1 list-gitlab-full tasks
Task 12
  Next run: 2 minutes ago (2018-12-19 14:58:49+00:00)
  Interval: 90 days, 0:00:00
  Type: list-gitlab-full
  Policy: oneshot
  Args:
  Keyword args:
    api_baseurl=https://0xacab.org/api/v4
```
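To check from a script whether such a task is still pending, the summary line of `task list-pending` can be extracted. A minimal sketch, assuming the `Found N <type> tasks` line shown above is always printed; the function name `count_pending_tasks` is hypothetical. The `-T` flag of `docker-compose exec` disables pseudo-TTY allocation so that the output can be piped cleanly.

```shell
# Hypothetical helper: print the number of pending tasks of a given type.
# Assumes the CLI prints a "Found N <type> tasks" summary line, as above.
count_pending_tasks() {
    # -T: no pseudo-TTY, so the output can be piped cleanly.
    docker-compose exec -T swh-scheduler-api \
        swh scheduler task list-pending "$1" |
        sed -n 's/^Found \([0-9][0-9]*\) .*/\1/p'
}
```

With the listing above, `count_pending_tasks list-gitlab-full` would print `1`. If the summary line is missing, it prints nothing.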
To list all existing task types:

```
~/swh-environment/swh-docker-dev$ docker-compose exec swh-scheduler-api \
    swh scheduler task-type list
Known task types:
load-svn-from-archive:
  Loading svn repositories from svn dump
load-svn:
  Create dump of a remote svn repository, mount it and load it
load-deposit:
  Loading deposit archive into swh through swh-loader-tar
check-deposit:
  Pre-checking deposit step before loading into swh archive
cook-vault-bundle:
  Cook a Vault bundle
load-hg:
  Loading mercurial repository swh-loader-mercurial
load-hg-from-archive:
  Loading archive mercurial repository swh-loader-mercurial
load-git:
  Update an origin of type git
list-github-incremental:
  Incrementally list GitHub
list-github-full:
  Full update of GitHub repos list
list-debian-distribution:
  List a Debian distribution
list-gitlab-incremental:
  Incrementally list a Gitlab instance
list-gitlab-full:
  Full update of a Gitlab instance's repos list
list-pypi:
  Full pypi lister
load-pypi:
  Load Pypi origin
index-mimetype:
  Mimetype indexer task
index-mimetype-for-range:
  Mimetype Range indexer task
index-fossology-license:
  Fossology license indexer task
index-fossology-license-for-range:
  Fossology license range indexer task
index-origin-head:
  Origin Head indexer task
index-revision-metadata:
  Revision Metadata indexer task
index-origin-metadata:
  Origin Metadata indexer task
```
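When scripting against the scheduler, it can be handy to get just the task type names from this listing. A minimal sketch, assuming each name is printed alone on a line ending with `:` (as above) and that descriptions never end with a colon. The function name `task_type_names` is hypothetical.

```shell
# Hypothetical helper: print the known task type names, one per line.
# Assumes each name stands alone on a line ending with ":"; the
# "Known task types:" header does not match (capitals and spaces).
task_type_names() {
    docker-compose exec -T swh-scheduler-api \
        swh scheduler task-type list |
        awk '/^[ \t]*[a-z0-9_-]+:$/ { sub(/^[ \t]+/, ""); sub(/:$/, ""); print }'
}
```

For instance, `task_type_names | grep '^list-'` would keep only the lister task types.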
### Monitoring activity

You can monitor the workers' activity by connecting to the RabbitMQ console on
[...]

```
(swh) ~/swh-environment$ celery control -d loader@61704103668c pool_grow 3
```
And we can use the `swh scheduler` command all the same:

```
(swh) ~/swh-environment$ swh scheduler task-type list
Known task types:
index-fossology-license:
  Fossology license indexer task
index-mimetype:
  Mimetype indexer task
[...]
```
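Since `swh scheduler` runs directly in the virtualenv, it can be scripted like any other command. As a sketch, the following function schedules a `load-hg` task (one of the task types listed earlier) for each Mercurial repository URL read from standard input. The name `add_hg_origins` is hypothetical, and the code assumes the task type takes an `origin_url` keyword argument.

```shell
# Hypothetical helper: schedule a load-hg task for every Mercurial origin
# URL read from standard input (one per line; blank lines are skipped).
add_hg_origins() {
    # The "|| [ -n ... ]" test also handles a last line without a newline.
    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue
        # </dev/null keeps swh from consuming the remaining URLs on stdin.
        swh scheduler task add load-hg "origin_url=$url" </dev/null
    done
}
```

Usage: `printf '%s\n' https://hg.logilab.org/master/cubicweb | add_hg_origins`.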
### Make your life a bit easier

When you use virtualenvwrapper, you can add postactivation commands:
[...]
```
(swh) ~/swh-environment$ swh scheduler task-type list
[...]
```
* Get more info on a task type:
```
(swh) ~/swh-environment$ swh scheduler task-type list -v -t load-hg
Known task types:
load-hg: swh.loader.mercurial.tasks.LoadMercurial
  Loading mercurial repository swh-loader-mercurial
  interval: 1 day, 0:00:00 [1 day, 0:00:00, 1 day, 0:00:00]
  backoff_factor: 1.0
  max_queue_length: 1000
  num_retries: None
  retry_delay: None
```
* Add a new task:
```
(swh) ~/swh-environment$ swh scheduler task add load-hg \
    origin_url=https://hg.logilab.org/master/cubicweb
Created 1 tasks
Task 1
  Next run: just now (2019-02-06 12:36:58+00:00)
  Interval: 1 day, 0:00:00
  Type: load-hg
  Policy: recurring
  Args:
  Keyword args:
    origin_url: https://hg.logilab.org/master/cubicweb
```
* Respawn a task:
[...]