diff --git a/README.md b/README.md
index f70d1ff..79e96e1 100644
--- a/README.md
+++ b/README.md
@@ -1,253 +1,100 @@
swh-lister
==========
This component from the Software Heritage stack aims to produce listings
of software origins and their urls hosted on various public developer platforms
or package managers. As these operations are quite similar, it provides a set of
Python modules abstracting common software origins listing behaviors.
It also provides several lister implementations, contained in the
following Python modules:
- `swh.lister.bitbucket`
- `swh.lister.cgit`
- `swh.lister.cran`
- `swh.lister.debian`
- `swh.lister.gitea`
- `swh.lister.github`
- `swh.lister.gitlab`
- `swh.lister.gnu`
- `swh.lister.launchpad`
- `swh.lister.npm`
- `swh.lister.packagist`
- `swh.lister.phabricator`
- `swh.lister.pypi`
Dependencies
------------
All required dependencies can be found in the `requirements*.txt` files located
at the root of the repository.
Local deployment
----------------
## lister configuration
-Each lister implemented so far by Software Heritage (`github`, `gitlab`, `debian`, `pypi`, `npm`)
+Each lister implemented so far by Software Heritage (`bitbucket`, `cgit`, `cran`, `debian`,
+`gitea`, `github`, `gitlab`, `gnu`, `launchpad`, `npm`, `packagist`, `phabricator`, `pypi`)
must be configured by following the instructions below (please note that you have to replace
`<lister_name>` with one of the lister names introduced above).
### Preparation steps
-1. `mkdir ~/.config/swh/ ~/.cache/swh/lister/<lister_name>/`
-2. create configuration file `~/.config/swh/lister_<lister_name>.yml`
-3. Bootstrap the db instance schema
-
-```lang=bash
-$ createdb lister-<lister_name>
-$ python3 -m swh.lister.cli --db-url postgres:///lister-<lister_name> <lister_name>
-```
-
-Note: This bootstraps a minimum data set needed for the lister to run.
+1. `mkdir ~/.config/swh/`
+2. create configuration file `~/.config/swh/listers.yml`
### Configuration file sample
-Minimalistic configuration shared by all listers to add in file `~/.config/swh/lister_<lister_name>.yml`:
+Minimalistic configuration shared by all listers to add in file `~/.config/swh/listers.yml`:
```lang=yml
-storage:
- cls: 'remote'
- args:
- url: 'http://localhost:5002/'
-
scheduler:
cls: 'remote'
args:
url: 'http://localhost:5008/'
-lister:
- cls: 'local'
- args:
- # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
- db: 'postgresql:///lister-<lister_name>'
-
-credentials: []
-cache_responses: True
-cache_dir: /home/user/.cache/swh/lister/<lister_name>/
-```
-
-Note: This expects storage (5002) and scheduler (5008) services to run locally
-
-## lister-github
-
-Once configured, you can execute a GitHub lister using the following instructions in a `python3` script:
-
-```lang=python
-import logging
-from swh.lister.github.tasks import range_github_lister
-
-logging.basicConfig(level=logging.DEBUG)
-range_github_lister(364, 365)
-...
-```
-
-## lister-gitlab
-
-Once configured, you can execute a GitLab lister using the instructions detailed in the `python3` scripts below:
-
-```lang=python
-import logging
-from swh.lister.gitlab.tasks import range_gitlab_lister
-
-logging.basicConfig(level=logging.DEBUG)
-range_gitlab_lister(1, 2, {
- 'instance': 'debian',
- 'api_baseurl': 'https://salsa.debian.org/api/v4',
- 'sort': 'asc',
- 'per_page': 20
-})
-```
-
-```lang=python
-import logging
-from swh.lister.gitlab.tasks import full_gitlab_relister
-
-logging.basicConfig(level=logging.DEBUG)
-full_gitlab_relister({
- 'instance': '0xacab',
- 'api_baseurl': 'https://0xacab.org/api/v4',
- 'sort': 'asc',
- 'per_page': 20
-})
-```
-
-```lang=python
-import logging
-from swh.lister.gitlab.tasks import incremental_gitlab_lister
-
-logging.basicConfig(level=logging.DEBUG)
-incremental_gitlab_lister({
- 'instance': 'freedesktop.org',
- 'api_baseurl': 'https://gitlab.freedesktop.org/api/v4',
- 'sort': 'asc',
- 'per_page': 20
-})
-```
-
-## lister-debian
-
-Once configured, you can execute a Debian lister using the following instructions in a `python3` script:
-
-```lang=python
-import logging
-from swh.lister.debian.tasks import debian_lister
-
-logging.basicConfig(level=logging.DEBUG)
-debian_lister('Debian')
-```
-
-## lister-pypi
-
-Once configured, you can execute a PyPI lister using the following instructions in a `python3` script:
-
-```lang=python
-import logging
-from swh.lister.pypi.tasks import pypi_lister
-
-logging.basicConfig(level=logging.DEBUG)
-pypi_lister()
-```
-
-## lister-npm
-
-Once configured, you can execute a npm lister using the following instructions in a `python3` REPL:
-
-```lang=python
-import logging
-from swh.lister.npm.tasks import npm_lister
-
-logging.basicConfig(level=logging.DEBUG)
-npm_lister()
+credentials: {}
```
-## lister-phabricator
+Note: This expects scheduler (5008) service to run locally
-Once configured, you can execute a Phabricator lister using the following instructions in a `python3` script:
+## Executing a lister
-```lang=python
-import logging
-from swh.lister.phabricator.tasks import incremental_phabricator_lister
+Once configured, a lister can be executed by using the `swh` CLI tool with the
+following options and commands:
-logging.basicConfig(level=logging.DEBUG)
-incremental_phabricator_lister(forge_url='https://forge.softwareheritage.org', api_token='XXXX')
```
-
-## lister-gnu
-
-Once configured, you can execute a PyPI lister using the following instructions in a `python3` script:
-
-```lang=python
-import logging
-from swh.lister.gnu.tasks import gnu_lister
-
-logging.basicConfig(level=logging.DEBUG)
-gnu_lister()
+$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister <lister_name> [lister_parameters]
```
-## lister-cran
+Examples:
-Once configured, you can execute a CRAN lister using the following instructions in a `python3` script:
-
-```lang=python
-import logging
-from swh.lister.cran.tasks import cran_lister
-
-logging.basicConfig(level=logging.DEBUG)
-cran_lister()
```
+$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister bitbucket
-## lister-cgit
-
-Once configured, you can execute a cgit lister using the following instructions
-in a `python3` script:
-
-```lang=python
-import logging
-from swh.lister.cgit.tasks import cgit_lister
-
-logging.basicConfig(level=logging.DEBUG)
-# simple cgit instance
-cgit_lister(url='https://git.kernel.org/')
-# cgit instance whose listed repositories differ from the base url
-cgit_lister(url='https://cgit.kde.org/',
- url_prefix='https://anongit.kde.org/')
-```
+$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister cran
-## lister-packagist
+$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitea url=https://codeberg.org/api/v1/
-Once configured, you can execute a Packagist lister using the following instructions
-in a `python3` script:
+$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitlab url=https://salsa.debian.org/api/v4/
-```lang=python
-import logging
-from swh.lister.packagist.tasks import packagist_lister
+$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister npm
-logging.basicConfig(level=logging.DEBUG)
-packagist_lister()
+$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister pypi
```
Licensing
---------
This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.
See top-level LICENSE file for the full text of the GNU General Public License
along with this program.
diff --git a/docs/run_a_new_lister.rst b/docs/run_a_new_lister.rst
index 66ad176..4054810 100644
--- a/docs/run_a_new_lister.rst
+++ b/docs/run_a_new_lister.rst
@@ -1,87 +1,87 @@
.. _run-lister-tutorial:
Tutorial: run a lister within docker-dev in just a few steps
=====================================================================
It is a good practice to run your new lister in docker-dev. This provides an almost
production-like environment. Testing the lister in docker dev prior to deployment
reduces the chances of encountering errors when deploying it to production.
Here are the steps you need to follow to run a lister within your local environment.
1. You must edit the docker-compose override file (`docker-compose.override.yml`),
following the sample provided ::
version: '2'
services:
swh-lister:
volumes:
- "$SWH_ENVIRONMENT_HOME/swh-lister:/src/swh-lister"
The file named `docker-compose.override.yml` will automatically be loaded by
``docker-compose``. Having an override makes it possible to run a docker container
with some swh packages installed from sources instead of using the latest
published packages from PyPI. For more details, you may refer to the README.md
present in ``swh-docker-dev``.
2. Follow the instructions under the headings **Preparation steps** and
**Configuration file sample** in the README.md of swh-lister.
3. Add in the lister configuration the new ``task_modules`` and ``task_queues``
entries for your new lister. You need to amend the conf/lister.yml file to
add the entries. Here is an example for the GNU lister::
celery:
task_broker: amqp://guest:guest@amqp//
task_modules:
...
- swh.lister.gnu.tasks
task_queues:
...
- swh.lister.gnu.tasks.GNUListerTask
4. Make sure to run ``storage (5002)`` and ``scheduler (5008)`` services locally.
You may use the following command to run docker::
~/swh-environment/swh-docker-dev$ docker-compose up -d
5. Add the lister task-type in the scheduler. For example, if you want to
add the GNU lister task-type ::
~/swh-environment$ swh scheduler task-type add list-gnu-full \
"swh.lister.gnu.tasks.GNUListerTask" "Full GNU lister" \
--default-interval '1 day' --backoff-factor 1
You can check all the task types with::
~/swh-environment$ swh scheduler task-type list
Known task types:
list-bitbucket-incremental:
Incrementally list BitBucket
list-cran:
Full CRAN Lister
list-debian-distribution:
List a Debian distribution
list-github-full:
Full update of GitHub repos list
list-github-incremental:
...
If your lister creates loading tasks of a type that is not yet registered, you need
to register that task type as well.
6. Run your lister with the help of the scheduler CLI. You need to add the task in
the scheduler using its CLI. For example, you need to execute this command
to run the GNU lister ::
~/swh-environment$ swh scheduler --url http://localhost:5008/ task add \
list-gnu-full --policy oneshot
After the lister has finished executing, you can see the loading tasks created::
~/swh-environment/swh-lister$ swh scheduler task list
-You can also check the repositories listed by the lister from the database in
-which the lister output is stored. To connect to the database::
+You can also check the repositories listed by the lister from the scheduler database
+in which the lister output is stored. To connect to the database::
- ~/swh-environment/swh-docker-dev$ docker-compose exec swh-lister bash -c \
- 'psql swh-listers'
+ ~/swh-environment/docker$ docker-compose exec swh-scheduler bash -c \
+ 'psql swh-scheduler -c "select url from listed_origins"'
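
The `swh lister run` invocations added to the README in this diff all follow one pattern: global log level, the `-C` configuration path, `run --lister <lister_name>`, then optional `key=value` lister parameters. As a minimal sketch of that pattern, the following Python helper assembles the argument list. The function name `build_lister_command` and its keyword arguments are hypothetical and not part of swh-lister; only the flags and the `url=...` parameter form shown in the README examples are assumed.

```python
import os


def build_lister_command(lister_name, config_path="~/.config/swh/listers.yml",
                         log_level="DEBUG", **lister_parameters):
    """Build an argv list mirroring the README's `swh lister run` examples.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    argv = [
        "swh", "--log-level", log_level,
        # Expand "~" ourselves, since no shell will do it for an argv list.
        "lister", "-C", os.path.expanduser(config_path),
        "run", "--lister", lister_name,
    ]
    # Lister parameters follow the lister name as key=value tokens,
    # as in the gitea and gitlab examples of the README.
    argv += [f"{key}={value}" for key, value in lister_parameters.items()]
    return argv


if __name__ == "__main__":
    # Print the gitea example from the README as an argument list.
    print(build_lister_command("gitea", url="https://codeberg.org/api/v1/"))
```

The returned list could be passed to `subprocess.run` on a machine where the `swh` CLI is installed and the scheduler service is reachable; that step is left out here because it depends on the local setup.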
