diff --git a/README.md b/README.md
index f70d1ff..79e96e1 100644
--- a/README.md
+++ b/README.md
@@ -1,253 +1,100 @@
 swh-lister
 ==========
 
 This component from the Software Heritage stack aims to produce listings
 of software origins and their urls hosted on various public developer platforms
 or package managers. As these operations are quite similar, it provides a set of
 Python modules abstracting common software origins listing behaviors.
 
 It also provides several lister implementations, contained in the
 following Python modules:
 
 - `swh.lister.bitbucket`
 - `swh.lister.cgit`
 - `swh.lister.cran`
 - `swh.lister.debian`
 - `swh.lister.gitea`
 - `swh.lister.github`
 - `swh.lister.gitlab`
 - `swh.lister.gnu`
 - `swh.lister.launchpad`
 - `swh.lister.npm`
 - `swh.lister.packagist`
 - `swh.lister.phabricator`
 - `swh.lister.pypi`
 
 Dependencies
 ------------
 
 All required dependencies can be found in the `requirements*.txt` files located
 at the root of the repository.
 
 Local deployment
 ----------------
 
 ## lister configuration
 
-Each lister implemented so far by Software Heritage (`github`, `gitlab`, `debian`, `pypi`, `npm`)
+Each lister implemented so far by Software Heritage (`bitbucket`, `cgit`, `cran`, `debian`,
+`gitea`, `github`, `gitlab`, `gnu`, `launchpad`, `npm`, `packagist`, `phabricator`, `pypi`)
 must be configured by following the instructions below (please note that you have to replace
 `` by one of the lister name introduced above).
 
 ### Preparation steps
 
-1. `mkdir ~/.config/swh/ ~/.cache/swh/lister//`
-2. create configuration file `~/.config/swh/lister_.yml`
-3. Bootstrap the db instance schema
-
-```lang=bash
-$ createdb lister-
-$ python3 -m swh.lister.cli --db-url postgres:///lister-
-```
-
-Note: This bootstraps a minimum data set needed for the lister to run.
+1. `mkdir ~/.config/swh/`
+2. create configuration file `~/.config/swh/listers.yml`
 
 ### Configuration file sample
 
-Minimalistic configuration shared by all listers to add in file `~/.config/swh/lister_.yml`:
+Minimalistic configuration shared by all listers to add in file `~/.config/swh/listers.yml`:
 
 ```lang=yml
-storage:
-  cls: 'remote'
-  args:
-    url: 'http://localhost:5002/'
-
 scheduler:
   cls: 'remote'
   args:
     url: 'http://localhost:5008/'
 
-lister:
-  cls: 'local'
-  args:
-    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
-    db: 'postgresql:///lister-'
-
-credentials: []
-cache_responses: True
-cache_dir: /home/user/.cache/swh/lister//
-```
-
-Note: This expects storage (5002) and scheduler (5008) services to run locally
-
-## lister-github
-
-Once configured, you can execute a GitHub lister using the following instructions in a `python3` script:
-
-```lang=python
-import logging
-from swh.lister.github.tasks import range_github_lister
-
-logging.basicConfig(level=logging.DEBUG)
-range_github_lister(364, 365)
-...
-```
-
-## lister-gitlab
-
-Once configured, you can execute a GitLab lister using the instructions detailed in the `python3` scripts below:
-
-```lang=python
-import logging
-from swh.lister.gitlab.tasks import range_gitlab_lister
-
-logging.basicConfig(level=logging.DEBUG)
-range_gitlab_lister(1, 2, {
-    'instance': 'debian',
-    'api_baseurl': 'https://salsa.debian.org/api/v4',
-    'sort': 'asc',
-    'per_page': 20
-})
-```
-
-```lang=python
-import logging
-from swh.lister.gitlab.tasks import full_gitlab_relister
-
-logging.basicConfig(level=logging.DEBUG)
-full_gitlab_relister({
-    'instance': '0xacab',
-    'api_baseurl': 'https://0xacab.org/api/v4',
-    'sort': 'asc',
-    'per_page': 20
-})
-```
-
-```lang=python
-import logging
-from swh.lister.gitlab.tasks import incremental_gitlab_lister
-
-logging.basicConfig(level=logging.DEBUG)
-incremental_gitlab_lister({
-    'instance': 'freedesktop.org',
-    'api_baseurl': 'https://gitlab.freedesktop.org/api/v4',
-    'sort': 'asc',
-    'per_page': 20
-})
-```
-
-## lister-debian
-
-Once configured, you can execute a Debian lister using the following instructions in a `python3` script:
-
-```lang=python
-import logging
-from swh.lister.debian.tasks import debian_lister
-
-logging.basicConfig(level=logging.DEBUG)
-debian_lister('Debian')
-```
-
-## lister-pypi
-
-Once configured, you can execute a PyPI lister using the following instructions in a `python3` script:
-
-```lang=python
-import logging
-from swh.lister.pypi.tasks import pypi_lister
-
-logging.basicConfig(level=logging.DEBUG)
-pypi_lister()
-```
-
-## lister-npm
-
-Once configured, you can execute a npm lister using the following instructions in a `python3` REPL:
-
-```lang=python
-import logging
-from swh.lister.npm.tasks import npm_lister
-
-logging.basicConfig(level=logging.DEBUG)
-npm_lister()
+credentials: {}
 ```
 
-## lister-phabricator
+Note: This expects scheduler (5008) service to run locally
 
-Once configured, you can execute a Phabricator lister using the following instructions in a `python3` script:
+## Executing a lister
 
-```lang=python
-import logging
-from swh.lister.phabricator.tasks import incremental_phabricator_lister
+Once configured, a lister can be executed by using the `swh` CLI tool with the
+following options and commands:
 
-logging.basicConfig(level=logging.DEBUG)
-incremental_phabricator_lister(forge_url='https://forge.softwareheritage.org', api_token='XXXX')
 ```
-
-## lister-gnu
-
-Once configured, you can execute a PyPI lister using the following instructions in a `python3` script:
-
-```lang=python
-import logging
-from swh.lister.gnu.tasks import gnu_lister
-
-logging.basicConfig(level=logging.DEBUG)
-gnu_lister()
+$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister [lister_parameters]
 ```
 
-## lister-cran
+Examples:
 
-Once configured, you can execute a CRAN lister using the following instructions in a `python3` script:
-
-```lang=python
-import logging
-from swh.lister.cran.tasks import cran_lister
-
-logging.basicConfig(level=logging.DEBUG)
-cran_lister()
 ```
+$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister bitbucket
 
-## lister-cgit
-
-Once configured, you can execute a cgit lister using the following instructions
-in a `python3` script:
-
-```lang=python
-import logging
-from swh.lister.cgit.tasks import cgit_lister
-
-logging.basicConfig(level=logging.DEBUG)
-# simple cgit instance
-cgit_lister(url='https://git.kernel.org/')
-# cgit instance whose listed repositories differ from the base url
-cgit_lister(url='https://cgit.kde.org/',
-            url_prefix='https://anongit.kde.org/')
-```
+$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister cran
 
-## lister-packagist
+$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitea url=https://codeberg.org/api/v1/
 
-Once configured, you can execute a Packagist lister using the following instructions
-in a `python3` script:
+$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitlab url=https://salsa.debian.org/api/v4/
 
-```lang=python
-import logging
-from swh.lister.packagist.tasks import packagist_lister
+$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister npm
 
-logging.basicConfig(level=logging.DEBUG)
-packagist_lister()
+$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister pypi
 ```
 
 Licensing
 ---------
 
 This program is free software: you can redistribute it and/or modify it under
 the terms of the GNU General Public License as published by the Free Software
 Foundation, either version 3 of the License, or (at your option) any later
 version.
 
 This program is distributed in the hope that it will be useful, but WITHOUT ANY
 WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
 PARTICULAR PURPOSE. See the GNU General Public License for more details.
 
 See top-level LICENSE file for the full text of the GNU General Public License
 along with this program.
diff --git a/docs/run_a_new_lister.rst b/docs/run_a_new_lister.rst
index 66ad176..4054810 100644
--- a/docs/run_a_new_lister.rst
+++ b/docs/run_a_new_lister.rst
@@ -1,87 +1,87 @@
 .. _run-lister-tutorial:
 
 Tutorial: run a lister within docker-dev in just a few steps
 =====================================================================
 
 It is a good practice to run your new lister in docker-dev. This provides an almost
 production-like environment. Testing the lister in docker dev prior to deployment
 reduces the chances of encountering errors when turning it for production.
 Here are the steps you need to follow to run a lister within your local environment.
 
 1. You must edit the docker-compose override file (`docker-compose.override.yml`).
    following the sample provided ::
 
      version: '2'
 
      services:
        swh-lister:
          volumes:
            - "$SWH_ENVIRONMENT_HOME/swh-lister:/src/swh-lister"
 
    The file named `docker-compose.override.yml` will automatically be loaded by
    ``docker-compose``.Having an override makes it possible to run a docker container
    with some swh packages installed from sources instead of using the latest published
    packages from pypi. For more details, you may refer to README.md present in ``swh-docker-dev``.
 
 2. Follow the instruction mentioned under heading **Preparation steps** and
    **Configuration file sample** in README.md of swh-lister.
 
 3. Add in the lister configuration the new ``task_modules`` and ``task_queues``
    entry for the your new lister. You need to amend the conf/lister.yml file to add
    the entries. Here is an example for GNU lister::
 
      celery:
        task_broker: amqp://guest:guest@amqp//
        task_modules:
          ...
          - swh.lister.gnu.tasks
        task_queues:
          ...
          - swh.lister.gnu.tasks.GNUListerTask
 
 4. Make sure to run ``storage (5002)`` and ``scheduler (5008)`` services locally.
    You may use the following command to run docker::
 
      ~/swh-environment/swh-docker-dev$ docker-compose up -d
 
 5. Add the lister task-type in the scheduler. For example, if you want to
    add pypi lister task-type ::
 
      ~/swh-environment$ swh scheduler task-type add list-gnu-full \
        "swh.lister.gnu.tasks.GNUListerTask" "Full GNU lister" \
        --default-interval '1 day' --backoff-factor 1
 
    You can check all the task-type by::
 
      ~/swh-environment$swh scheduler task-type list
      Known task types:
      list-bitbucket-incremental:
        Incrementally list BitBucket
      list-cran:
        Full CRAN Lister
      list-debian-distribution:
        List a Debian distribution
      list-github-full:
        Full update of GitHub repos list
      list-github-incremental:
      ...
 
    If your lister is creating new loading task not yet registered, you need
    to register that task type as well.
 
 6. Run your lister with the help of scheduler cli. You need to add the task in
    the scheduler using its cli. For example, you need to execute this command
    to run gnu lister ::
 
      ~/swh-environment$ swh scheduler --url http://localhost:5008/ task add \
        list-gnu-full --policy oneshot
 
 After the execution of lister is complete, you can see the loading task created::
 
     ~/swh-environment/swh-lister$ swh scheduler task list
 
-You can also check the repositories listed by the lister from the database in
-which the lister output is stored. To connect to the database::
+You can also check the repositories listed by the lister from the scheduler database
+in which the lister output is stored. To connect to the database::
 
-    ~/swh-environment/swh-docker-dev$ docker-compose exec swh-lister bash -c \
-      'psql swh-listers'
+    ~/swh-environment/docker$ docker-compose exec swh-scheduler bash -c \
+      'psql swh-scheduler -c "select url from listed_origins"'
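
The `swh ... lister ... run --lister` invocations added to the README all share one shape: global log level, configuration file, lister name, then optional `key=value` parameters. As a rough sketch only (not part of this patch), a small Python helper could assemble that command line for scripting. The function name `build_lister_command` and its keyword arguments are hypothetical; it uses only the options that appear in the README examples and does not itself invoke `swh`.

```python
import os
import shlex


def build_lister_command(lister_name, config_path="~/.config/swh/listers.yml",
                         log_level="DEBUG", **lister_parameters):
    """Return the argv list for one `swh lister run` invocation.

    Hypothetical helper: it mirrors the command shape shown in the README
    examples and makes no claim about other options the CLI may accept.
    """
    argv = [
        "swh", "--log-level", log_level,
        "lister", "-C", os.path.expanduser(config_path),  # the shell would expand "~"; do it here
        "run", "--lister", lister_name,
    ]
    # Lister parameters are passed as trailing key=value words, as in
    # `--lister gitlab url=https://salsa.debian.org/api/v4/`.
    argv += [f"{key}={value}" for key, value in lister_parameters.items()]
    return argv


# Print a shell-quoted command line; pass the list to subprocess.run() to execute it.
print(shlex.join(build_lister_command("gitlab", url="https://salsa.debian.org/api/v4/")))
```

Returning a list rather than a single string keeps the arguments safe to hand to `subprocess.run` without going through a shell; `shlex.join` is used only for display.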