diff --git a/docs/run_a_new_lister.rst b/docs/run_a_new_lister.rst
new file mode 100644
--- /dev/null
+++ b/docs/run_a_new_lister.rst
@@ -0,0 +1,90 @@
+
+:orphan:
+
+.. _run-lister-tutorial:
+
+Tutorial: run a lister within docker-dev in just a few steps
+=====================================================================
+
+It is good practice to run your new lister in docker-dev, which provides an
+almost production-like environment. Testing the lister in docker-dev prior to
+deployment reduces the chances of encountering errors when putting it into
+production. Here are the steps to follow to run a lister in your local environment.
+
+
+1. Edit the docker-compose override file (``docker-compose.override.yml``),
+   following the sample provided::
+
+      version: '2'
+
+      services:
+        swh-lister:
+          volumes:
+            - "$SWH_ENVIRONMENT_HOME/swh-lister:/src/swh-lister"
+
+   The file named ``docker-compose.override.yml`` is loaded automatically by
+   ``docker-compose``. Having an override makes it possible to run a docker
+   container with some swh packages installed from sources instead of using the
+   latest published packages from PyPI. For more details, refer to the
+   README.md in ``swh-docker-dev``.
+2. Follow the instructions under the **Preparation steps** and
+   **Configuration file sample** headings in the README.md of swh-lister.
+3. Add the new ``task_modules`` and ``task_queues`` entries for your new lister
+   to the lister configuration by amending the ``conf/lister.yml`` file. Here
+   is an example for the GNU lister::
+
+       celery:
+         task_broker: amqp://guest:guest@amqp//
+         task_modules:
+           ...
+           - swh.lister.gnu.tasks
+         task_queues:
+           ...
+           - swh.lister.gnu.tasks.GNUListerTask
+
+4. Make sure the ``storage`` (port 5002) and ``scheduler`` (port 5008) services
+   are running locally. You may use the following command to start them::
+
+      ~/swh-environment/swh-docker-dev$ docker-compose up -d
+
+5. Add the lister task-type to the scheduler. For example, to add the GNU
+   lister task-type::
+
+      ~/swh-environment$ swh scheduler task-type add list-gnu-full \
+          "swh.lister.gnu.tasks.GNUListerTask" "Full GNU lister" \
+          --default-interval '1 day' --backoff-factor 1
+
+   You can list all the registered task types with::
+
+      ~/swh-environment$ swh scheduler task-type list
+      Known task types:
+      list-bitbucket-incremental:
+        Incrementally list BitBucket
+      list-cran:
+        Full CRAN Lister
+      list-debian-distribution:
+        List a Debian distribution
+      list-github-full:
+        Full update of GitHub repos list
+      list-github-incremental:
+      ...
+
+   If your lister creates loading tasks of a type that is not yet registered,
+   you need to register that task type as well.
+
+6. Run your lister through the scheduler by adding a task for it with the
+   scheduler CLI. For example, execute the following command to run the GNU
+   lister once::
+
+      ~/swh-environment$ swh scheduler --url http://localhost:5008/ task add \
+          list-gnu-full --policy oneshot
+
+After the lister has run, you can see the loading tasks it created::
+
+    ~/swh-environment/swh-lister$ swh scheduler task list
+
+You can also check the repositories listed by the lister in the database where
+the lister output is stored. To connect to the database::
+
+    ~/swh-environment/swh-docker-dev$ docker-compose exec swh-lister bash -c \
+    'psql swh-listers'
diff --git a/docs/tutorial.rst b/docs/tutorial.rst
--- a/docs/tutorial.rst
+++ b/docs/tutorial.rst
@@ -56,18 +56,18 @@
 3. Populate a work queue for fetching and ingesting source repositories.
 
 Steps 1 and 3 are generic problems, so they can get generic solutions hidden
-away in base code, most of which never needs to change. That leaves us to
-implement step 2, which can be trivially done now for services with clean web
+away in the base code, most of which never needs to change. That leaves us to
+implement step 2, which can be trivially done now for services that have clean web
 APIs.
 
-In the new code we've tried to hide away as much generic functionality as
+In the new code, we've tried to hide away as much generic functionality as
 possible, turning it into set-and-forget plumbing between a few simple
 customized elements. Different hosting services might use different network
 protocols, rate-limit messages, or pagination schemes, but, as long as there is
 some way to get a list of the hosted repositories, we think that the new base
 code will make getting those repositories much easier.
 
-First let me give you the 30,000 foot view…
+First, let me give you the 30,000-foot view…
 
 The old GitHub-specific lister code looked like this (265 lines of Python):
 
@@ -164,6 +164,13 @@
 looks much simpler when we look at the actual implementations of the two
 new-style indexing listers we currently have…
 
+When developing a new lister, it is important to test it. To do so, add tests
+(see ``swh/lister/*/tests/``) and register the celery tasks in the main
+``conftest.py`` (``swh/lister/core/tests/conftest.py``).
+
+Another important step is to actually run the lister within docker-dev
+(see :ref:`run-lister-tutorial`).
+
 This is the entire source code for the BitBucket repository lister::
 
     # Copyright (C) 2017 the Software Heritage developers
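Steps 5 and 6 of the docker-dev tutorial (registering the lister task-type, then adding a one-shot task) can also be scripted. The following is a minimal Python sketch, not part of swh-lister: it only rebuilds the exact ``swh scheduler`` invocations shown in the tutorial for the GNU lister. It assumes the ``swh`` CLI is on ``PATH`` and that the scheduler listens on ``http://localhost:5008/``. The helper names ``task_type_add_cmd``, ``oneshot_task_add_cmd`` and ``run`` are hypothetical.

```python
import shlex
import subprocess

# Assumption: scheduler endpoint taken from step 6 of the tutorial.
SCHEDULER_URL = "http://localhost:5008/"


def task_type_add_cmd(task_type, backend_name, description,
                      default_interval="1 day", backoff_factor=1):
    """Hypothetical helper: argv for the step 5 ``task-type add`` call."""
    return [
        "swh", "scheduler", "task-type", "add",
        task_type, backend_name, description,
        "--default-interval", default_interval,
        "--backoff-factor", str(backoff_factor),
    ]


def oneshot_task_add_cmd(task_type, url=SCHEDULER_URL):
    """Hypothetical helper: argv for the step 6 ``task add`` call."""
    return [
        "swh", "scheduler", "--url", url,
        "task", "add", task_type, "--policy", "oneshot",
    ]


def run(commands, dry_run=True):
    """Print each command as a shell line; execute only if dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run([
        task_type_add_cmd("list-gnu-full",
                          "swh.lister.gnu.tasks.GNUListerTask",
                          "Full GNU lister"),
        oneshot_task_add_cmd("list-gnu-full"),
    ])
```

By default the script only prints the commands, so you can compare them with the tutorial before passing ``dry_run=False``. Building argument lists instead of shell strings avoids quoting problems with values such as ``'1 day'``.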