diff --git a/docs/tutorial.rst b/docs/tutorial.rst
--- a/docs/tutorial.rst
+++ b/docs/tutorial.rst
@@ -14,7 +14,7 @@
archived by Software Heritage
`_
through a combination of automatic linkage between the listing and loading
-scheduler, new understanding of how to deal with extremely large repository
+scheduler, a new understanding of how to deal with extremely large repository
hosts like `GitHub `_, and activating a new set of
repositories that had previously been skipped over.
@@ -29,7 +29,7 @@
archive. As the old Italian proverb goes, "Il meglio è nemico del bene," or in
modern English parlance, "Perfect is the enemy of good," right? Right. So the
plan from the beginning was to implement a lister for GitHub, then maybe
-implement another one, and then take a few giant steps backward and squint our
+implement another one, and then take a few giant steps backwards and squint our
eyes.
Why? Because source code hosting services don't behave according to a unified
@@ -56,18 +56,18 @@
3. Populate a work queue for fetching and ingesting source repositories.
Steps 1 and 3 are generic problems, so they can get generic solutions hidden
-away in base code, most of which never needs to change. That leaves us to
-implement step 2, which can be trivially done now for services with clean web
+away in the base code, most of which never needs to change. That leaves us to
+implement step 2, which can be trivially done now for services with clean web
APIs.
-In the new code we've tried to hide away as much generic functionality as
+In the new code, we've tried to hide away as much generic functionality as
possible, turning it into set-and-forget plumbing between a few simple
customized elements. Different hosting services might use different network
protocols, rate-limit messages, or pagination schemes, but, as long as there is
some way to get a list of the hosted repositories, we think that the new base
code will make getting those repositories much easier.
-First let me give you the 30,000 foot view…
+First, let me give you the 30,000-foot view…
The old GitHub-specific lister code looked like this (265 lines of Python):
@@ -164,6 +164,84 @@
looks much simpler when we look at the actual implementations of the two
new-style indexing listers we currently have…
+An important part of writing a new lister is testing it. To register the
+celery tasks of your new lister, you need to add your lister to the main
+conftest.py (swh/lister/core/tests/conftest.py).
+
+After testing, it is recommended to run your new lister in docker, as this
+provides a good, almost production-like test. Here are the steps you need to
+follow to run a new lister in docker.
+
+1. You must write a docker-compose override file (`docker-compose.override.yml`).
+   An example is given in the `docker-compose.override.yml.example` file ::
+
+       version: '2'
+
+       services:
+         swh-objstorage:
+           volumes:
+             - "$SWH_ENVIRONMENT_HOME/swh-lister:/src/swh-lister"
+
+   The file named `docker-compose.override.yml` will automatically be loaded by
+   `docker-compose`. For more details, you may refer to the README.md in
+   swh-docker-dev.
+2. Follow the instructions under the headings "Preparation steps" and
+   "Configuration file sample" in the README.md of swh-lister.
+3. Make sure the storage (5002) and scheduler (5008) services are running
+   locally. You can start them with the following command::
+
+       ~/swh-environment/swh-docker-dev$ docker-compose up -d swh-scheduler-api \
+                                         swh-storage
+
+4. Add the lister task type to the scheduler. For example, to add the pypi
+   lister task type::
+
+       ~/swh-environment$ swh-scheduler task-type add list-pypi recurring \
+                          "Full pypi lister"
+
+   You can list all the task types with::
+
+       ~/swh-environment$ swh scheduler task-type list
+       Known task types:
+       load-svn-from-archive:
+         Loading svn repositories from svn dump
+       load-svn:
+         Create dump of a remote svn repository, mount it and load it
+       load-deposit:
+         Loading deposit archive into swh through swh-loader-tar
+       check-deposit:
+         Pre-checking deposit step before loading into swh archive
+       cook-vault-bundle:
+         Cook a Vault bundle
+       load-hg:
+         Loading mercurial repository swh-loader-mercurial
+       load-hg-from-archive:
+         Loading archive mercurial repository swh-loader-mercurial
+       load-git:
+         Update an origin of type git
+       list-github-incremental:
+         Incrementally list GitHub
+       list-github-full:
+         Full update of GitHub repos list
+       ...
+
+   If your lister creates a loading task whose type is not in the task type
+   list, you need to add that type too. For example, for the GNU lister::
+
+       ~/swh-environment$ swh scheduler task-type add load-gnu recurring \
+                          "GNU Loader"
+
+5. Run your lister by importing the lister task and executing it. For example,
+   to run the pypi lister, execute these lines::
+
+       import logging
+       from swh.lister.pypi.tasks import pypi_lister
+
+       # Show debug output while the lister runs
+       logging.basicConfig(level=logging.DEBUG)
+       # Call the lister task directly
+       pypi_lister()
+
+After the lister has finished executing, you can see the loading tasks it
+created::
+
+    ~/swh-environment/swh-lister$ swh scheduler task list
+
This is the entire source code for the BitBucket repository lister::
# Copyright (C) 2017 the Software Heritage developers