diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -1,192 +1,180 @@
-SWH-lister
-============
+swh-lister
+==========
 
-The Software Heritage Lister is both a library module to permit to
-centralize lister behaviors, and to provide lister implementations.
+This component from the Software Heritage stack aims to produce listings
+of software origins and their urls hosted on various public developer platforms
+or package managers. As these operations are quite similar, it provides a set of
+Python modules abstracting common software origins listing behaviors.
 
-Actual lister implementations are:
-
-- swh-lister-bitbucket
-- swh-lister-debian
-- swh-lister-github
-- swh-lister-gitlab
-- swh-lister-pypi
-
-Licensing
-----------
-
-This program is free software: you can redistribute it and/or modify it under
-the terms of the GNU General Public License as published by the Free Software
-Foundation, either version 3 of the License, or (at your option) any later
-version.
-
-This program is distributed in the hope that it will be useful, but WITHOUT ANY
-WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
-PARTICULAR PURPOSE. See the GNU General Public License for more details.
-
-See top-level LICENSE file for the full text of the GNU General Public License
-along with this program.
+It also provides several lister implementations, contained in the
+following Python modules:
+- `swh.lister.bitbucket`
+- `swh.lister.debian`
+- `swh.lister.github`
+- `swh.lister.gitlab`
+- `swh.lister.pypi`
+- `swh.lister.npm`
 
 Dependencies
 ------------
 
-- python3
-- python3-requests
-- python3-sqlalchemy
-
-More details in requirements*.txt
-
+All required dependencies can be found in the `requirements*.txt` files located
+at the root of the repository.
 
 Local deployment
------------
-
-## lister-github
-
-### Preparation steps
-
-1. git clone under $SWH_ENVIRONMENT_HOME/swh-lister (of your choosing)
-2. mkdir ~/.config/swh/ ~/.cache/swh/lister/github.com/
-3. create configuration file ~/.config/swh/lister-github.com.yml
-4. Bootstrap the db instance schema
+----------------
 
-    $ createdb lister-github
-    $ python3 -m swh.lister.cli --db-url postgres:///lister-github github
+## lister configuration
 
-### Configuration file sample
-
-Minimalistic configuration:
-
-    $ cat ~/.config/swh/lister-github.com.yml
-    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
-    lister_db_url: postgres:///lister-github
-    credentials: []
-    cache_responses: True
-    cache_dir: /home/user/.cache/swh/lister/github.com
-
-Note: This expects storage (5002) and scheduler (5008) services to run locally
+Each lister implemented so far by Software Heritage (`github`, `gitlab`, `debian`, `pypi`, `npm`)
+must be configured by following the instructions below (please note that you have to replace
+`` by one of the lister name introduced above).
 
-### Run
+### Preparation steps
 
-    $ python3
-    >>> import logging
-    >>> logging.basicConfig(level=logging.DEBUG)
-    >>> from swh.lister.github.tasks import range_github_lister; range_github_lister(364, 365)
-    INFO:root:listing repos starting at 364
-    DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.github.com
-    DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repositories?since=364 HTTP/1.1" 200 None
-    DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
-    DEBUG:urllib3.connectionpool:http://localhost:5002 "POST /origin/add HTTP/1.1" 200 1
+1. `mkdir ~/.config/swh/ ~/.cache/swh/lister//`
+2. create configuration file `~/.config/swh/lister_.yml`
+3. Bootstrap the db instance schema
+```lang=bash
+$ createdb lister-
+$ python3 -m swh.lister.cli --db-url postgres:///lister-
+```
 
-## lister-gitlab
+Note: This bootstraps a minimum data set needed for the lister to run.
 
-### preparation steps
+### Configuration file sample
 
-1. git clone under $SWH_ENVIRONMENT_HOME/swh-lister (of your choosing)
-2. mkdir ~/.config/swh/ ~/.cache/swh/lister/gitlab/
-3. create configuration file ~/.config/swh/lister-gitlab.yml
-4. Bootstrap the db instance schema
+Minimalistic configuration shared by all listers to add in file `~/.config/swh/lister_.yml`:
 
-    $ createdb lister-gitlab
-    $ python3 -m swh.lister.cli --db-url postgres:///lister-gitlab gitlab
+```lang=yml
+storage:
+  cls: 'remote'
+  args:
+    url: 'http://localhost:5002/'
 
-### Configuration file sample
+scheduler:
+  cls: 'remote'
+  args:
+    url: 'http://localhost:5008/'
 
-    $ cat ~/.config/swh/lister-gitlab.yml
+lister:
+  cls: 'local'
+  args:
     # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
-    lister_db_url: postgres:///lister-gitlab
-    credentials: []
-    cache_responses: True
-    cache_dir: /home/user/.cache/swh/lister/gitlab
+    db: 'postgresql:///lister-'
+
+credentials: []
+cache_responses: True
+cache_dir: /home/user/.cache/swh/lister//
+```
 
 Note: This expects storage (5002) and scheduler (5008) services to run locally
 
-### Run
+## lister-github
 
-    $ python3
-    Python 3.6.6 (default, Jun 27 2018, 14:44:17)
-    [GCC 8.1.0] on linux
-    Type "help", "copyright", "credits" or "license" for more information.
-    >>> from swh.lister.gitlab.tasks import range_gitlab_lister; range_gitlab_lister(1, 2,
-        {'instance': 'debian', 'api_baseurl': 'https://salsa.debian.org/api/v4', 'sort': 'asc', 'per_page': 20})
-    >>> from swh.lister.gitlab.tasks import full_gitlab_relister; full_gitlab_relister(
-        {'instance':'0xacab', 'api_baseurl':'https://0xacab.org/api/v4', 'sort': 'asc', 'per_page': 20})
-    >>> from swh.lister.gitlab.tasks import incremental_gitlab_lister; incremental_gitlab_lister(
-        {'instance': 'freedesktop.org', 'api_baseurl': 'https://gitlab.freedesktop.org/api/v4',
-        'sort': 'asc', 'per_page': 20})
+Once configured, you can execute a GitHub lister using the following instructions in a `python3` script:
 
-## lister-debian
+```lang=python
+import logging
+from swh.lister.github.tasks import range_github_lister
 
-### preparation steps
+logging.basicConfig(level=logging.DEBUG)
+range_github_lister(364, 365)
+...
+```
 
-1. git clone under $SWH_ENVIRONMENT_HOME/swh-lister (of your choosing)
-2. mkdir ~/.config/swh/ ~/.cache/swh/lister/debian/
-3. create configuration file ~/.config/swh/lister-debian.yml
-4. Bootstrap the db instance schema
+## lister-gitlab
 
-    $ createdb lister-debian
-    $ python3 -m swh.lister.cli --db-url postgres:///lister-debian debian
-
-    Note: This bootstraps a minimum data set needed for the debian
-    lister to run (for development)
+Once configured, you can execute a GitLab lister using the instructions detailed in the `python3` scripts below:
+
+```lang=python
+import logging
+from swh.lister.gitlab.tasks import range_gitlab_lister
+
+logging.basicConfig(level=logging.DEBUG)
+range_gitlab_lister(1, 2, {
+    'instance': 'debian',
+    'api_baseurl': 'https://salsa.debian.org/api/v4',
+    'sort': 'asc',
+    'per_page': 20
+})
+```
+
+```lang=python
+import logging
+from swh.lister.gitlab.tasks import full_gitlab_relister
+
+logging.basicConfig(level=logging.DEBUG)
+full_gitlab_relister({
+    'instance': '0xacab',
+    'api_baseurl': 'https://0xacab.org/api/v4',
+    'sort': 'asc',
+    'per_page': 20
+})
+```
+
+```lang=python
+import logging
+from swh.lister.gitlab.tasks import incremental_gitlab_lister
+
+logging.basicConfig(level=logging.DEBUG)
+incremental_gitlab_lister({
+    'instance': 'freedesktop.org',
+    'api_baseurl': 'https://gitlab.freedesktop.org/api/v4',
+    'sort': 'asc',
+    'per_page': 20
+})
+```
 
-### Configuration file sample
+## lister-debian
 
-    $ cat ~/.config/swh/lister-debian.yml
-    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
-    lister_db_url: postgres:///lister-debian
-    credentials: []
-    cache_responses: True
-    cache_dir: /home/user/.cache/swh/lister/debian
+Once configured, you can execute a Debian lister using the following instructions in a `python3` script:
 
-Note: This expects storage (5002) and scheduler (5008) services to run locally
+```lang=python
+import logging
+from swh.lister.debian.tasks import debian_lister
 
-### Run
+logging.basicConfig(level=logging.DEBUG)
+debian_lister('Debian')
+```
 
-    $ python3
-    Python 3.6.6 (default, Jun 27 2018, 14:44:17)
-    [GCC 8.1.0] on linux
-    Type "help", "copyright", "credits" or "license" for more information.
-    >>> import logging; logging.basicConfig(level=logging.DEBUG); from swh.lister.debian.tasks import debian_lister; debian_lister('Debian')
-    DEBUG:root:Creating snapshot for distribution Distribution(Debian (deb) on http://deb.debian.org/debian/) on date 2018-07-27 09:22:50.461165+00:00
-    DEBUG:root:Processing area Area(stretch/main of Debian)
-    DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): deb.debian.org
-    DEBUG:urllib3.connectionpool:http://deb.debian.org:80 "GET /debian//dists/stretch/main/source/Sources.xz HTTP/1.1" 302 325
-    ...
+## lister-pypi
 
+Once configured, you can execute a PyPI lister using the following instructions in a `python3` script:
 
-## lister-pypi
+```lang=python
+import logging
+from swh.lister.pypi.tasks import pypi_lister
 
-### preparation steps
+logging.basicConfig(level=logging.DEBUG)
+pypi_lister()
+```
 
-1. git clone under $SWH_ENVIRONMENT_HOME/swh-lister (of your choosing)
-2. mkdir ~/.config/swh/ ~/.cache/swh/lister/pypi/
-3. create configuration file ~/.config/swh/lister-pypi.yml
-4. Bootstrap the db instance schema
+## lister-npm
 
-    $ createdb lister-pypi
-    $ python3 -m swh.lister.cli --db-url postgres:///lister-pypi pypi
+Once configured, you can execute a npm lister using the following instructions in a `python3` REPL:
 
-    Note: This bootstraps a minimum data set needed for the pypi
-    lister to run (for development)
+```lang=python
+import logging
+from swh.lister.npm.tasks import npm_lister
 
-### Configuration file sample
+logging.basicConfig(level=logging.DEBUG)
+npm_lister()
+```
 
-    $ cat ~/.config/swh/lister-pypi.yml
-    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
-    lister_db_url: postgres:///lister-pypi
-    credentials: []
-    cache_responses: True
-    cache_dir: /home/user/.cache/swh/lister/pypi
+Licensing
+---------
 
-Note: This expects storage (5002) and scheduler (5008) services to run locally
+This program is free software: you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free Software
+Foundation, either version 3 of the License, or (at your option) any later
+version.
 
-### Run
+This program is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
+PARTICULAR PURPOSE. See the GNU General Public License for more details.
 
-    $ python3
-    Python 3.6.6 (default, Jun 27 2018, 14:44:17)
-    [GCC 8.1.0] on linux
-    Type "help", "copyright", "credits" or "license" for more information.
-    >>> from swh.lister.pypi.tasks import pypi_lister; pypi_lister()
-    >>>
+See top-level LICENSE file for the full text of the GNU General Public License
+along with this program.
\ No newline at end of file
diff --git a/bin/ghlister b/bin/ghlister
--- a/bin/ghlister
+++ b/bin/ghlister
@@ -77,8 +77,6 @@
     args = parse_args()
 
     override_conf = {}
-    if args.db_url:
-        override_conf['lister_db_url'] = args.db_url
 
     lister = GitHubLister(lister_name='github.com',
                           api_baseurl='https://api.github.com',
diff --git a/swh/lister/cli.py b/swh/lister/cli.py
--- a/swh/lister/cli.py
+++ b/swh/lister/cli.py
@@ -26,7 +26,6 @@
     """
 
     override_conf = {
-        'lister_db_url': db_url,
         'lister': {
             'cls': 'local',
             'args': {'db': db_url}
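
The two code hunks above drop the flat `lister_db_url` override, leaving only the nested `lister` entry (`cls: 'local'`, `args: {'db': ...}`) that the new README sample documents. As a rough illustration only, and not part of this patch, a configuration dictionary written for the old layout could be converted along the lines below. The helper name `migrate_lister_config` is hypothetical; only the key names `lister_db_url`, `lister`, `cls`, `args`, and `db` come from the diff.

```python
from copy import deepcopy


def migrate_lister_config(conf):
    """Return a copy of conf that uses the nested 'lister' layout.

    Hypothetical helper, not part of swh.lister: it only illustrates the
    key move shown in the diff (lister_db_url -> lister.args.db).
    """
    new_conf = deepcopy(conf)
    # Remove the legacy flat key, if present.
    db_url = new_conf.pop('lister_db_url', None)
    # Only build the nested entry when no 'lister' section exists yet,
    # so an explicit new-style setting is never overwritten.
    if db_url is not None and 'lister' not in new_conf:
        new_conf['lister'] = {'cls': 'local', 'args': {'db': db_url}}
    return new_conf


old_conf = {
    'lister_db_url': 'postgresql:///lister-github',
    'credentials': [],
    'cache_responses': True,
}
new_conf = migrate_lister_config(old_conf)
print(new_conf['lister']['args']['db'])
```

The sketch copies the input rather than mutating it, so a caller can compare both layouts side by side; whether the real listers would need such a shim depends on how existing deployments were configured, which the diff does not say.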