diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -1,192 +1,167 @@
-SWH-lister
-============
+swh-lister
+==========
 
-The Software Heritage Lister is both a library module to permit to
-centralize lister behaviors, and to provide lister implementations.
+This component from the Software Heritage stack aims to produce listings
+of software origins and their urls hosted on various public developer platforms
+or package managers. As these operations are quite similar, it provides a set of
+Python modules abstracting common software origins listing behaviors.
 
-Actual lister implementations are:
-
-- swh-lister-bitbucket
-- swh-lister-debian
-- swh-lister-github
-- swh-lister-gitlab
-- swh-lister-pypi
-
-Licensing
-----------
-
-This program is free software: you can redistribute it and/or modify it under
-the terms of the GNU General Public License as published by the Free Software
-Foundation, either version 3 of the License, or (at your option) any later
-version.
-
-This program is distributed in the hope that it will be useful, but WITHOUT ANY
-WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
-PARTICULAR PURPOSE. See the GNU General Public License for more details.
-
-See top-level LICENSE file for the full text of the GNU General Public License
-along with this program.
+It also provides several lister implementations, contained in the
+following Python modules:
 
+- `swh.lister.bitbucket`
+- `swh.lister.debian`
+- `swh.lister.github`
+- `swh.lister.gitlab`
+- `swh.lister.pypi`
+- `swh.lister.npm`
 
 Dependencies
 ------------
 
-- python3
-- python3-requests
-- python3-sqlalchemy
-
-More details in requirements*.txt
-
+All required dependencies can be found in the `requirements*.txt` files located
+at the root of the repository.
 
 Local deployment
------------
-
-## lister-github
-
-### Preparation steps
-
-1. git clone under $SWH_ENVIRONMENT_HOME/swh-lister (of your choosing)
-2. mkdir ~/.config/swh/ ~/.cache/swh/lister/github.com/
-3. create configuration file ~/.config/swh/lister-github.com.yml
-4. Bootstrap the db instance schema
-
-    $ createdb lister-github
-    $ python3 -m swh.lister.cli --db-url postgres:///lister-github github
-
-### Configuration file sample
-
-Minimalistic configuration:
-
-    $ cat ~/.config/swh/lister-github.com.yml
-    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
-    lister_db_url: postgres:///lister-github
-    credentials: []
-    cache_responses: True
-    cache_dir: /home/user/.cache/swh/lister/github.com
-
-Note: This expects storage (5002) and scheduler (5008) services to run locally
-
-### Run
+----------------
 
-    $ python3
-    >>> import logging
-    >>> logging.basicConfig(level=logging.DEBUG)
-    >>> from swh.lister.github.tasks import range_github_lister; range_github_lister(364, 365)
-    INFO:root:listing repos starting at 364
-    DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.github.com
-    DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repositories?since=364 HTTP/1.1" 200 None
-    DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
-    DEBUG:urllib3.connectionpool:http://localhost:5002 "POST /origin/add HTTP/1.1" 200 1
+## lister configuration
 
+Each lister implemented so far by Software Heritage (`github`, `gitlab`, `debian`, `pypi`, `npm`)
+must be configured by following the instructions below (please note that you have to replace
+`` by one of the lister name introduced above).
 
-## lister-gitlab
+### Preparation steps
 
-### preparation steps
+1. `mkdir ~/.config/swh/ ~/.cache/swh/lister//`
+2. create configuration file `~/.config/swh/lister_.yml`
+3. Bootstrap the db instance schema
 
-1. git clone under $SWH_ENVIRONMENT_HOME/swh-lister (of your choosing)
-2. mkdir ~/.config/swh/ ~/.cache/swh/lister/gitlab/
-3. create configuration file ~/.config/swh/lister-gitlab.yml
-4. Bootstrap the db instance schema
+```lang=bash
+$ createdb lister-
+$ python3 -m swh.lister.cli --db-url postgres:///lister-
+```
 
-    $ createdb lister-gitlab
-    $ python3 -m swh.lister.cli --db-url postgres:///lister-gitlab gitlab
+Note: This bootstraps a minimum data set needed for the lister to run.
 
 ### Configuration file sample
 
-    $ cat ~/.config/swh/lister-gitlab.yml
-    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
-    lister_db_url: postgres:///lister-gitlab
-    credentials: []
-    cache_responses: True
-    cache_dir: /home/user/.cache/swh/lister/gitlab
-
-Note: This expects storage (5002) and scheduler (5008) services to run locally
+Minimalistic configuration shared by all listers to add in file `~/.config/swh/lister_.yml`:
 
-### Run
+```lang=yml
+storage:
+  cls: 'remote'
+  args:
+    url: 'http://localhost:5002/'
 
-    $ python3
-    Python 3.6.6 (default, Jun 27 2018, 14:44:17)
-    [GCC 8.1.0] on linux
-    Type "help", "copyright", "credits" or "license" for more information.
-    >>> from swh.lister.gitlab.tasks import range_gitlab_lister; range_gitlab_lister(1, 2,
-    {'instance': 'debian', 'api_baseurl': 'https://salsa.debian.org/api/v4', 'sort': 'asc', 'per_page': 20})
-    >>> from swh.lister.gitlab.tasks import full_gitlab_relister; full_gitlab_relister(
-    {'instance':'0xacab', 'api_baseurl':'https://0xacab.org/api/v4', 'sort': 'asc', 'per_page': 20})
-    >>> from swh.lister.gitlab.tasks import incremental_gitlab_lister; incremental_gitlab_lister(
-    {'instance': 'freedesktop.org', 'api_baseurl': 'https://gitlab.freedesktop.org/api/v4',
-    'sort': 'asc', 'per_page': 20})
+scheduler:
+  cls: 'remote'
+  args:
+    url: 'http://localhost:5008/'
 
-## lister-debian
+lister:
+  cls: 'local'
+  args:
+    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
+    db: 'postgresql:///lister-'
 
-### preparation steps
+credentials: []
+cache_responses: True
+cache_dir: /home/user/.cache/swh/lister//
+```
 
-1. git clone under $SWH_ENVIRONMENT_HOME/swh-lister (of your choosing)
-2. mkdir ~/.config/swh/ ~/.cache/swh/lister/debian/
-3. create configuration file ~/.config/swh/lister-debian.yml
-4. Bootstrap the db instance schema
+Note: This expects storage (5002) and scheduler (5008) services to run locally
 
-    $ createdb lister-debian
-    $ python3 -m swh.lister.cli --db-url postgres:///lister-debian debian
-
-    Note: This bootstraps a minimum data set needed for the debian
-    lister to run (for development)
+## lister-github
 
-### Configuration file sample
+Once configured, you can execute a GitHub lister using the following instructions in a `python3` REPL:
+
+```lang=python
+>>> import logging
+>>> logging.basicConfig(level=logging.DEBUG)
+>>> from swh.lister.github.tasks import range_github_lister
+>>> range_github_lister(364, 365)
+INFO:root:listing repos starting at 364
+DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.github.com
+DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repositories?since=364 HTTP/1.1" 200 None
+DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
+DEBUG:urllib3.connectionpool:http://localhost:5002 "POST /origin/add HTTP/1.1" 200 1
+...
+```
 
-    $ cat ~/.config/swh/lister-debian.yml
-    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
-    lister_db_url: postgres:///lister-debian
-    credentials: []
-    cache_responses: True
-    cache_dir: /home/user/.cache/swh/lister/debian
+## lister-gitlab
 
-Note: This expects storage (5002) and scheduler (5008) services to run locally
+Once configured, you can execute a GitLab lister using the following instructions in a `python3` REPL:
+
+```lang=python
+>>> from swh.lister.gitlab.tasks import range_gitlab_lister
+>>> range_gitlab_lister(1, 2, {
+...     'instance': 'debian',
+...     'api_baseurl': 'https://salsa.debian.org/api/v4',
+...     'sort': 'asc',
+...     'per_page': 20
+... })
+>>> from swh.lister.gitlab.tasks import full_gitlab_relister
+>>> full_gitlab_relister({
+...     'instance': '0xacab',
+...     'api_baseurl': 'https://0xacab.org/api/v4',
+...     'sort': 'asc',
+...     'per_page': 20
+... })
+>>> from swh.lister.gitlab.tasks import incremental_gitlab_lister
+>>> incremental_gitlab_lister({
+...     'instance': 'freedesktop.org',
+...     'api_baseurl': 'https://gitlab.freedesktop.org/api/v4',
+...     'sort': 'asc',
+...     'per_page': 20
+... })
+```
 
-### Run
+## lister-debian
 
-    $ python3
-    Python 3.6.6 (default, Jun 27 2018, 14:44:17)
-    [GCC 8.1.0] on linux
-    Type "help", "copyright", "credits" or "license" for more information.
-    >>> import logging; logging.basicConfig(level=logging.DEBUG); from swh.lister.debian.tasks import debian_lister; debian_lister('Debian')
-    DEBUG:root:Creating snapshot for distribution Distribution(Debian (deb) on http://deb.debian.org/debian/) on date 2018-07-27 09:22:50.461165+00:00
-    DEBUG:root:Processing area Area(stretch/main of Debian)
-    DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): deb.debian.org
-    DEBUG:urllib3.connectionpool:http://deb.debian.org:80 "GET /debian//dists/stretch/main/source/Sources.xz HTTP/1.1" 302 325
-    ...
+Once configured, you can execute a Debian lister using the following instructions in a `python3` REPL:
 
+```lang=python
+>>> import logging;
+>>> logging.basicConfig(level=logging.DEBUG)
+>>> from swh.lister.debian.tasks import debian_lister
+>>> debian_lister('Debian')
+DEBUG:root:Creating snapshot for distribution Distribution(Debian (deb) on http://deb.debian.org/debian/) on date 2018-07-27 09:22:50.461165+00:00
+DEBUG:root:Processing area Area(stretch/main of Debian)
+DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): deb.debian.org
+DEBUG:urllib3.connectionpool:http://deb.debian.org:80 "GET /debian//dists/stretch/main/source/Sources.xz HTTP/1.1" 302 325
+...
+```
 
 ## lister-pypi
 
-### preparation steps
+Once configured, you can execute a PyPI lister using the following instructions in a `python3` REPL:
 
-1. git clone under $SWH_ENVIRONMENT_HOME/swh-lister (of your choosing)
-2. mkdir ~/.config/swh/ ~/.cache/swh/lister/pypi/
-3. create configuration file ~/.config/swh/lister-pypi.yml
-4. Bootstrap the db instance schema
+```lang=python
+>>> from swh.lister.pypi.tasks import pypi_lister
+>>> pypi_lister()
+```
 
-    $ createdb lister-pypi
-    $ python3 -m swh.lister.cli --db-url postgres:///lister-pypi pypi
+## lister-npm
 
-    Note: This bootstraps a minimum data set needed for the pypi
-    lister to run (for development)
+Once configured, you can execute a npm lister using the following instructions in a `python3` REPL:
 
-### Configuration file sample
+```lang=python
+>>> from swh.lister.pypi.tasks import pypi_lister
+>>> pypi_lister()
+```
 
-    $ cat ~/.config/swh/lister-pypi.yml
-    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
-    lister_db_url: postgres:///lister-pypi
-    credentials: []
-    cache_responses: True
-    cache_dir: /home/user/.cache/swh/lister/pypi
+Licensing
+----------
 
-Note: This expects storage (5002) and scheduler (5008) services to run locally
+This program is free software: you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free Software
+Foundation, either version 3 of the License, or (at your option) any later
+version.
 
-### Run
+This program is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
+PARTICULAR PURPOSE. See the GNU General Public License for more details.
 
-    $ python3
-    Python 3.6.6 (default, Jun 27 2018, 14:44:17)
-    [GCC 8.1.0] on linux
-    Type "help", "copyright", "credits" or "license" for more information.
-    >>> from swh.lister.pypi.tasks import pypi_lister; pypi_lister()
-    >>>
+See top-level LICENSE file for the full text of the GNU General Public License
+along with this program.
\ No newline at end of file
diff --git a/bin/ghlister b/bin/ghlister
--- a/bin/ghlister
+++ b/bin/ghlister
@@ -77,8 +77,6 @@
     args = parse_args()
 
     override_conf = {}
-    if args.db_url:
-        override_conf['lister_db_url'] = args.db_url
 
     lister = GitHubLister(lister_name='github.com',
                           api_baseurl='https://api.github.com',
diff --git a/swh/lister/cli.py b/swh/lister/cli.py
--- a/swh/lister/cli.py
+++ b/swh/lister/cli.py
@@ -26,7 +26,6 @@
     """
 
     override_conf = {
-        'lister_db_url': db_url,
         'lister': {
             'cls': 'local',
             'args': {'db': db_url}
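
Review note on the two code hunks: both drop the flat `lister_db_url` override, so the nested `lister` entry (`cls: 'local'`, `args: {db: ...}`) becomes the only place the database URL is set. As a rough illustration of what that could mean for a configuration written in the old style, the sketch below rewrites a dict that still uses the flat key into the nested layout from the README sample. The helper name `migrate_lister_config` is hypothetical and not part of swh-lister; it assumes only the key names visible in this diff.

```python
import copy


def migrate_lister_config(old_conf):
    """Return a copy of old_conf that uses the nested 'lister' layout.

    Hypothetical helper for illustration only; not part of swh-lister.
    """
    conf = copy.deepcopy(old_conf)
    # Drop the flat key that this diff stops writing.
    db_url = conf.pop('lister_db_url', None)
    # Only build the nested entry when none is configured already.
    if db_url is not None and 'lister' not in conf:
        conf['lister'] = {'cls': 'local', 'args': {'db': db_url}}
    return conf


old = {
    'lister_db_url': 'postgresql:///lister-github',
    'credentials': [],
    'cache_responses': True,
}
new = migrate_lister_config(old)
print(new['lister']['args']['db'])
```

An existing nested `lister` entry wins over the flat key here. That is a design choice of this sketch, not something the patch itself specifies.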