Changeset View
Standalone View
swh/lister/phabricator/tasks.py
- This file was added.
# Copyright (C) 2019 the Software Heritage developers | |||||
# License: GNU General Public License version 3, or any later version | |||||
# See top-level LICENSE file for more information | |||||
from swh.scheduler.celery_backend.config import app | |||||
from swh.lister.phabricator.lister import PhabricatorLister | |||||
def new_lister( | |||||
forge_url='https://forge.softwareheritage.org', api_token='', **kw): | |||||
anlambert: I would call this parameter `forge_url` instead | |||||
return PhabricatorLister(forge_url=forge_url, api_token=api_token, **kw) | |||||
Done Inline Actionsthis url construction step should also be moved in the lister implementation anlambert: this url construction step should also be moved in the lister implementation | |||||
Done Inline ActionsFrom my point of view, the api url construction should be handled in the lister implementation and not in the task one anlambert: From my point of view, the api url construction should be handled in the lister implementation… | |||||
Done Inline ActionsI am not able to understand what to do with api token. How to pass it in PhabricatorLister class as there is only one parameter (api_url). nahimilega: I am not able to understand what to do with api token. How to pass it in PhabricatorLister… | |||||
Done Inline ActionsYou can override the lister constructor and add the api token as parameter. Check the npm lister implementation as an example. anlambert: You can override the lister constructor and add the api token as parameter. Check the npm… | |||||
Done Inline ActionsSomething like: class PhabricatorLister(SWHIndexingHttpLister): def __init__(self, forge_url, api_token, override_config=None): api_endpoint = 'api/diffusion.repository.search?api.token=%s' % api_token api_baseurl = '%s/%s' % (forge_url, api_endpoint) super().__init__(api_baseurl=api_baseurl, override_config=override_config) Also as we want to be able to list different phabricator instances. So we need to have a way to link the listed repositories to their corresponding Phabricator instance. You can check the GitLab lister implementation which can handle multiple instances and inspire yourself from it. anlambert: Something like:
```lang=python
class PhabricatorLister(SWHIndexingHttpLister):
def __init__… | |||||
@app.task(name=__name__ + '.IncrementalPhabricatorLister') | |||||
def incremental_phabricator_lister(**lister_args): | |||||
Done Inline ActionsI think we should also add a full phabricator lister task that simply lists all hosted repositories (similar to the npm lister which offers full and incremental listing). The task implementation could be the following: @app.task(name=__name__ + '.FullPhabricatorLister') def full_phabricator_lister(**lister_args): lister = new_lister(**lister_args) first_index = lister.first_repository_index() lister.run(min_bound=first_index, max_bound=None) anlambert: I think we should also add a full phabricator lister task that simply lists all hosted… | |||||
lister = new_lister(**lister_args) | |||||
lister.run(min_bound=lister.db_last_index()) | |||||
@app.task(name=__name__ + '.FullPhabricatorLister') | |||||
def full_phabricator_lister(**lister_args): | |||||
lister = new_lister(**lister_args) | |||||
lister.run() | |||||
Done Inline ActionsHere I stumbled across the issue that the first listing operation will fail as lister.db_last_index() will return None in that case. Also, the first repository index on a Phabricator instance is not necessarily 1 (for KDE it is 2 for instance). diff --git a/swh/lister/phabricator/lister.py b/swh/lister/phabricator/lister.py index 7e36038..3b95e37 100644 --- a/swh/lister/phabricator/lister.py +++ b/swh/lister/phabricator/lister.py @@ -8,7 +8,7 @@ from swh.lister.phabricator.models import PhabricatorModel class PhabricatorLister(SWHIndexingHttpLister): - PATH_TEMPLATE = '&after=%s' + PATH_TEMPLATE = '&order=oldest&attachments[uris]=1&after=%s' MODEL = PhabricatorModel LISTER_NAME = 'phabricator' @@ -40,3 +40,8 @@ class PhabricatorLister(SWHIndexingHttpLister): repos = response.json() repos = repos['result']['data'] return [self.get_model_from_repo(repo) for repo in repos] + + def first_repository_index(self): + params = '&order=oldest&limit=1' + response = self.safely_issue_request(params) + return response.json()['result']['data'][0]['id'] diff --git a/swh/lister/phabricator/tasks.py b/swh/lister/phabricator/tasks.py index d1e77ee..169df9f 100644 --- a/swh/lister/phabricator/tasks.py +++ b/swh/lister/phabricator/tasks.py @@ -22,7 +22,15 @@ def new_lister( @app.task(name=__name__ + '.IncrementalPhabricatorLister') def incremental_phabricator_lister(**lister_args): lister = new_lister(**lister_args) - lister.run(min_bound=lister.db_last_index(), max_bound=None) + last_index = lister.db_last_index() + # bootstrap repository listing + if last_index is None: + last_index = lister.first_repository_index() + # we need to ingest the first repo data as they will not be returned + # by the next request to the Phabricator API + lister.max_index = last_index + lister.ingest_data(last_index) + lister.run(min_bound=last_index, max_bound=None) anlambert: Here I stumbled across the issue that the first listing operation will fail as `lister. | |||||
@app.task(name=__name__ + '.ping') | |||||
def ping(): | |||||
return 'OK' | |||||
Done Inline ActionsDo we really need these two additional tasks ? Based on my understanding, the incremental one above seems enough to get all repositories hosted on a Phabricator instance, plus it will only get the new ones when executed again. anlambert: Do we really need these two additional tasks ? Based on my understanding, the incremental one… |
I would call this parameter forge_url instead