diff --git a/PKG-INFO b/PKG-INFO
index bc1d21c..d84f6a4 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,126 +1,126 @@
Metadata-Version: 2.1
Name: swh.lister
-Version: 1.7.0
+Version: 1.8.0
Summary: Software Heritage lister
Home-page: https://forge.softwareheritage.org/diffusion/DLSGH/
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
License: UNKNOWN
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-lister
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-lister/
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: testing
License-File: LICENSE
swh-lister
==========
This component from the Software Heritage stack aims to produce listings
of software origins and their urls hosted on various public developer platforms
or package managers. As these operations are quite similar, it provides a set of
Python modules abstracting common software origins listing behaviors.
It also provides several lister implementations, contained in the
following Python modules:
- `swh.lister.bitbucket`
- `swh.lister.cgit`
- `swh.lister.cran`
- `swh.lister.debian`
- `swh.lister.gitea`
- `swh.lister.github`
- `swh.lister.gitlab`
- `swh.lister.gnu`
- `swh.lister.launchpad`
- `swh.lister.npm`
- `swh.lister.packagist`
- `swh.lister.phabricator`
- `swh.lister.pypi`
- `swh.lister.tuleap`
Dependencies
------------
All required dependencies can be found in the `requirements*.txt` files located
at the root of the repository.
Local deployment
----------------
## lister configuration
Each lister implemented so far by Software Heritage (`bitbucket`, `cgit`, `cran`, `debian`,
`gitea`, `github`, `gitlab`, `gnu`, `launchpad`, `npm`, `packagist`, `phabricator`, `pypi`, `tuleap`)
must be configured by following the instructions below (note that you have to replace
`<lister_name>` with one of the lister names listed above).
### Preparation steps
1. `mkdir ~/.config/swh/`
2. create configuration file `~/.config/swh/listers.yml`
### Configuration file sample
Minimalistic configuration shared by all listers to add in file `~/.config/swh/listers.yml`:
```lang=yml
scheduler:
cls: 'remote'
args:
url: 'http://localhost:5008/'
credentials: {}
```
Note: This expects scheduler (5008) service to run locally
## Executing a lister
Once configured, a lister can be executed by using the `swh` CLI tool with the
following options and commands:
```
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister <lister_name> [lister_parameters]
```
Examples:
```
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister bitbucket
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister cran
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitea url=https://codeberg.org/api/v1/
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitlab url=https://salsa.debian.org/api/v4/
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister npm
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister pypi
```
Licensing
---------
This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.
See top-level LICENSE file for the full text of the GNU General Public License
along with this program.
diff --git a/swh.lister.egg-info/PKG-INFO b/swh.lister.egg-info/PKG-INFO
index bc1d21c..d84f6a4 100644
--- a/swh.lister.egg-info/PKG-INFO
+++ b/swh.lister.egg-info/PKG-INFO
@@ -1,126 +1,126 @@
Metadata-Version: 2.1
Name: swh.lister
-Version: 1.7.0
+Version: 1.8.0
Summary: Software Heritage lister
Home-page: https://forge.softwareheritage.org/diffusion/DLSGH/
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
License: UNKNOWN
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-lister
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-lister/
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: testing
License-File: LICENSE
swh-lister
==========
This component from the Software Heritage stack aims to produce listings
of software origins and their urls hosted on various public developer platforms
or package managers. As these operations are quite similar, it provides a set of
Python modules abstracting common software origins listing behaviors.
It also provides several lister implementations, contained in the
following Python modules:
- `swh.lister.bitbucket`
- `swh.lister.cgit`
- `swh.lister.cran`
- `swh.lister.debian`
- `swh.lister.gitea`
- `swh.lister.github`
- `swh.lister.gitlab`
- `swh.lister.gnu`
- `swh.lister.launchpad`
- `swh.lister.npm`
- `swh.lister.packagist`
- `swh.lister.phabricator`
- `swh.lister.pypi`
- `swh.lister.tuleap`
Dependencies
------------
All required dependencies can be found in the `requirements*.txt` files located
at the root of the repository.
Local deployment
----------------
## lister configuration
Each lister implemented so far by Software Heritage (`bitbucket`, `cgit`, `cran`, `debian`,
`gitea`, `github`, `gitlab`, `gnu`, `launchpad`, `npm`, `packagist`, `phabricator`, `pypi`, `tuleap`)
must be configured by following the instructions below (note that you have to replace
`<lister_name>` with one of the lister names listed above).
### Preparation steps
1. `mkdir ~/.config/swh/`
2. create configuration file `~/.config/swh/listers.yml`
### Configuration file sample
Minimalistic configuration shared by all listers to add in file `~/.config/swh/listers.yml`:
```lang=yml
scheduler:
cls: 'remote'
args:
url: 'http://localhost:5008/'
credentials: {}
```
Note: This expects scheduler (5008) service to run locally
## Executing a lister
Once configured, a lister can be executed by using the `swh` CLI tool with the
following options and commands:
```
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister <lister_name> [lister_parameters]
```
Examples:
```
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister bitbucket
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister cran
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitea url=https://codeberg.org/api/v1/
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitlab url=https://salsa.debian.org/api/v4/
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister npm
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister pypi
```
Licensing
---------
This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.
See top-level LICENSE file for the full text of the GNU General Public License
along with this program.
diff --git a/swh/lister/gitlab/lister.py b/swh/lister/gitlab/lister.py
index 4a98825..6c3f7e6 100644
--- a/swh/lister/gitlab/lister.py
+++ b/swh/lister/gitlab/lister.py
@@ -1,264 +1,265 @@
# Copyright (C) 2018-2021 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
from dataclasses import asdict, dataclass
import logging
import random
from typing import Any, Dict, Iterator, Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlparse
import iso8601
import requests
from requests.exceptions import HTTPError
from requests.status_codes import codes
from tenacity.before_sleep import before_sleep_log
from swh.lister import USER_AGENT
from swh.lister.pattern import CredentialsType, Lister
from swh.lister.utils import is_retryable_exception, retry_attempt, throttling_retry
from swh.scheduler.model import ListedOrigin
logger = logging.getLogger(__name__)
# Some instances provide an hg_git type, which can be ingested as hg origins
VCS_MAPPING = {"hg_git": "hg"}
@dataclass
class GitLabListerState:
"""State of the GitLabLister"""
last_seen_next_link: Optional[str] = None
"""Last link header (not visited yet) during an incremental pass
"""
Repository = Dict[str, Any]
@dataclass
class PageResult:
"""Result from a query to a gitlab project api page."""
repositories: Optional[Tuple[Repository, ...]] = None
next_page: Optional[str] = None
def _if_rate_limited(retry_state) -> bool:
"""Custom tenacity retry predicate for handling HTTP responses with status code 403
with specific ratelimit header.
"""
attempt = retry_attempt(retry_state)
if attempt.failed:
exc = attempt.exception()
return (
isinstance(exc, HTTPError)
and exc.response.status_code == codes.forbidden
and int(exc.response.headers.get("RateLimit-Remaining", "0")) == 0
) or is_retryable_exception(exc)
return False
def _parse_id_after(url: Optional[str]) -> Optional[int]:
"""Given a URL, extract and return the value of its 'id_after' query parameter,
or None.
This is the repository id used for pagination purposes.
"""
if not url:
return None
# link: https://${project-api}/?...&id_after=2x...
query_data = parse_qs(urlparse(url).query)
page = query_data.get("id_after")
if page and len(page) > 0:
return int(page[0])
return None
class GitLabLister(Lister[GitLabListerState, PageResult]):
"""List origins for a gitlab instance.
By default, the lister runs in incremental mode: it lists all repositories,
starting with the `last_seen_next_link` stored in the scheduler backend.
Args:
scheduler: a scheduler instance
url: the api v4 url of the gitlab instance to visit (e.g.
https://gitlab.com/api/v4/)
instance: a specific instance name (e.g. gitlab, tor, git-kernel, ...),
url network location will be used if not provided
incremental: defines if incremental listing is activated or not
"""
- LISTER_NAME = "gitlab"
-
def __init__(
self,
scheduler,
url: str,
+ name: Optional[str] = "gitlab",
instance: Optional[str] = None,
credentials: Optional[CredentialsType] = None,
incremental: bool = False,
):
+ if name is not None:
+ self.LISTER_NAME = name
super().__init__(
scheduler=scheduler,
url=url.rstrip("/"),
instance=instance,
credentials=credentials,
)
self.incremental = incremental
self.last_page: Optional[str] = None
self.per_page = 100
self.session = requests.Session()
self.session.headers.update(
{"Accept": "application/json", "User-Agent": USER_AGENT}
)
if len(self.credentials) > 0:
cred = random.choice(self.credentials)
logger.info(
"Using %s credentials from user %s", self.instance, cred["username"]
)
api_token = cred["password"]
if api_token:
self.session.headers["Authorization"] = f"Bearer {api_token}"
def state_from_dict(self, d: Dict[str, Any]) -> GitLabListerState:
return GitLabListerState(**d)
def state_to_dict(self, state: GitLabListerState) -> Dict[str, Any]:
return asdict(state)
@throttling_retry(
retry=_if_rate_limited, before_sleep=before_sleep_log(logger, logging.WARNING)
)
def get_page_result(self, url: str) -> PageResult:
logger.debug("Fetching URL %s", url)
response = self.session.get(url)
if response.status_code != 200:
logger.warning(
"Unexpected HTTP status code %s on %s: %s",
response.status_code,
response.url,
response.content,
)
# GitLab API can return errors 500 when listing projects.
# https://gitlab.com/gitlab-org/gitlab/-/issues/262629
# To avoid ending the listing prematurely, skip buggy URLs and move
# to next pages.
if response.status_code == 500:
id_after = _parse_id_after(url)
assert id_after is not None
while True:
next_id_after = id_after + self.per_page
url = url.replace(f"id_after={id_after}", f"id_after={next_id_after}")
response = self.session.get(url)
if response.status_code == 200:
break
else:
id_after = next_id_after
else:
response.raise_for_status()
repositories: Tuple[Repository, ...] = tuple(response.json())
if hasattr(response, "links") and response.links.get("next"):
next_page = response.links["next"]["url"]
else:
next_page = None
return PageResult(repositories, next_page)
def page_url(self, id_after: Optional[int] = None) -> str:
parameters = {
"pagination": "keyset",
"order_by": "id",
"sort": "asc",
"simple": "true",
"per_page": f"{self.per_page}",
}
if id_after is not None:
parameters["id_after"] = str(id_after)
return f"{self.url}/projects?{urlencode(parameters)}"
def get_pages(self) -> Iterator[PageResult]:
next_page: Optional[str]
if self.incremental and self.state and self.state.last_seen_next_link:
next_page = self.state.last_seen_next_link
else:
next_page = self.page_url()
while next_page:
self.last_page = next_page
page_result = self.get_page_result(next_page)
yield page_result
next_page = page_result.next_page
def get_origins_from_page(self, page_result: PageResult) -> Iterator[ListedOrigin]:
assert self.lister_obj.id is not None
repositories = page_result.repositories if page_result.repositories else []
for repo in repositories:
visit_type = repo.get("vcs_type", "git")
visit_type = VCS_MAPPING.get(visit_type, visit_type)
yield ListedOrigin(
lister_id=self.lister_obj.id,
url=repo["http_url_to_repo"],
visit_type=visit_type,
last_update=iso8601.parse_date(repo["last_activity_at"]),
)
def commit_page(self, page_result: PageResult) -> None:
"""Update currently stored state using the latest listed "next" page if relevant.
Relevancy is determined by the next_page link whose 'page' id must be strictly
superior to the currently stored one.
Note: this is a noop for full listing mode
"""
if self.incremental:
# link: https://${project-api}/?...&page=2x...
next_page = page_result.next_page
if not next_page and self.last_page:
next_page = self.last_page
if next_page:
id_after = _parse_id_after(next_page)
previous_next_page = self.state.last_seen_next_link
previous_id_after = _parse_id_after(previous_next_page)
if previous_next_page is None or (
previous_id_after and id_after and previous_id_after < id_after
):
self.state.last_seen_next_link = next_page
def finalize(self) -> None:
"""finalize the lister state when relevant (see `fn:commit_page` for details)
Note: this is a noop for full listing mode
"""
next_page = self.state.last_seen_next_link
if self.incremental and next_page:
# link: https://${project-api}/?...&page=2x...
next_id_after = _parse_id_after(next_page)
scheduler_state = self.get_state_from_scheduler()
previous_next_id_after = _parse_id_after(
scheduler_state.last_seen_next_link
)
if (not previous_next_id_after and next_id_after) or (
previous_next_id_after
and next_id_after
and previous_next_id_after < next_id_after
):
self.updated = True
diff --git a/swh/lister/gitlab/tests/test_lister.py b/swh/lister/gitlab/tests/test_lister.py
index 57bfccc..10144a7 100644
--- a/swh/lister/gitlab/tests/test_lister.py
+++ b/swh/lister/gitlab/tests/test_lister.py
@@ -1,346 +1,351 @@
# Copyright (C) 2017-2021 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
import json
import logging
from pathlib import Path
from typing import Dict, List
import pytest
from requests.status_codes import codes
from swh.lister import USER_AGENT
from swh.lister.gitlab.lister import GitLabLister, _parse_id_after
from swh.lister.pattern import ListerStats
from swh.lister.tests.test_utils import assert_sleep_calls
from swh.lister.utils import WAIT_EXP_BASE
logger = logging.getLogger(__name__)
def api_url(instance: str) -> str:
return f"https://{instance}/api/v4/"
def _match_request(request):
return request.headers.get("User-Agent") == USER_AGENT
def test_lister_gitlab(datadir, swh_scheduler, requests_mock):
"""Gitlab lister supports full listing
"""
instance = "gitlab.com"
lister = GitLabLister(swh_scheduler, url=api_url(instance), instance=instance)
response = gitlab_page_response(datadir, instance, 1)
requests_mock.get(
lister.page_url(), [{"json": response}], additional_matcher=_match_request,
)
listed_result = lister.run()
expected_nb_origins = len(response)
assert listed_result == ListerStats(pages=1, origins=expected_nb_origins)
scheduler_origins = lister.scheduler.get_listed_origins(
lister.lister_obj.id
).results
assert len(scheduler_origins) == expected_nb_origins
for listed_origin in scheduler_origins:
assert listed_origin.visit_type == "git"
assert listed_origin.url.startswith(f"https://{instance}")
assert listed_origin.last_update is not None
def test_lister_gitlab_heptapod(datadir, swh_scheduler, requests_mock):
- """Gitlab lister ignores some vcs_type
+ """Heptapod lister happily lists hg, hg_git as hg and git origins
"""
+ name = "heptapod"
instance = "foss.heptapod.net"
- lister = GitLabLister(swh_scheduler, url=api_url(instance), instance=instance)
+ lister = GitLabLister(
+ swh_scheduler, url=api_url(instance), name=name, instance=instance
+ )
+ assert lister.LISTER_NAME == name
+
response = gitlab_page_response(datadir, instance, 1)
requests_mock.get(
lister.page_url(), [{"json": response}], additional_matcher=_match_request,
)
listed_result = lister.run()
expected_nb_origins = len(response)
for entry in response:
assert entry["vcs_type"] in ("hg", "hg_git")
assert listed_result == ListerStats(pages=1, origins=expected_nb_origins)
scheduler_origins = lister.scheduler.get_listed_origins(
lister.lister_obj.id
).results
assert len(scheduler_origins) == expected_nb_origins
for listed_origin in scheduler_origins:
assert listed_origin.visit_type == "hg"
assert listed_origin.url.startswith(f"https://{instance}")
assert listed_origin.last_update is not None
def gitlab_page_response(datadir, instance: str, id_after: int) -> List[Dict]:
"""Return list of repositories (out of test dataset)"""
datapath = Path(datadir, f"https_{instance}", f"api_response_page{id_after}.json")
return json.loads(datapath.read_text()) if datapath.exists() else []
def test_lister_gitlab_with_pages(swh_scheduler, requests_mock, datadir):
"""Gitlab lister supports pagination
"""
instance = "gite.lirmm.fr"
lister = GitLabLister(swh_scheduler, url=api_url(instance))
response1 = gitlab_page_response(datadir, instance, 1)
response2 = gitlab_page_response(datadir, instance, 2)
requests_mock.get(
lister.page_url(),
[{"json": response1, "headers": {"Link": f"<{lister.page_url(2)}>; rel=next"}}],
additional_matcher=_match_request,
)
requests_mock.get(
lister.page_url(2), [{"json": response2}], additional_matcher=_match_request,
)
listed_result = lister.run()
expected_nb_origins = len(response1) + len(response2)
assert listed_result == ListerStats(pages=2, origins=expected_nb_origins)
scheduler_origins = lister.scheduler.get_listed_origins(
lister.lister_obj.id
).results
assert len(scheduler_origins) == expected_nb_origins
for listed_origin in scheduler_origins:
assert listed_origin.visit_type == "git"
assert listed_origin.url.startswith(f"https://{instance}")
assert listed_origin.last_update is not None
def test_lister_gitlab_incremental(swh_scheduler, requests_mock, datadir):
"""Gitlab lister supports incremental visits
"""
instance = "gite.lirmm.fr"
url = api_url(instance)
lister = GitLabLister(swh_scheduler, url=url, instance=instance, incremental=True)
url_page1 = lister.page_url()
response1 = gitlab_page_response(datadir, instance, 1)
url_page2 = lister.page_url(2)
response2 = gitlab_page_response(datadir, instance, 2)
url_page3 = lister.page_url(3)
response3 = gitlab_page_response(datadir, instance, 3)
requests_mock.get(
url_page1,
[{"json": response1, "headers": {"Link": f"<{url_page2}>; rel=next"}}],
additional_matcher=_match_request,
)
requests_mock.get(
url_page2, [{"json": response2}], additional_matcher=_match_request,
)
listed_result = lister.run()
expected_nb_origins = len(response1) + len(response2)
assert listed_result == ListerStats(pages=2, origins=expected_nb_origins)
assert lister.state.last_seen_next_link == url_page2
lister2 = GitLabLister(swh_scheduler, url=url, instance=instance, incremental=True)
# Lister will start back at the last stop
requests_mock.get(
url_page2,
[{"json": response2, "headers": {"Link": f"<{url_page3}>; rel=next"}}],
additional_matcher=_match_request,
)
requests_mock.get(
url_page3, [{"json": response3}], additional_matcher=_match_request,
)
listed_result2 = lister2.run()
assert listed_result2 == ListerStats(
pages=2, origins=len(response2) + len(response3)
)
assert lister2.state.last_seen_next_link == url_page3
assert lister.lister_obj.id == lister2.lister_obj.id
scheduler_origins = lister2.scheduler.get_listed_origins(
lister2.lister_obj.id
).results
assert len(scheduler_origins) == len(response1) + len(response2) + len(response3)
for listed_origin in scheduler_origins:
assert listed_origin.visit_type == "git"
assert listed_origin.url.startswith(f"https://{instance}")
assert listed_origin.last_update is not None
def test_lister_gitlab_rate_limit(swh_scheduler, requests_mock, datadir, mocker):
"""Gitlab lister supports rate-limit
"""
instance = "gite.lirmm.fr"
url = api_url(instance)
lister = GitLabLister(swh_scheduler, url=url, instance=instance)
url_page1 = lister.page_url()
response1 = gitlab_page_response(datadir, instance, 1)
url_page2 = lister.page_url(2)
response2 = gitlab_page_response(datadir, instance, 2)
requests_mock.get(
url_page1,
[{"json": response1, "headers": {"Link": f"<{url_page2}>; rel=next"}}],
additional_matcher=_match_request,
)
requests_mock.get(
url_page2,
[
# rate limited twice
{"status_code": codes.forbidden, "headers": {"RateLimit-Remaining": "0"}},
{"status_code": codes.forbidden, "headers": {"RateLimit-Remaining": "0"}},
# ok
{"json": response2},
],
additional_matcher=_match_request,
)
# To avoid this test being too slow, we mock sleep within the retry behavior
mock_sleep = mocker.patch.object(lister.get_page_result.retry, "sleep")
listed_result = lister.run()
expected_nb_origins = len(response1) + len(response2)
assert listed_result == ListerStats(pages=2, origins=expected_nb_origins)
assert_sleep_calls(mocker, mock_sleep, [1, WAIT_EXP_BASE])
@pytest.mark.parametrize("status_code", [502, 503, 520])
def test_lister_gitlab_http_errors(
swh_scheduler, requests_mock, datadir, mocker, status_code
):
"""Gitlab lister should retry requests when encountering HTTP 50x errors
"""
instance = "gite.lirmm.fr"
url = api_url(instance)
lister = GitLabLister(swh_scheduler, url=url, instance=instance)
url_page1 = lister.page_url()
response1 = gitlab_page_response(datadir, instance, 1)
url_page2 = lister.page_url(2)
response2 = gitlab_page_response(datadir, instance, 2)
requests_mock.get(
url_page1,
[{"json": response1, "headers": {"Link": f"<{url_page2}>; rel=next"}}],
additional_matcher=_match_request,
)
requests_mock.get(
url_page2,
[
# first request ends up with error
{"status_code": status_code},
# second request is ok
{"json": response2},
],
additional_matcher=_match_request,
)
# To avoid this test being too slow, we mock sleep within the retry behavior
mock_sleep = mocker.patch.object(lister.get_page_result.retry, "sleep")
listed_result = lister.run()
expected_nb_origins = len(response1) + len(response2)
assert listed_result == ListerStats(pages=2, origins=expected_nb_origins)
assert_sleep_calls(mocker, mock_sleep, [1])
def test_lister_gitlab_http_error_500(swh_scheduler, requests_mock, datadir):
"""Gitlab lister should skip buggy URL and move to next page.
"""
instance = "gite.lirmm.fr"
url = api_url(instance)
lister = GitLabLister(swh_scheduler, url=url, instance=instance)
url_page1 = lister.page_url()
response1 = gitlab_page_response(datadir, instance, 1)
url_page2 = lister.page_url(lister.per_page)
url_page3 = lister.page_url(2 * lister.per_page)
response3 = gitlab_page_response(datadir, instance, 3)
requests_mock.get(
url_page1,
[{"json": response1, "headers": {"Link": f"<{url_page2}>; rel=next"}}],
additional_matcher=_match_request,
)
requests_mock.get(
url_page2, [{"status_code": 500},], additional_matcher=_match_request,
)
requests_mock.get(
url_page3, [{"json": response3}], additional_matcher=_match_request,
)
listed_result = lister.run()
expected_nb_origins = len(response1) + len(response3)
assert listed_result == ListerStats(pages=2, origins=expected_nb_origins)
def test_lister_gitlab_credentials(swh_scheduler):
"""Gitlab lister supports credentials configuration
"""
instance = "gitlab"
credentials = {
"gitlab": {instance: [{"username": "user", "password": "api-token"}]}
}
url = api_url(instance)
lister = GitLabLister(
scheduler=swh_scheduler, url=url, instance=instance, credentials=credentials
)
assert lister.session.headers["Authorization"] == "Bearer api-token"
@pytest.mark.parametrize("url", [api_url("gitlab").rstrip("/"), api_url("gitlab"),])
def test_lister_gitlab_url_computation(url, swh_scheduler):
lister = GitLabLister(scheduler=swh_scheduler, url=url)
assert not lister.url.endswith("/")
page_url = lister.page_url()
# ensure the generated url contains the separated /
assert page_url.startswith(f"{lister.url}/projects")
@pytest.mark.parametrize(
"url,expected_result",
[
(None, None),
("http://dummy/?query=1", None),
("http://dummy/?foo=bar&id_after=1&some=result", 1),
("http://dummy/?foo=bar&id_after=&some=result", None),
],
)
def test__parse_id_after(url, expected_result):
assert _parse_id_after(url) == expected_result
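The `name` parameter added to `GitLabLister.__init__` in this diff works by setting `self.LISTER_NAME`, which creates an instance attribute that shadows the class-level one. The sketch below illustrates that pattern in isolation. `BaseLister` and `GitLabLikeLister` are hypothetical stand-ins written for this example, not the real `swh.lister` classes, and the empty-string default on the base class is an assumption.

```python
from typing import Optional


class BaseLister:
    """Hypothetical stand-in for the real base lister class."""

    # Class-level default; assumed empty here for illustration only.
    LISTER_NAME: str = ""

    def __init__(self, url: str, instance: Optional[str] = None):
        self.url = url
        self.instance = instance


class GitLabLikeLister(BaseLister):
    """Hypothetical lister mirroring the constructor change in the diff."""

    def __init__(
        self,
        url: str,
        name: Optional[str] = "gitlab",
        instance: Optional[str] = None,
    ):
        if name is not None:
            # Assigning through `self` creates an instance attribute that
            # shadows the class attribute; other instances are unaffected.
            self.LISTER_NAME = name
        super().__init__(url=url.rstrip("/"), instance=instance)


default = GitLabLikeLister("https://gitlab.com/api/v4/")
heptapod = GitLabLikeLister("https://foss.heptapod.net/api/v4/", name="heptapod")
unnamed = GitLabLikeLister("https://example.org/api/v4/", name=None)

print(default.LISTER_NAME)   # per-instance name, defaulting to "gitlab"
print(heptapod.LISTER_NAME)  # overridden per instance
print(repr(unnamed.LISTER_NAME))  # falls back to the class-level value
```

In this sketch, passing `name=None` leaves the class-level value in place, so the effective name then depends on whatever the base class defines.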