
lister: Add utility module to ease HTTP requests rate limit handling
Closed · Public

Authored by anlambert on Jan 15 2021, 12:42 PM.

Details

Summary

Add the swh.lister.utils.throttling_retry decorator, which retries a
function that performs an HTTP request when the request fails with a
429 (Too Many Requests) status code.

The implementation is based on the tenacity module and assumes that
the requests library is used to query a URL.

The default wait strategy is based on exponential backoff.

The default maximum number of attempts is 5; once it is reached, the
HTTPError exception is re-raised.

All tenacity.retry parameters can also be overridden in client code.

I also ensured that the introduced code can be executed and tested with
the tenacity version packaged in Debian buster (4.12).

Example of use:

import requests

from swh.lister.utils import throttling_retry

URL = "http://example.org/api/repositories"

@throttling_retry()
def make_request():
    response = requests.get(URL)
    response.raise_for_status()
    return response
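
To make the retry semantics described above concrete, here is a minimal standard-library sketch of the same behaviour: retry on 429, exponential backoff, at most 5 attempts, then re-raise. It is not the code from this diff and does not use tenacity. FakeHTTPError, simple_throttling_retry, base_delay and the injectable sleep parameter are hypothetical names introduced only for illustration.

```python
import functools
import time


class FakeHTTPError(Exception):
    """Hypothetical stand-in for requests.HTTPError carrying a status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def simple_throttling_retry(max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry the wrapped function on HTTP 429, doubling the wait each time."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except FakeHTTPError as exc:
                    # Only rate-limit responses are retried; any other
                    # error, or the last failed attempt, is re-raised.
                    if exc.status_code != 429 or attempt == max_attempts:
                        raise
                    sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator


# Usage: the request is throttled twice, then succeeds on the third call.
waits = []  # records the delays instead of actually sleeping
calls = {"count": 0}


@simple_throttling_retry(sleep=waits.append)
def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeHTTPError(429)
    return "ok"


result = flaky_request()
```

Passing the sleep function as a parameter is a design choice of this sketch only: it lets tests observe the backoff delays without waiting for them.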

Diff Detail

Repository: rDLS Listers
Branch: lister-request-throttling
Lint: No Linters Available
Unit: No Unit Test Coverage
Build Status: Buildable 18436
Build 28503: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 28502: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D4869 (id=17252)

Rebasing onto c782275296...

Current branch diff-target is up to date.
Changes applied before test
commit f81482a62bbcbe2dddc029b18c2ff9c5d69623ac
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Fri Jan 15 12:25:45 2021 +0100

    lister: Add utility module to ease HTTP requests rate limit handling
    
    Add throttling_retry module providing a decorator enabling to retry a
    function that performs an HTTP request who can return a 429 status code.
    
    The implementation is based on the tenacity module and it is assumed
    that the requests library is used when querying an URL.
    
    The default wait strategy is based on exponential backoff.
    
    The default max number of attempts is set to 5, HTTPError exception
    will then be reraised.
    
    All tenacity.retry parameters can also be overridden in client code.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/101/ for more details.

Update: Use requests.status_codes.codes in tests

Build is green

Patch application report for D4869 (id=17265)

Rebasing onto c782275296...

Current branch diff-target is up to date.
Changes applied before test
commit 7b70dbd4cd9620cf566f03fd10b7a8805a5df85f
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Fri Jan 15 12:25:45 2021 +0100

    lister: Add utility module to ease HTTP requests rate limit handling

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/103/ for more details.

tenma added a subscriber: tenma.

It would be great to parametrize the request and error type, but unfortunately it does not seem easy to do...

TGTM otherwise.

This revision is now accepted and ready to land. Jan 15 2021, 2:42 PM
In D4869#121891, @tenma wrote:

> It would be great to parametrize the request and error type, but unfortunately it does not seem easy to do...
>
> TGTM otherwise.

Actually, we can. For instance, the retry predicate for the GitHub lister could be written and used like this:

from requests.exceptions import HTTPError

from swh.lister.utils import is_throttling_exception, retry_attempt, throttling_retry

def github_is_anonymous_throttling(e: Exception) -> bool:
    # 403 response to a request sent without an Authorization header
    return (
        isinstance(e, HTTPError) and e.response.status_code == 403
        and "Authorization" not in e.response.request.headers
    )

def github_retry_if_throttling(retry_state) -> bool:
    # Retry on the default throttling check or on the anonymous 403 case
    attempt = retry_attempt(retry_state)
    if attempt.failed:
        exception = attempt.exception()
        return is_throttling_exception(exception) or github_is_anonymous_throttling(exception)
    return False

@throttling_retry(retry=github_retry_if_throttling)
def make_github_request():
    ...
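
A predicate like the one above can be checked in isolation, without any network access. The following sketch uses standard-library stand-ins rather than requests or swh.lister.utils: FakeHTTPError, is_rate_limited, is_anonymous_forbidden and should_retry are hypothetical names, and the plain dict used for headers is case-sensitive, unlike the header mapping requests provides.

```python
from types import SimpleNamespace


class FakeHTTPError(Exception):
    """Hypothetical stand-in exposing the attributes the predicate reads."""

    def __init__(self, status_code, request_headers=None):
        super().__init__(f"HTTP {status_code}")
        request = SimpleNamespace(headers=dict(request_headers or {}))
        self.response = SimpleNamespace(status_code=status_code, request=request)


def is_rate_limited(e):
    """True for an explicit 429 Too Many Requests response."""
    return isinstance(e, FakeHTTPError) and e.response.status_code == 429


def is_anonymous_forbidden(e):
    """True for a 403 answer to a request sent without an Authorization header."""
    return (
        isinstance(e, FakeHTTPError)
        and e.response.status_code == 403
        and "Authorization" not in e.response.request.headers
    )


def should_retry(e):
    """Combine both checks, mirroring the structure of the predicate above."""
    return is_rate_limited(e) or is_anonymous_forbidden(e)


# Evaluate the predicate on a few representative failures.
checks = {
    "429": should_retry(FakeHTTPError(429)),
    "403 anonymous": should_retry(FakeHTTPError(403)),
    "403 authenticated": should_retry(
        FakeHTTPError(403, {"Authorization": "token placeholder"})
    ),
    "404": should_retry(FakeHTTPError(404)),
    "other exception": should_retry(ValueError("not an HTTP error")),
}
```

Only the first two entries of checks evaluate to True; an authenticated 403, a 404 and a non-HTTP exception are not retried.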

Update:

  • Move code to swh.lister.utils module
  • Improve docstring
  • Add tenacity to requirements.txt

Build is green

Patch application report for D4869 (id=17296)

Rebasing onto c782275296...

Current branch diff-target is up to date.
Changes applied before test
commit 067553e30cb2bc705de0f07d048fb9c1ca7b90be
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Fri Jan 15 12:25:45 2021 +0100

    lister: Add utility module to ease HTTP requests rate limit handling

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/110/ for more details.

Build is green

Patch application report for D4869 (id=17297)

Rebasing onto c782275296...

Current branch diff-target is up to date.
Changes applied before test
commit d2a02bc07822b39742464626fe15456967b3165e
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Fri Jan 15 12:25:45 2021 +0100

    lister: Add utility decorator to ease HTTP requests rate limit handling
    
    Add swh.lister.utils.throttling_retry decorator enabling to retry a
    function that performs an HTTP request who can return a 429 status code.
    

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/111/ for more details.