
Refactor and deduplicate HTTP requests code in listers
Closed · Public

Authored by anlambert on Sep 21 2022, 8:09 PM.

Details

Summary

Numerous listers used the same page_request method, or an equivalent,
in their implementation. This change deduplicates that code by adding
an http_request method to the base lister class, swh.lister.pattern.Lister.

That method simply wraps a call to requests.Session.request and logs
some useful information for debugging and error reporting. It raises an
HTTPError if a request ends with an error.

All listers using the new method now benefit from request retries when
an HTTP error occurs, thanks to the http_retry decorator.

Depends on D8519
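
For readers who have not opened the diff, here is a minimal, standard-library-only sketch of the pattern the summary describes: a single http_request method on the base class that logs, raises on error responses, and is retried by a decorator. It is not the code from this revision. The `HTTPError` class is a stand-in for requests.HTTPError, the built-in `ConnectionError` stands in for the one from requests, and the `http_retry` parameters (`max_attempts`, `delay`), the `is_retryable` helper, and the injected `session` argument are all hypothetical names chosen for illustration. The retry conditions (429, status codes >= 500, connection errors) follow the commit message quoted in the build reports on this page.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)


class HTTPError(Exception):
    """Stand-in for requests.HTTPError; keeps the failed response around."""

    def __init__(self, response):
        super().__init__(f"HTTP {response.status_code} for {response.url}")
        self.response = response


def is_retryable(exc):
    """Hypothetical policy: retry on 429, on 5xx, and on connection errors."""
    if isinstance(exc, ConnectionError):
        return True
    response = getattr(exc, "response", None)
    status = getattr(response, "status_code", None)
    return status is not None and (status == 429 or status >= 500)


def http_retry(max_attempts=3, delay=0.0):
    """Hypothetical retry decorator; the real one may have another signature."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Give up on the last attempt or on non-retryable errors.
                    if attempt == max_attempts or not is_retryable(exc):
                        raise
                    logger.warning("Retrying after error: %s", exc)
                    time.sleep(delay)

        return wrapper

    return decorator


class Lister:
    """Simplified base class; `session` is any requests.Session-like object."""

    def __init__(self, session):
        self.session = session

    @http_retry()
    def http_request(self, url, method="GET", **kwargs):
        logger.debug("Fetching URL %s with params %s", url, kwargs.get("params"))
        response = self.session.request(method, url, **kwargs)
        if response.status_code >= 400:
            logger.warning(
                "Unexpected HTTP status code %s on %s",
                response.status_code,
                response.url,
            )
            raise HTTPError(response)
        return response
```

With this shape, a concrete lister only calls `self.http_request(url, params=...)` and no longer carries its own request, logging, and error-handling code.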

Diff Detail

Repository
rDLS Listers
Branch
http-requests-refactoring
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 31717
Build 49628: Phabricator diff pipeline on jenkins
Build 49627: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8520 (id=30682)

Could not rebase; Attempt merge onto bd35d54398...

Updating bd35d54..d677988
Fast-forward
 docs/new_lister_template.py                 | 41 ++---------------------
 docs/tutorial.rst                           |  2 +-
 swh/lister/__init__.py                      |  7 ++--
 swh/lister/arch/lister.py                   | 31 ++---------------
 swh/lister/aur/lister.py                    |  4 +--
 swh/lister/bitbucket/lister.py              | 33 +++---------------
 swh/lister/bitbucket/tests/test_lister.py   | 10 +++---
 swh/lister/bower/lister.py                  | 35 +++----------------
 swh/lister/cgit/lister.py                   | 15 ++-------
 swh/lister/cgit/tests/test_lister.py        |  4 +--
 swh/lister/debian/lister.py                 | 17 ++++------
 swh/lister/gitea/tests/test_lister.py       |  2 +-
 swh/lister/gitlab/lister.py                 | 13 +++-----
 swh/lister/gogs/lister.py                   | 52 +++++++++++------------------
 swh/lister/gogs/tests/test_lister.py        |  8 ++---
 swh/lister/golang/lister.py                 | 28 ++--------------
 swh/lister/golang/tests/test_lister.py      |  9 ++---
 swh/lister/launchpad/lister.py              |  4 +--
 swh/lister/maven/lister.py                  | 52 +++++++++--------------------
 swh/lister/maven/tests/test_lister.py       |  5 +++
 swh/lister/npm/lister.py                    | 30 +++--------------
 swh/lister/npm/tests/test_lister.py         | 25 ++++++++++----
 swh/lister/packagist/lister.py              | 25 +++-----------
 swh/lister/pattern.py                       | 32 +++++++++++++++++-
 swh/lister/phabricator/lister.py            | 28 +++-------------
 swh/lister/phabricator/tests/test_lister.py |  7 +++-
 swh/lister/pubdev/lister.py                 | 31 ++++-------------
 swh/lister/pypi/lister.py                   |  6 ++--
 swh/lister/sourceforge/lister.py            | 43 +++++-------------------
 swh/lister/sourceforge/tests/test_lister.py |  9 ++---
 swh/lister/tests/test_utils.py              | 32 ++++++++++--------
 swh/lister/tuleap/lister.py                 | 39 ++++------------------
 swh/lister/tuleap/tests/test_lister.py      |  7 +++-
 swh/lister/utils.py                         | 14 ++------
 34 files changed, 217 insertions(+), 483 deletions(-)
Changes applied before test
commit d677988e22f0d71d6b4aa2dae24f2d41f2ce337a
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Sep 21 19:53:22 2022 +0200

    Refactor and deduplicate HTTP requests code in listers
    
    Numerous listers were using the same page_request method or equivalent
    in their implementation so prefer to deduplicate that code by adding
    an http_request method in base lister class: swh.lister.pattern.Lister.
    
    That method simply wraps a call to requests.Session.request and logs
    some useful info for debugging and error reporting, also an HTTPError
    will be raised if a request ends up with an error.
    
    All listers using that new method now benefit of requests retry when
    an HTTP error occurs thanks to the use of the http_retry decorator.

commit 6284d34b1725ce9c5e9f76359a2760da871edb7b
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Sep 21 16:56:34 2022 +0200

    Use generic HTTP retry policy by default and rename dedicated decorator
    
    Instead of retrying HTTP requests only for 429 status code by default,
    prefer to use the generic retry policy enabling to also retry for status
    codes >= 500 but also on ConnectionError exceptions.
    
    Rename throttling_retry decorator to http_retry to reflect this change.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/672/ for more details.

This revision is now accepted and ready to land. Sep 22 2022, 11:43 AM

Build is green

Patch application report for D8520 (id=30740)

Could not rebase; Attempt merge onto 9b3e565cf7...

Updating 9b3e565..db6ce12
Fast-forward
Changes applied before test
commit db6ce12e9e3bf527e81a3c8fb3b159d81f6a7d0f
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Sep 21 19:53:22 2022 +0200

    Refactor and deduplicate HTTP requests code in listers
    
    Numerous listers were using the same page_request method or equivalent
    in their implementation so prefer to deduplicate that code by adding
    an http_request method in base lister class: swh.lister.pattern.Lister.
    
    That method simply wraps a call to requests.Session.request and logs
    some useful info for debugging and error reporting, also an HTTPError
    will be raised if a request ends up with an error.
    
    All listers using that new method now benefit of requests retry when
    an HTTP error occurs thanks to the use of the http_retry decorator.

commit 9c55acd286091acb6f6094e9fe1c95aca1fdeeec
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Sep 21 16:56:34 2022 +0200

    Use generic HTTP retry policy by default and rename dedicated decorator
    
    Instead of retrying HTTP requests only for 429 status code by default,
    prefer to use the generic retry policy enabling to also retry for status
    codes >= 500 but also on ConnectionError exceptions.
    
    Rename throttling_retry decorator to http_retry to reflect this change.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/685/ for more details.