Page MenuHomeSoftware Heritage

Production-readiness improvements for the GitHub lister
ClosedPublic

Authored by olasd on Feb 25 2021, 9:33 PM.

Details

Summary
  • GitHub: Use function for requests.Session initialization
  • GitHub: Start moving the request logic to a separate function
  • GitHub: Move rate limit handling to the request function
  • Retry GitHub requests on ChunkEncodingErrors
  • GitHub: Move rate-limit reset logic to RateLimited exception
  • GitHub: handle Server Errors
  • GitHub: handle edge cases with empty responses
  • GitHub: more verbose logging on unexpected responses
Test Plan

tested in docker, needs unit tests

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D5152 (id=18425)

Rebasing onto 5b4dc289b7...

Current branch diff-target is up to date.
Changes applied before test
commit 678f479a37009f22e708752f734faa21843744c8
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 25 21:15:35 2021 +0100

    GitHub: more verbose logging on unexpected responses

commit 9332b52452cb78cf86418bc1b660bbdbc124e76e
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 25 21:14:45 2021 +0100

    GitHub: handle edge cases with empty responses

commit 2ef27cc4fec127882fc07e4e3119ab5be47770e6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 25 21:13:27 2021 +0100

    GitHub: handle Server Errors
    
    These errors happen, sometimes, when requesting large pages of results.

commit 10392c4646c29bfd12765ff6ef028f358ed349da
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 25 21:11:29 2021 +0100

    GitHub: Move rate-limit reset logic to RateLimited exception
    
    This makes the logic easier to test.

commit 116753d3341cd7061bf0aa0e9839f731752d5f90
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 25 21:09:53 2021 +0100

    Retry GitHub requests on ChunkEncodingErrors
    
    These happen, sometimes, when the connection to the GitHub server
    resets, e.g. because of congestion on a slow link.

commit 52cefd535f23ed07060e29942395ff72c3630599
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 18 12:50:53 2021 +0100

    GitHub: Move rate limit handling to the request function

commit 74485afdffa4acf60ccea75f237cf16b2ae6941f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Feb 5 15:22:10 2021 +0100

    GitHub: Start moving the request logic to a separate function

commit 6f7cd1154efcc5987e7ebb336ff84e56568ad345
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Feb 5 15:12:22 2021 +0100

    GitHub: Use function for requests.Session initialization
    
    This will help us to break the retry logic for the listing requests
    themselves to a separate function too.

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/252/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/252/console

Harbormaster returned this revision to the author for changes because remote builds failed.Feb 25 2021, 9:35 PM
Harbormaster failed remote builds in B19504: Diff 18425!

Fix wrong logging statement breaking tests

Build is green

Patch application report for D5152 (id=18939)

Rebasing onto df73073a67...

Current branch diff-target is up to date.
Changes applied before test
commit 248f27d368b64dcbd4a40dd81f95b44e934ed259
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 25 21:15:35 2021 +0100

    GitHub: more verbose logging on unexpected responses

commit a290f986c3e4e6cdfcb9fdda41e56b5f19adbc44
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 25 21:14:45 2021 +0100

    GitHub: handle edge cases with empty responses

commit 624e18405671f52d628f7f3a0c0f646b543e6c8d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 25 21:13:27 2021 +0100

    GitHub: handle Server Errors
    
    These errors happen, sometimes, when requesting large pages of results.

commit c414ce706a2825b4c42f3c2837a8b136498fe9e8
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 25 21:11:29 2021 +0100

    GitHub: Move rate-limit reset logic to RateLimited exception
    
    This makes the logic easier to test.

commit 521bfd6abc6aaa67ed6c661afbbdbe10dec1fa3b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 25 21:09:53 2021 +0100

    Retry GitHub requests on ChunkEncodingErrors
    
    These happen, sometimes, when the connection to the GitHub server
    resets, e.g. because of congestion on a slow link.

commit 61c1d444c5722a54e8caf193bdc553a9f2bf3eb5
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 18 12:50:53 2021 +0100

    GitHub: Move rate limit handling to the request function

commit 03b10e5c834148da911e21c01842448d3b0f3c8b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Feb 5 15:22:10 2021 +0100

    GitHub: Start moving the request logic to a separate function

commit 8f7dbb7488e64dd90532814ef78fc59fc4d0437f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Feb 5 15:12:22 2021 +0100

    GitHub: Use function for requests.Session initialization
    
    This will help us to break the retry logic for the listing requests
    themselves to a separate function too.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/256/ for more details.

olasd requested review of this revision.Mar 19 2021, 4:08 PM
vlorentz added inline comments.
swh/lister/github/lister.py
227

no increasing backoff?

swh/lister/github/lister.py
227

I've only noticed this once in a while. I think it makes more sense to move this retry to the github_request function, which will give it a backoff and limited number of tries automatically.

Move retries on 502 errors to the github_request wrapper

Build is green

Patch application report for D5152 (id=18941)

Rebasing onto df73073a67...

Current branch diff-target is up to date.
Changes applied before test
commit 879170a57d17d3549843cfb2d280c495a44d6ab2
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 25 21:14:45 2021 +0100

    GitHub: handle edge cases with empty responses

commit c375a61b166845c2e55b841222d169b724e2e2e0
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 25 21:13:27 2021 +0100

    GitHub: handle Server Errors
    
    These errors happen, sometimes, when requesting large pages of results.

commit 4a215e68e08d0da3f7523ea3570e7b4c523ba362
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 25 21:11:29 2021 +0100

    GitHub: Move rate-limit reset logic to RateLimited exception
    
    This makes the logic easier to test.

commit cfd4169bd8c68d2a6fbd3287d81c694d07b4858e
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 25 21:09:53 2021 +0100

    Retry GitHub requests on ChunkEncodingErrors
    
    These happen, sometimes, when the connection to the GitHub server
    resets, e.g. because of congestion on a slow link.

commit 61c1d444c5722a54e8caf193bdc553a9f2bf3eb5
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 18 12:50:53 2021 +0100

    GitHub: Move rate limit handling to the request function

commit 03b10e5c834148da911e21c01842448d3b0f3c8b
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Feb 5 15:22:10 2021 +0100

    GitHub: Start moving the request logic to a separate function

commit 8f7dbb7488e64dd90532814ef78fc59fc4d0437f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Feb 5 15:12:22 2021 +0100

    GitHub: Use function for requests.Session initialization
    
    This will help us to break the retry logic for the listing requests
    themselves to a separate function too.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/257/ for more details.

ardumont added a subscriber: ardumont.

lgtm

thanks.

This revision is now accepted and ready to land.Mar 19 2021, 8:08 PM