
retry: Add constant 10s wait when retrying transient exceptions
Closed · Public

Authored by vlorentz on Aug 9 2022, 3:42 PM.

Details

Summary

Transient exceptions are typically caused by server shutdowns and other
temporary failures that may take longer to clear than the typical 0-3s
delay used by the retry proxy.

This should keep noisy exceptions like AdminShutdown out of the
Sentry dashboards.
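
The wait policy described above can be sketched roughly as follows. This is a standalone illustration using only the standard library, not the code in swh/storage/proxies/retry.py: `TransientError`, `wait_seconds`, `with_retry`, the limit of 3 attempts, and the uniform distribution for the 0-3s delay are all assumptions made for the example.

```python
import random
import time
from functools import wraps


class TransientError(Exception):
    """Hypothetical stand-in for a transient failure such as a server shutdown."""


def wait_seconds(exc: BaseException) -> float:
    """Return how long to sleep before retrying after `exc`."""
    if isinstance(exc, TransientError):
        # Constant 10s wait: the server may need a while to come back.
        return 10.0
    # Assumed shape of the usual short delay (0-3s) for other errors.
    return random.uniform(0, 3)


def with_retry(max_attempts: int = 3, sleep=time.sleep):
    """Retry the wrapped function, sleeping between attempts.

    `sleep` is injectable so the policy can be checked without waiting.
    """

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    if attempt == max_attempts:
                        # Out of attempts: let the error propagate.
                        raise
                    sleep(wait_seconds(exc))

        return wrapper

    return decorator
```

With such a policy, a call that fails twice with a transient error and then succeeds sleeps 10s twice and never surfaces the exception, which is how a longer wait could keep errors like AdminShutdown from being reported.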

Depends on D8223.

Diff Detail

Repository
rDSTO Storage manager
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 30728
Build 48046: Phabricator diff pipeline on jenkins
Build 48045: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D8224 (id=29669)

Could not rebase; Attempt merge onto b4f289c8e8...

Updating b4f289c8..5335244f
Fast-forward
 requirements-swh.txt                 |  2 +-
 swh/storage/api/server.py            | 18 +++++++++++---
 swh/storage/proxies/retry.py         | 25 ++++++++++++++++---
 swh/storage/tests/test_api_client.py | 44 ++++++++++++++++++++++++++++++++-
 swh/storage/tests/test_retry.py      | 48 ++++++++++++++++++++++++++++++++++++
 5 files changed, 128 insertions(+), 9 deletions(-)
Changes applied before test
commit 5335244fc187b5323ddd4f5dea223f96c782f64f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 9 15:42:03 2022 +0200

    retry: Add constant 10s wait when retrying transient exceptions
    
    They are typically caused by server shutdown and other temporary
    failures that may take more time than the typical 0-3s delay
    used by the retry proxy.
    
    This should keep noisy exceptions like AdminShutdown out of the
    Sentry dashboards.

commit 7c7a721da2ae1dd4fd71b0d32e8b8cbaddbaa421
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 9 15:38:15 2022 +0200

    Convert psycopg2 errors to TransientRemoteException instead of RemoteException
    
    On the wire, this is done by making the server return a 503 error
    instead of 500, which the RPC client generated by swh-core
    interprets to change the exception class.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1650/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1650/console
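
The second commit in the report above changes which HTTP status the server returns for database errors, so that the client raises a different exception class. A minimal sketch of that mapping is shown here; the class and function names are local stand-ins defined for the example (they are not the swh-core or psycopg2 classes), and making `TransientRemoteException` a subclass of `RemoteException` is an assumption.

```python
class RemoteException(Exception):
    """Stand-in for the generic error raised by an RPC client."""


class TransientRemoteException(RemoteException):
    """Stand-in for a remote error that is expected to go away on retry."""


class DatabaseError(Exception):
    """Hypothetical stand-in for a database driver error on the server."""


def status_for(exc: BaseException) -> int:
    """Server side: pick the HTTP status code for an unhandled exception."""
    # 503 signals a temporary condition; 500 signals a generic server error.
    return 503 if isinstance(exc, DatabaseError) else 500


def exception_for(status: int, message: str) -> RemoteException:
    """Client side: turn an error status code back into an exception."""
    if status == 503:
        return TransientRemoteException(message)
    return RemoteException(message)
```

A retry proxy can then treat `TransientRemoteException` differently from other remote errors, for example by waiting longer before the next attempt.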

Harbormaster returned this revision to the author for changes because remote builds failed. Aug 9 2022, 3:43 PM
Harbormaster failed remote builds in B30728: Diff 29669!
olasd added a subscriber: olasd.

That seems like a plausible change, thanks!

This revision is now accepted and ready to land. Aug 16 2022, 12:46 PM

Build is green

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1652/ for more details.