Page MenuHomeSoftware Heritage

Return HTTP 503 on AioRpcError
ClosedPublic

Authored by vlorentz on Sep 6 2022, 5:15 PM.

Details

Summary

AioRpcError exceptions used to trigger the default error handler
of SWH's RPC framework, which msgpacks the exception.

Under some circumstances, this caused the msgpack to hit
recursion limits, and logged very large errors to Sentry.

Additionally, SimpleTraversalView now calls wait_for_connection()
before returning 200; which allows catching some gRPC errors early
and returning 503 instead.

This does not work all the time though; gRPC errors occuring in the
middle of the stream still raise ChunkedEncodingError on the client
side, but there is not much we can do about that.

Depends on D8403.

Resolves the issue that caused this instance of T4497.

Diff Detail

Repository
rDGRPH Compressed graph representation
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8404 (id=30324)

Could not rebase; Attempt merge onto dd88f177c3...

Updating dd88f17..02b6a6f
Fast-forward
 swh/graph/http_server.py                 | 31 ++++++++++++++++---
 swh/graph/tests/conftest.py              | 53 ++++++++++++++++++++------------
 swh/graph/tests/test_http_server_down.py | 38 +++++++++++++++++++++++
 3 files changed, 99 insertions(+), 23 deletions(-)
 create mode 100644 swh/graph/tests/test_http_server_down.py
Changes applied before test
commit 02b6a6f16cf5db60a6439c67aa076cabb2cd5af2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 6 17:10:45 2022 +0200

    Return HTTP 503 on AioRpcError
    
    AioRpcError exceptions used to trigger the default error handler
    of SWH's RPC framework, which msgpacks the exception.
    
    Under some circumstances, this caused the msgpack to hit
    recursion limits, and logged very large errors to Sentry.
    
    Additionally, SimpleTraversalView now calls wait_for_connection()
    before returning 200; which allows catching some gRPC errors early
    and returning 503 instead.
    
    This does not work all the time though; gRPC errors occuring in the
    middle of the stream still raise `ChunkedEncodingError` on the client
    side, but there is not much we can do about that.

commit f2ea86ec523c7ee58301718302dd275034dcf54a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 6 17:06:39 2022 +0200

    Kill GraphServerProcess on test teardown
    
    There is no reason to terminate it nicely.

commit 2104133e41a924cf7d2678ac829b5bee44ba2138
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 6 17:02:22 2022 +0200

    conftest: Refactor GraphServerProcess to be more flexible
    
    1. queue is now encapsulated
    2. object in the queue is now a dict instead of a tuple (to allow
       adding more keys easily in the future + for readability)
    
    This will be used in a future commit, to get the queue's object
    twice from different functions.

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/220/ for more details.

Build is green

Patch application report for D8404 (id=30327)

Could not rebase; Attempt merge onto dd88f177c3...

Updating dd88f17..75a8b8d
Fast-forward
 swh/graph/http_server.py                 | 31 ++++++++++++++++---
 swh/graph/tests/conftest.py              | 53 ++++++++++++++++++++------------
 swh/graph/tests/test_http_server_down.py | 38 +++++++++++++++++++++++
 3 files changed, 99 insertions(+), 23 deletions(-)
 create mode 100644 swh/graph/tests/test_http_server_down.py
Changes applied before test
commit 75a8b8d312beaaae9c8ed6a3d929965350195dfd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 6 17:10:45 2022 +0200

    Return HTTP 503 on AioRpcError
    
    AioRpcError exceptions used to trigger the default error handler
    of SWH's RPC framework, which msgpacks the exception.
    
    Under some circumstances, this caused the msgpack to hit
    recursion limits, and logged very large errors to Sentry.
    
    Additionally, SimpleTraversalView now calls wait_for_connection()
    before returning 200; which allows catching some gRPC errors early
    and returning 503 instead.
    
    This does not work all the time though; gRPC errors occuring in the
    middle of the stream still raise `ChunkedEncodingError` on the client
    side, but there is not much we can do about that.

commit 8489c14e20ae3e4a57f3216c282a894cfff236bb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 6 17:06:39 2022 +0200

    Kill GraphServerProcess on test teardown
    
    There is no reason to terminate it nicely.

commit 46255335308446d7b077099fd9b1c68726faf140
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 6 17:02:22 2022 +0200

    conftest: Refactor GraphServerProcess to be more flexible
    
    1. queue is now encapsulated
    2. object in the queue is now a dict instead of a tuple (to allow
       adding more keys easily in the future + for readability)
    
    This will be used in a future commit, to get the queue's object
    twice from different functions.

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/223/ for more details.

Build is green

Patch application report for D8404 (id=30333)

Could not rebase; Attempt merge onto dd88f177c3...

Updating dd88f17..fcca878
Fast-forward
 swh/graph/http_server.py                 | 31 ++++++++++++++++---
 swh/graph/tests/conftest.py              | 53 ++++++++++++++++++++------------
 swh/graph/tests/test_http_server_down.py | 38 +++++++++++++++++++++++
 3 files changed, 99 insertions(+), 23 deletions(-)
 create mode 100644 swh/graph/tests/test_http_server_down.py
Changes applied before test
commit fcca8787491312028d6ace30cfb7a76c23d7dfa9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 6 17:10:45 2022 +0200

    Return HTTP 503 on AioRpcError
    
    AioRpcError exceptions used to trigger the default error handler
    of SWH's RPC framework, which msgpacks the exception.
    
    Under some circumstances, this caused the msgpack to hit
    recursion limits, and logged very large errors to Sentry.
    
    Additionally, SimpleTraversalView now calls wait_for_connection()
    before returning 200; which allows catching some gRPC errors early
    and returning 503 instead.
    
    This does not work all the time though; gRPC errors occuring in the
    middle of the stream still raise `ChunkedEncodingError` on the client
    side, but there is not much we can do about that.

commit 4a7f3f7650a4dd9e29b26b072b019e20115b72b1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 6 17:06:39 2022 +0200

    Kill GraphServerProcess on test teardown
    
    There is no reason to terminate it nicely.

commit 81d3deb2f1c1b584a076484250cb6f349009f032
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 6 17:02:22 2022 +0200

    conftest: Refactor GraphServerProcess to be more flexible
    
    1. queue is now encapsulated
    2. object in the queue is now a dict instead of a tuple (to allow
       adding more keys easily in the future + for readability)
    
    This will be used in a future commit, to get the queue's object
    twice from different functions.

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/226/ for more details.

olasd added a subscriber: olasd.

Thanks for looking into this!

This revision is now accepted and ready to land.Sep 6 2022, 5:41 PM
This revision was automatically updated to reflect the committed changes.