celery: acknowledge tasks as soon as they're received
Closed, Public

Authored by olasd on Feb 3 2021, 8:11 PM.

Details

Summary

With late acknowledgements, RabbitMQ will re-send tasks to clients even
if they can't ever complete the task (e.g. when the task gets killed
because the machine is out of memory).

This problem only increases over time, leading to complete starvation of
the ingestion system.

Now that we have multiple mechanisms to issue retries of tasks, we can
use early acknowledgements for tasks instead, which should mitigate the
ongoing starvation, at the expense of having to retry tasks externally.

(tangentially) Related to T3025.
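
For readers unfamiliar with the setting involved, here is a minimal sketch of what switching to early acknowledgements can look like in a Celery configuration. The diff itself is not reproduced on this page, so the `CELERY_SETTINGS` name and the dict layout are assumptions for illustration, not the actual change; `task_acks_late` is Celery's documented setting name.

```python
# Hypothetical settings mapping; the real swh.scheduler configuration
# layout is not shown in this revision and may differ.
CELERY_SETTINGS = {
    # Early acknowledgement: the worker acks a message when it receives it,
    # before running the task. If the worker is then killed (e.g. out of
    # memory), the broker has no unacknowledged message left to redeliver.
    # With True (late acks), the ack is only sent after the task has run.
    "task_acks_late": False,
}
```

Such a mapping could be applied to an existing `celery.Celery` instance with `app.conf.update(CELERY_SETTINGS)`. The trade-off is the one stated in the summary: a task lost with its worker is not redelivered by the broker, so any retry has to be issued externally.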

Diff Detail

Repository
rDSCH Scheduling utilities
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5003 (id=17864)

Rebasing onto aaffff2631...

Current branch diff-target is up to date.
Changes applied before test
commit 14feab9523804dd8b18acab29632a38cabe3e2d9
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Feb 3 19:51:30 2021 +0100

    celery: acknowledge tasks as soon as they're received
    
    With late acknowledgements, RabbitMQ will re-send tasks to clients even
    if they can't ever complete the task (e.g. when the task gets killed
    because the machine is out of memory).
    
    This problem only increases over time, leading to complete starvation of
    the ingestion system.
    
    Now that we have multiple mechanisms to issue retries of tasks, we can
    use early acknowledgements for tasks instead, which should mitigate the
    ongoing starvation, at the expense of having to retry tasks externally.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/296/ for more details.

olasd requested review of this revision. Feb 3 2021, 8:14 PM

So I've hotpatched this in production, and it seems to have cleared the current blockage, which is nice.

This revision is now accepted and ready to land. Feb 3 2021, 10:17 PM