
Add "queue shaping" to the scheduler
Closed, Resolved (Public)

Description

Currently, the scheduler sends all ready tasks to RabbitMQ as fast as possible.

This puts a lot of pressure on the RabbitMQ server for no good reason, since the same tasks are already reliably stored in the scheduler database.

It would make sense to move that pressure back to the database, where it belongs.

This would give us several improvements:

  • constraining the size of the RabbitMQ queues, which reduces memory usage and leaves RabbitMQ as only a transient, volatile queue
  • constraining the number of active entries in the task_run table, which will help with garbage collection in the long run (as we will only have a small number of "in-flight" tasks)
  • reducing the number of moving parts we need to monitor: we can just look at the database for the backlog.

To implement this, we should add a per-task-type limit on the number of task runs that can be scheduled (sent to RabbitMQ but not yet completed) at any one time.
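
As a rough illustration of what such a limit could look like, here is a minimal Python sketch. It is not the scheduler's actual code: the function name `select_tasks_to_send`, the `(task_id, task_type)` tuples, the `limits` mapping and the `default_limit` value are all hypothetical, and the in-flight counts are assumed to come from a query on the task_run table.

```python
from typing import Dict, Iterable, List, Tuple

Task = Tuple[int, str]  # (task_id, task_type); hypothetical shape


def select_tasks_to_send(
    ready_tasks: Iterable[Task],
    in_flight: Dict[str, int],
    limits: Dict[str, int],
    default_limit: int = 1000,  # assumed fallback, not a real setting
) -> List[Task]:
    """Pick the ready tasks that fit under each task type's limit.

    in_flight maps a task type to the number of task runs already
    scheduled and not yet finished; limits maps a task type to the
    maximum number of such runs allowed at once.
    """
    budget: Dict[str, int] = {}
    selected: List[Task] = []
    for task_id, task_type in ready_tasks:
        if task_type not in budget:
            limit = limits.get(task_type, default_limit)
            # Remaining room for this type; never negative.
            budget[task_type] = max(0, limit - in_flight.get(task_type, 0))
        if budget[task_type] > 0:
            selected.append((task_id, task_type))
            budget[task_type] -= 1
    return selected


ready = [(1, "git"), (2, "git"), (3, "git"), (4, "svn")]
to_send = select_tasks_to_send(ready, in_flight={"git": 1}, limits={"git": 2, "svn": 5})
print(to_send)  # [(1, 'git'), (4, 'svn')]
```

Tasks that are not selected simply stay in the database as ready and are picked up on a later pass, so the backlog remains visible there rather than in the queues.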