Software Heritage

Add a new cli endpoint to schedule recurrent visits in Celery

This commit no longer exists in the repository. It may have been part of a branch which was deleted.

Description


For each known visit type, we run a loop which:

  • monitors the size of the relevant Celery queue
  • schedules more visits of the relevant type once the number of available slots goes over a given threshold (currently set to 5% of the max queue size).
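The refill rule in the loop above can be sketched as follows. This is a minimal illustration, not the commit's code: `visits_to_schedule`, `schedule_loop` and their parameters are hypothetical names. The two callables stand in for the real RabbitMQ queue-length query and the scheduler-backend/Celery submission, which are not shown in the commit description. Only the 5% threshold comes from the source.

```python
import time
from typing import Callable, Optional


def visits_to_schedule(
    queue_length: int, max_queue_size: int, threshold_ratio: float = 0.05
) -> int:
    """Return how many visits to send, or 0 while free slots are under the threshold."""
    available = max(max_queue_size - queue_length, 0)
    # Only refill once the free slots go over threshold_ratio * max_queue_size,
    # so visits are sent in reasonably sized batches instead of a trickle.
    if available > threshold_ratio * max_queue_size:
        return available
    return 0


def schedule_loop(
    visit_type: str,
    get_queue_length: Callable[[], int],  # hypothetical: reads the queue size
    schedule_visits: Callable[[str, int], None],  # hypothetical: sends visits
    max_queue_size: int,
    threshold_ratio: float = 0.05,
    poll_interval: float = 10.0,
    iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll one visit type's queue and top it up; runs forever unless iterations is set."""
    done = 0
    while iterations is None or done < iterations:
        count = visits_to_schedule(get_queue_length(), max_queue_size, threshold_ratio)
        if count:
            schedule_visits(visit_type, count)
        else:
            # Nothing to do yet: wait before polling the queue again.
            sleep(poll_interval)
        done += 1
```

For example, with `max_queue_size=1000` the threshold is 50 free slots. A queue holding 990 messages is left alone, while one holding 900 gets 100 new visits.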

The scheduling of visits combines multiple scheduling policies, for now
using static ratios set in the POLICY_RATIOS dict. We emit a warning
if the ratio of origins fetched for each policy is skewed with respect
to the original request, so that, for now, the ratios can be adjusted
manually.
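The ratio-based mix and the skew warning might look roughly like this. The sketch rests on several assumptions. The policy names and ratio values in this `POLICY_RATIOS` are invented, because the commit does not list them. The `tolerance` used to decide what counts as "skewed" is also a guess. `fetch` is a hypothetical stand-in for the per-policy query to the scheduler backend.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)

# Hypothetical policy names and ratios, for illustration only.
POLICY_RATIOS: Dict[str, float] = {
    "policy_a": 0.5,
    "policy_b": 0.3,
    "policy_c": 0.2,
}


def split_by_ratio(num_visits: int, ratios: Dict[str, float]) -> Dict[str, int]:
    """Split num_visits across policies (counts sum to num_visits if ratios sum to 1)."""
    counts: Dict[str, int] = {}
    cumulative = 0.0
    previous = 0
    for policy, ratio in ratios.items():
        cumulative += ratio
        # Rounding the cumulative target (rather than each share) keeps every
        # count non-negative and the total exact despite float error.
        target = round(num_visits * cumulative)
        counts[policy] = target - previous
        previous = target
    return counts


def skewed_policies(
    fetched: Dict[str, int], ratios: Dict[str, float], tolerance: float = 0.1
) -> List[str]:
    """Policies whose share of the fetched origins strays from their ratio."""
    total = sum(fetched.values())
    if total == 0:
        return []
    return [
        policy
        for policy, ratio in ratios.items()
        if abs(fetched.get(policy, 0) / total - ratio) > tolerance
    ]


def fetch_origins(
    num_visits: int,
    fetch: Callable[[str, int], List[str]],
    ratios: Dict[str, float] = POLICY_RATIOS,
) -> List[str]:
    """Fetch origins for each policy and warn if the resulting mix is skewed."""
    origins: List[str] = []
    fetched: Dict[str, int] = {}
    for policy, count in split_by_ratio(num_visits, ratios).items():
        batch = fetch(policy, count) if count > 0 else []
        fetched[policy] = len(batch)
        origins.extend(batch)
    for policy in skewed_policies(fetched, ratios):
        logger.warning(
            "Policy %s: requested ratio %.2f, got %d of %d origins",
            policy, ratios[policy], fetched[policy], len(origins),
        )
    return origins
```

If one policy returns fewer origins than requested (for instance, `policy_c` returns nothing), the shares of the other policies grow. The warning then flags every policy whose share drifted, which is the signal an operator would use to tweak the ratios by hand.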

The CLI endpoint spawns one thread per visit type; each thread handles
its own connections to RabbitMQ and to the scheduler backend. For now,
we handle exceptions in the visit scheduling threads by (stupidly)
respawning the relevant thread directly. We should probably improve this
to give up after a specific number of tries.
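The thread-per-visit-type supervision, including the suggested "give up after N tries" improvement, could be sketched as below. This is an assumption-laden illustration rather than the commit's implementation. `run_all`, `_guarded` and `max_tries` are hypothetical names. The retry cap is the proposed improvement, not the current behaviour, which respawns indefinitely. `target` stands in for a per-visit-type scheduling loop.

```python
import logging
import threading
from typing import Callable, Dict, Iterable

logger = logging.getLogger(__name__)


def _guarded(visit_type: str, target: Callable[[str], None], crashed: Dict[str, bool]) -> None:
    """Run target; record a crash instead of letting the exception vanish."""
    try:
        target(visit_type)
    except Exception:
        logger.exception("Scheduling thread for %s crashed", visit_type)
        crashed[visit_type] = True


def run_all(
    visit_types: Iterable[str],
    target: Callable[[str], None],
    max_tries: int = 3,
    poll_interval: float = 0.1,
) -> Dict[str, int]:
    """Spawn one thread per visit type, respawning crashed ones up to max_tries.

    Returns the number of attempts made for each visit type.
    """
    attempts: Dict[str, int] = {}
    crashed: Dict[str, bool] = {}
    threads: Dict[str, threading.Thread] = {}

    def start(visit_type: str) -> None:
        attempts[visit_type] = attempts.get(visit_type, 0) + 1
        # Daemon threads so a stuck loop does not block interpreter exit.
        thread = threading.Thread(
            target=_guarded, args=(visit_type, target, crashed),
            name=f"schedule-{visit_type}", daemon=True,
        )
        threads[visit_type] = thread
        thread.start()

    for visit_type in visit_types:
        start(visit_type)

    while threads:
        for visit_type, thread in list(threads.items()):
            thread.join(timeout=poll_interval)
            if thread.is_alive():
                continue
            del threads[visit_type]
            if crashed.pop(visit_type, False):
                if attempts[visit_type] >= max_tries:
                    logger.error("Giving up on %s after %d tries", visit_type, attempts[visit_type])
                else:
                    start(visit_type)
    return attempts
```

Each visit type keeps its own attempt counter, so a visit type that keeps crashing exhausts its retries without affecting the threads of the other visit types.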

Co-authored-by: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>

Details

Provenance
olasd: Authored on Oct 27 2021, 12:09 PM
ardumont: Committed on Oct 28 2021, 1:06 PM
ardumont: Pushed on Oct 28 2021, 1:10 PM
Differential Revision
D6520: Add a new cli endpoint to schedule recurrent visits in Celery
Tasks
T3667: Orchestrate origins scheduling according to scheduler metrics feedback
Build Status
Buildable 24778
Build 38687: test-and-build (Jenkins)
