(Re)Scheduling oneshot tasks automatically
Closed, Migrated

Description

As of now, oneshot tasks are simply messages sent, through a producer, to the appropriate queue.
As the volume of oneshot tasks keeps increasing (googlecode backup ingestion, gitorious backup ingestion, rehash, ...), and those tasks, like any other, can fail, it becomes important to support some form of scheduling, and of rescheduling in case of failure.

Note:

  • Possibly investigate the scheduler to improve this situation if that makes sense.
  • There exists a branch oneshot-task in the swh-scheduler repository about this.

Event Timeline

ardumont triaged this task as Normal priority. May 29 2017, 10:52 AM
ardumont created this task.

I've taken a look at the oneshot-task branch in the scheduler repo.

At first I thought that the implemented approach, which duplicates the tables, would be the way to go, but after the whiteboarding session with @zack, I have reconsidered. I think we can implement this with simple adaptations to the current model (which is great because it avoids changing the "runner" and the "listener" components too much).

Here's my plan:

  • Update task_type with the fields for one-shot tasks:
    • add a max_tries column for the max number of tries for one-shot tasks
    • add a min_delay column for the minimum delay we want to wait before a retry
  • Update task with the fields for one-shot tasks:
    • add a oneshot field to prevent periodic recursion
    • add a num_tries field to keep the current number of retries
  • Update the trigger on task_run:
    • if the task is not oneshot, keep doing what we currently do
    • if the task is oneshot and failed, move it to the back of the queue, by incrementing its number of tries and setting its next_run field accordingly.
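
To make the intended behaviour concrete, here is a rough Python sketch of what the updated trigger would do. It is only an illustration under assumptions, not the actual trigger code: the field names max_tries, min_delay, oneshot, num_tries and next_run come from the plan above, while the status field, the function name on_task_run_end, the choice of "now + min_delay" for the next run, and what happens on success or once max_tries is reached are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TaskType:
    # Columns proposed in the plan.
    max_tries: int        # maximum number of tries for a one-shot task
    min_delay: timedelta  # minimum delay to wait before a retry


@dataclass
class Task:
    oneshot: bool                 # proposed: marks the task as one-shot
    num_tries: int                # proposed: number of tries used so far
    next_run: Optional[datetime]  # field named in the plan
    status: str = "scheduled"     # hypothetical field, for illustration only


def on_task_run_end(task: Task, task_type: TaskType,
                    succeeded: bool, now: datetime) -> Task:
    """Hypothetical stand-in for the trigger on task_run."""
    if not task.oneshot:
        # Not one-shot: keep the current behaviour (left out of this sketch).
        return task

    if succeeded:
        # Assumption: a successful one-shot task is never run again.
        task.status = "done"
        task.next_run = None
        return task

    # Failed one-shot task: count the try, then decide whether to retry.
    task.num_tries += 1
    if task.num_tries >= task_type.max_tries:
        # Assumption: give up once max_tries has been reached.
        task.status = "failed"
        task.next_run = None
    else:
        # Move the task to the back of the queue.
        # Assumption: "accordingly" means now + min_delay.
        task.status = "scheduled"
        task.next_run = now + task_type.min_delay
    return task
```

With max_tries set to 2 and min_delay set to 5 minutes, a first failure would reschedule the task 5 minutes later, and a second failure would mark it as failed for good.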

olasd claimed this task.

Well, this has been implemented for a while now; I don't know why the task wasn't closed...