swh-cron: manifest-based scheduler for recurring tasks
Closed, Migrated (Edits Locked)

Description

We need a cron-like service that periodically reschedules recurring tasks, i.e., submits the relevant Celery tasks.

Use cases:

  • ghlister daily updates
  • ghlister complete re-listing (monthly?)
  • git cloning/loading of new github repos
  • git fetching/loading of already known github repos
  • ...

Architecturally, we want the service to read a list of "manifests" that map cron-like periods to swh.core.scheduling.Task objects and their configuration parameters.

As a technological building block we might want to use croniter.

Open design questions:

  • Should the service be cron-like (i.e., it does not catch up on missed tasks, e.g., if the daemon was down) or anacron-like (with catch-up)?
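
To make the trade-off concrete, here is a minimal sketch of the two behaviours for a task with a fixed period. The function name next_submission and its parameters are hypothetical, and "anacron-like" is assumed here to mean that missed occurrences collapse into a single immediate run rather than being replayed one by one.

```python
from datetime import datetime, timedelta


def next_submission(last_run, now, period, catch_up):
    """Return when the task should next be submitted once the daemon is up at `now`.

    Scheduled slots are last_run + k * period for k = 1, 2, ...
    """
    first_slot = last_run + period
    if first_slot >= now:
        # No slot was missed: both policies agree.
        return first_slot
    if catch_up:
        # anacron-like (assumed semantics): one overdue run, submitted right away.
        return now
    # cron-like: missed slots are dropped, wait for the first slot at or after `now`.
    k = -((last_run - now) // period)  # ceil((now - last_run) / period)
    return last_run + k * period


last_run = datetime(2016, 2, 1)
now = datetime(2016, 2, 4, 12)  # the daemon was down for 3.5 days
day = timedelta(days=1)
print(next_submission(last_run, now, day, catch_up=True))
print(next_submission(last_run, now, day, catch_up=False))
```

With catch-up the task runs as soon as the daemon comes back; without it, the task waits for the next regular slot.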

Event Timeline

zack raised the priority of this task from to Normal.
zack updated the task description.
zack added a project: Developers.

We did some face-to-face thinking about this, concentrating on the "origin update" part of the mechanism. The shortcoming of our mechanism is that it is completely specific to updating our origins, and we can do something better...

At the center of the imagined system, we have a list of recurring (or not) tasks.

task
  id                  bigserial pk
  type                text
  arguments           jsonb
  next scheduled run  tstz
  current interval    interval
  status              enum {pending, scheduled, disabled}

For each task type, we store a few settings:

task type
  type              text pk
  description       text
  celery task       text
  default interval  interval
  backoff factor    float
  min interval      interval
  max interval      interval

We also use a structure to represent task runs:

task run
  id               bigserial
  task             bigint FK
  backend task id  text
  scheduled        tstz
  started          tstz
  ended            tstz
  logs             text
  eventful         boolean

A daemon will periodically look for tasks to run, i.e., tasks for which:

  • next scheduled run is in the past
  • status is pending

For those tasks, the daemon will:

  • create a new task run entry
  • set the status of the task to scheduled
  • send the task with arguments to our task queue
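
One pass of the daemon could be sketched roughly as follows, using in-memory objects in place of the database tables. The class and function names are hypothetical, and send stands in for whatever submits the job to the task queue and returns the backend task id; none of this is the actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from itertools import count
from typing import Optional


@dataclass
class Task:
    id: int
    type: str
    arguments: dict
    next_run: datetime
    current_interval: timedelta
    status: str = "pending"  # pending | scheduled | disabled


@dataclass
class TaskRun:
    id: int
    task: int  # id of the Task this run belongs to
    scheduled: datetime
    backend_task_id: Optional[str] = None


_run_ids = count(1)  # stand-in for the bigserial sequence


def schedule_pending(tasks, runs, send, now):
    """Schedule every pending task whose next scheduled run is not in the future."""
    for task in tasks:
        if task.status != "pending" or task.next_run > now:
            continue
        # 1. create a new task run entry
        run = TaskRun(id=next(_run_ids), task=task.id, scheduled=now)
        runs.append(run)
        # 2. mark the task as scheduled so the next pass skips it
        task.status = "scheduled"
        # 3. hand the task and its arguments to the task queue
        run.backend_task_id = send(task.type, task.arguments)
    return runs
```

Because a task leaves the pending state as soon as it is picked up, it is not submitted twice while a run is still in flight.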

The task run entry is updated by the workers that execute the task (start and end times). This could be done through an API à la fetch_history.

When a worker marks a task run as "ended", the corresponding task entry is updated:

  • The new task interval is computed, within the limits set for the task type
    • If the run was uneventful, the task interval is increased using the task type backoff factor
    • If the run was eventful, the task interval is decreased using the task type backoff factor
  • The next scheduled run is set at the previous value incremented by the new interval
  • The task is reset as pending
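
The update rule above can be sketched as a small function. It assumes that "increased/decreased using the backoff factor" means multiplying or dividing the interval by that factor, which the description does not spell out; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def update_after_run(next_run, interval, eventful,
                     backoff_factor, min_interval, max_interval):
    """Return (new_next_run, new_interval) once a run has ended."""
    if eventful:
        # Something changed: check again sooner (assumed: divide by the factor).
        new_interval = interval / backoff_factor
    else:
        # Nothing changed: back off (assumed: multiply by the factor).
        new_interval = interval * backoff_factor
    # Keep the interval within the limits set for the task type.
    new_interval = max(min_interval, min(new_interval, max_interval))
    # The next run is the previous scheduled time plus the new interval.
    return next_run + new_interval, new_interval


next_run, interval = update_after_run(
    next_run=datetime(2016, 2, 19),
    interval=timedelta(days=1),
    eventful=False,
    backoff_factor=2.0,
    min_interval=timedelta(hours=12),
    max_interval=timedelta(days=64),
)
print(next_run, interval)
```

After this computation the task would be reset to pending, so the daemon picks it up again once the new next scheduled run has passed.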

This update mechanism is self-contained and could be implemented using a row-update trigger on the task run table.

Sample API

task types

  • Create
  • Read
  • Update
  • Delete

tasks

  • Add new task
  • Enable/Disable existing task
  • Run existing task now
  • Change task interval
  • List pending tasks
  • Schedule pending tasks

task runs

  • Update an existing run (used by workers)
  • List runs for task X

olasd changed the task status from Open to Work in Progress. (Feb 19 2016, 12:52 PM)

An implementation of this is now available in rDSCH.

This is now deployed on moma. Further automation via puppet still needs to be done.

olasd claimed this task.
In T52#3925, @olasd wrote:

This is now deployed on moma. Further automation via puppet still needs to be done.

Can we haz a task for that?

(Secret plan here: collecting enough "let's puppetize this" tasks that might form a corpus for a self-contained, sysadm-oriented internship.)

olasd changed the visibility from "All Users" to "Public (No Login Required)". (May 13 2016, 5:05 PM)