Orchestrate origins scheduling according to scheduler metrics feedback
Closed, Resolved, Public


Currently, the next-gen scheduler runners [1] for origins are started on the scheduler
machine of each environment [2]. So far, they are managed manually through a CLI call for
each visit type (and possibly, once in a while, per lister or forge, ...).

We need to replace those with a new module that does not exist yet: the orchestrator.
That module would be in charge of recurrently scheduling origins of a given visit type
according to the available scheduler metrics. Deployment-wise, a systemd service would
launch it.

Its scheduling algorithm would be something like:

for each visit type:
  - "check queue status":
      if below threshold for that queue:
        goto "fill-in the void state"
      else:
        continue: "next visit type"

  - "fill-in the void state":
    for each (randomly picked?) policy (according to ratio [3]) for that visit type:
      if room in the queue and origins to schedule for that policy:
        schedule origins: push messages in the queue
      elif room in the queue and no more origins to schedule for that policy:
        continue: "next policy"
      else ("no more room in the queue"):
        continue: "next visit type"
    end for
end for
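
The loop above could look roughly like the following Python sketch. It is only an illustration under assumptions, not the orchestrator's actual code: queues are plain in-memory lists, `fetch_origins` is a hypothetical callback standing in for the scheduler backend, and `max_size`, `threshold` and the "each policy gets its share of the free room" rule are made up for the example.

```python
import random
from typing import Callable, Dict, List

# Hypothetical callback: (visit_type, policy, limit) -> origins ready to schedule
FetchOrigins = Callable[[str, str, int], List[str]]


def orchestrate_once(
    queues: Dict[str, List[str]],
    policies: Dict[str, Dict[str, int]],
    fetch_origins: FetchOrigins,
    max_size: int,
    threshold: int,
    rng: random.Random,
) -> Dict[str, int]:
    """Run one pass over all visit types; return origins pushed per visit type."""
    pushed: Dict[str, int] = {}
    for visit_type, queue in queues.items():
        pushed[visit_type] = 0
        # "check queue status": skip visit types whose queue is full enough
        if len(queue) >= threshold:
            continue
        # "fill-in the void state": visit the policies in random order
        names = list(policies[visit_type])
        rng.shuffle(names)
        total_ratio = sum(policies[visit_type].values())
        room_at_start = max_size - len(queue)
        for name in names:
            room = max_size - len(queue)
            if room <= 0:
                break  # no more room in the queue: next visit type
            # Assumption: a policy may use at most its share of the initial room
            share = room_at_start * policies[visit_type][name] // total_ratio
            origins = fetch_origins(visit_type, name, min(room, share))
            if not origins:
                continue  # nothing to schedule for this policy: next policy
            queue.extend(origins)  # stand-in for pushing messages in the queue
            pushed[visit_type] += len(origins)
    return pushed
```

A real service would call such a function in a loop with a sleep in between, and would read queue lengths from the message broker rather than from Python lists.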

[1] Scheduler runners are processes in charge of recurrently scheduling origins per visit
type (and more if need be, e.g. lister-uuid, ...)

[2] root tmux session in:

  • prod: saatchi
  • staging: scheduler0

[3] Ratio could be:

| visit_type                                                     | scheduling policies               | ratio |
| package-loader: archive, cran, debian, npm, nixguix, pypi, ... | already_visited_order_by_lag      |    50 |
|                                                                | never_visited_oldest_update_first |    50 |
| git, svn, hg                                                   | already_visited_order_by_lag      |    49 |
|                                                                | never_visited_oldest_update_first |    49 |
|                                                                | origins_without_last_update       |     2 |
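
One possible way to turn these ratios into a number of queue slots per policy is sketched below. It assumes the ratios are relative weights (they happen to sum to 100 per visit type group in the table), and the `split_slots` helper and the `"package-loader"` / `"vcs"` group keys are hypothetical names used only for this example.

```python
from typing import Dict

# Ratios from table [3]; the group keys are illustrative names only
RATIOS: Dict[str, Dict[str, int]] = {
    "package-loader": {
        "already_visited_order_by_lag": 50,
        "never_visited_oldest_update_first": 50,
    },
    "vcs": {
        "already_visited_order_by_lag": 49,
        "never_visited_oldest_update_first": 49,
        "origins_without_last_update": 2,
    },
}


def split_slots(free_slots: int, ratios: Dict[str, int]) -> Dict[str, int]:
    """Split free queue slots between policies proportionally to their ratio."""
    total = sum(ratios.values())
    # Integer share of each policy, rounded down
    slots = {name: free_slots * ratio // total for name, ratio in ratios.items()}
    # Hand out the slots lost to rounding, largest remainder first
    leftover = free_slots - sum(slots.values())
    by_remainder = sorted(
        ratios, key=lambda name: (free_slots * ratios[name]) % total, reverse=True
    )
    for name in by_remainder[:leftover]:
        slots[name] += 1
    return slots
```

For instance, `split_slots(1000, RATIOS["vcs"])` gives 490, 490 and 20 slots to the three policies, and the shares always add up to the number of free slots.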

Event Timeline

ardumont created this task.
ardumont updated the task description.
ardumont changed the task status from Open to Work in Progress. Oct 28 2021, 4:34 PM

Deployed in staging.

ardumont claimed this task.
ardumont moved this task from deployed/landed to done on the System administration board.