Implement the scheduling policy for the recurrent visit scheduler
Closed, Migrated

Description

When both the lister API and the recent visit cache have been seeded, we should be able to implement the actual scheduling policy for the new scheduler.

  • generate the list of the "next" origin urls to load from the scheduler tables (according to the scheduling policy);
  • take a list of urls and generate "legacy" one-shot tasks;
  • build a "visit simulator" that updates the scheduler database according to a simulated loading time for each origin, and lets us monitor the behavior of the full simulated scheduling/loading infrastructure. This requires us to:
    • get a model of the current loading time distribution
    • determine which metrics we want to use to
      • optimize the scheduler policy
      • check for runaway edge cases, e.g. origins that never get loaded even if the "average" behavior is okay
      • reduce the number of "useless visits"
      • reduce the lag between an actual commit and the next visit
      • ...
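
The three steps above could be prototyped roughly as follows. This is only a minimal in-memory sketch, not the swh-scheduler implementation: the `Origin` class, the "most overdue first" policy, the task dict shape, and the exponential loading-time model (mean 30 s) are all assumptions made for illustration. A real simulator would read from and write to the scheduler tables and use a measured loading time distribution.

```python
import random
from dataclasses import dataclass


@dataclass
class Origin:
    """Hypothetical in-memory stand-in for a row of the scheduler tables."""
    url: str
    visit_interval: float      # assumed: desired time between visits (seconds)
    last_visit: float = 0.0    # simulated timestamp of the last visit
    visits: int = 0


def next_origins(origins, now, count):
    """Return up to `count` due origins, most overdue first (assumed policy)."""
    due = [o for o in origins if now - o.last_visit >= o.visit_interval]
    due.sort(key=lambda o: now - o.last_visit - o.visit_interval, reverse=True)
    return due[:count]


def make_oneshot_tasks(origins):
    """Turn origins into one-shot task descriptions (hypothetical dict shape)."""
    return [{"policy": "oneshot", "url": o.url} for o in origins]


def simulate(origins, duration, tick, batch_size, seed=0):
    """Discrete-time simulation; returns the metrics we might want to monitor."""
    rng = random.Random(seed)
    by_url = {o.url: o for o in origins}
    max_lag = 0.0
    now = 0.0
    while now < duration:
        for task in make_oneshot_tasks(next_origins(origins, now, batch_size)):
            origin = by_url[task["url"]]
            # Lag: how long the origin has been overdue when it gets scheduled.
            max_lag = max(max_lag, now - origin.last_visit - origin.visit_interval)
            # Assumed loading time model: exponential with a 30 s mean.
            load_time = rng.expovariate(1.0 / 30.0)
            origin.last_visit = now + load_time
            origin.visits += 1
        now += tick
    return {
        "max_lag": max_lag,
        # Runaway edge case: origins that were never loaded at all.
        "never_visited": [o.url for o in origins if o.visits == 0],
    }


if __name__ == "__main__":
    origins = [
        Origin("https://example.org/a", visit_interval=60),
        Origin("https://example.org/b", visit_interval=120),
        Origin("https://example.org/c", visit_interval=600),
    ]
    print(simulate(origins, duration=3600, tick=10, batch_size=1))
```

Keeping the policy (`next_origins`) separate from the simulation loop means different policies can be swapped in and compared on the same metrics.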

Event Timeline

vlorentz updated the task description.
vlorentz changed the task status from Open to Work in Progress. (Jan 18 2021, 2:08 PM)
vlorentz moved this task from Backlog to todo on the Sprint 2021 01 board.
vlorentz moved this task from todo to in-progress on the Sprint 2021 01 board.