When both the lister API and the recent visit cache have been seeded, we should be able to implement the actual scheduling policy for the new scheduler.
- generate the list of the "next" origin URLs to load from the scheduler tables (according to the scheduling policy);
- take a list of URLs and generate "legacy" one-shot tasks;
- a "visit simulator" that updates the scheduler database according to a simulated loading time for each origin, and lets us monitor the behavior of the full simulated scheduling/loading infrastructure.
- get a model of current loading time distribution
- determine which metrics we want to optimize for
- optimize the scheduler policy
- check for runaway edge cases, e.g. origins that never get loaded even if the "average" behavior is okay
- reduce the "number of useless visits"
- lag between actual commit and next visit
- ...
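
To make the simulator idea more concrete, here is a minimal sketch in plain Python. It does not use the real scheduler tables or API: the `Origin` class, the "least recently visited first" policy in `next_origins`, the exponential loading-time model and the fixed commit interval per origin are all assumptions chosen for illustration. It shows how a policy, a simulated loading time and two of the metrics above (useless visits, lag between the latest commit and the visit) could fit together, plus a check for origins that never get loaded.

```python
import heapq
import random
from dataclasses import dataclass


@dataclass
class Origin:
    """Hypothetical stand-in for a row of the scheduler tables."""
    url: str
    commit_interval: float          # assumed: one upstream commit every N time units
    last_visit: float = 0.0         # simulated time of the last visit
    last_seen_commit: float = 0.0   # time of the newest commit already loaded


def next_origins(origins, count):
    """Assumed policy: pick the least recently visited origins first."""
    return heapq.nsmallest(count, origins, key=lambda o: o.last_visit)


def simulate(origins, steps, batch_size, mean_load_time, seed=0):
    """Run `steps` scheduling rounds and return (stats, never_visited)."""
    rng = random.Random(seed)
    now = 0.0
    stats = {"visits": 0, "useless_visits": 0, "total_lag": 0.0}
    visit_counts = {o.url: 0 for o in origins}

    for _ in range(steps):
        for origin in next_origins(origins, batch_size):
            # Simulated loading time, drawn from an exponential distribution
            # (an assumption; a measured distribution would replace this).
            now += rng.expovariate(1.0 / mean_load_time)

            # Commits are assumed to land at multiples of commit_interval.
            latest_commit = (now // origin.commit_interval) * origin.commit_interval

            stats["visits"] += 1
            visit_counts[origin.url] += 1
            if latest_commit <= origin.last_seen_commit:
                # Nothing new since the previous visit.
                stats["useless_visits"] += 1
            else:
                # Lag between the latest commit and this visit.
                stats["total_lag"] += now - latest_commit
                origin.last_seen_commit = latest_commit
            origin.last_visit = now

    # Runaway edge case: origins that were never loaded at all.
    never_visited = [url for url, n in visit_counts.items() if n == 0]
    return stats, never_visited


if __name__ == "__main__":
    demo = [
        Origin("https://example.org/fast", commit_interval=1.0),
        Origin("https://example.org/slow", commit_interval=50.0),
        Origin("https://example.org/rare", commit_interval=500.0),
    ]
    print(simulate(demo, steps=10, batch_size=2, mean_load_time=2.0))
```

Swapping `next_origins` for another policy while keeping `simulate` unchanged would let different policies be compared on the same metrics.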