Move toward a push-based event model to speed up origin updates.
This is the parent meta-task.
Problem
We currently poll known origins at regular intervals.
We spend a lot of time on no-op origins (stale or deleted ones).
As a result, we are lagging behind, and the lag keeps growing.
Implementation-wise:
- Lister workers list origins from forges; the results are stored in the scheduler db as recurring tasks on origins to visit.
- The runner schedules those origins for updates.
- Loader workers actually update the origins.
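The current polling flow (lister, scheduler db, runner, loader) can be sketched as follows. This is a simplified illustration only: `RecurringTask`, `list_origins`, `register` and `run_due` are hypothetical names, not the actual swh-scheduler schema or API. It shows why no-op origins cost time: every due task is handed to a loader whether or not the origin changed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical, simplified recurring task; not the real scheduler db schema.
@dataclass
class RecurringTask:
    origin_url: str
    interval: timedelta
    next_run: datetime


def list_origins(forge_urls):
    """Lister step (hypothetical): yield origin URLs found on a forge."""
    yield from forge_urls


def register(origins, now, interval):
    """Store each listed origin as a recurring task, due immediately."""
    return [RecurringTask(url, interval, now) for url in origins]


def run_due(tasks, now, load):
    """Runner step: hand every due task to a loader, then reschedule it."""
    visited = []
    for task in tasks:
        if task.next_run <= now:
            # The loader visits the origin even if nothing changed (no-op visit).
            load(task.origin_url)
            task.next_run = now + task.interval
            visited.append(task.origin_url)
    return visited


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    tasks = register(list_origins(["https://example.org/a", "https://example.org/b"]),
                     now, timedelta(days=1))
    print(run_due(tasks, now, load=lambda url: None))  # both origins are visited
    print(run_due(tasks, now + timedelta(hours=1), load=lambda url: None))  # []
```

With pure polling, the only lever is the interval: shortening it multiplies no-op visits, lengthening it increases the lag.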
Proposed solution
We plan to use external sources of information (ghtorrent, codefeedr, "save now", webhooks, etc.) as sources of interesting events.
How
Those events will be caught, deduplicated, and turned into prioritized oneshot scheduler tasks for the origins to update.
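A minimal sketch of the catch, deduplicate and prioritize step, assuming each event carries a source name and an origin URL. `OriginEvent`, `OneshotTask`, `SOURCE_PRIORITY` and the priority values themselves are illustrative assumptions, not an existing swh API. Duplicate events for the same origin collapse into a single oneshot task that keeps the most urgent priority seen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OriginEvent:
    source: str      # e.g. "ghtorrent", "codefeedr", "webhook", "save-now"
    origin_url: str


# Assumed priorities (lower = scheduled first); unknown sources get 3.
SOURCE_PRIORITY = {"save-now": 0, "webhook": 1, "codefeedr": 2, "ghtorrent": 2}


@dataclass(order=True)
class OneshotTask:
    priority: int
    origin_url: str  # also used as a deterministic tie-breaker


def events_to_oneshot_tasks(events):
    """Deduplicate events per origin, keeping the most urgent priority,
    and return oneshot tasks sorted by priority."""
    best = {}
    for ev in events:
        prio = SOURCE_PRIORITY.get(ev.source, 3)
        if ev.origin_url not in best or prio < best[ev.origin_url]:
            best[ev.origin_url] = prio
    return sorted(OneshotTask(prio, url) for url, prio in best.items())


if __name__ == "__main__":
    events = [
        OriginEvent("ghtorrent", "https://github.com/a/b"),
        OriginEvent("webhook", "https://github.com/a/b"),  # duplicate origin
        OriginEvent("ghtorrent", "https://github.com/c/d"),
        OriginEvent("save-now", "https://gitlab.com/e/f"),
    ]
    for task in events_to_oneshot_tasks(events):
        print(task.priority, task.origin_url)
```

Here the two events for `https://github.com/a/b` yield a single task with the webhook priority.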
What remains
The periodic scheduling mechanism on known origins (with a longer period).
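One possible way to combine both mechanisms, sketched under assumptions: event-driven oneshot tasks fill a batch first, and the remaining slots go to known origins whose longer fallback period has elapsed, oldest visit first. `next_batch`, the `periodic` mapping layout and the 30-day `FALLBACK_INTERVAL` are hypothetical; the actual period is not specified in this task.

```python
from datetime import datetime, timedelta

# Hypothetical fallback period; the real value is not specified here.
FALLBACK_INTERVAL = timedelta(days=30)


def next_batch(oneshot_urls, periodic, now, batch_size):
    """Pick origins to load: oneshot tasks first, then periodic ones.

    `periodic` maps origin URL -> last visit datetime (assumed layout).
    """
    batch = []
    # 1. Event-driven oneshot tasks take precedence.
    for url in oneshot_urls:
        if len(batch) == batch_size:
            return batch
        if url not in batch:
            batch.append(url)
    # 2. Fill the remaining slots with the stalest known origins
    #    whose fallback period has elapsed.
    for url, last_visit in sorted(periodic.items(), key=lambda kv: kv[1]):
        if len(batch) == batch_size:
            break
        if url not in batch and now - last_visit >= FALLBACK_INTERVAL:
            batch.append(url)
    return batch
```

Keeping the periodic pass as a low-priority filler means origins that never emit events are still revisited eventually, while most loader capacity goes to origins with known activity.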
Board discussion: F3071474