
Speed up updates - meta task
Closed, Migrated

Description

Move towards a push event model to speed up updates.

This is the parent meta-task.

Problem

We currently poll known origins at regular intervals.
We spend a lot of time on no-op origins (stale or deleted ones).
As a result, we are lagging behind and the lag is growing.

Implementation-wise:

  • Lister workers list origins from forges. The results are stored in the scheduler db as recurring tasks on origins to visit.
  • Runner schedules those origins for updates.
  • Loader workers actually update origins.

Proposed solution

We plan to use information from external sources (ghtorrent, codefeedr, "save now", webhooks, etc.) as a source of interesting events.

How

Those events will be caught, deduplicated, and turned into prioritized oneshot scheduler tasks for the origins to update.
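
As an illustration of the deduplication step, here is a minimal sketch. The `Event` and `OneshotTask` shapes, the `events_to_tasks` helper, and the example URLs are all hypothetical and are not the scheduler's actual API; the sketch only assumes that events can be keyed by origin URL.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    """An external notification that an origin changed (hypothetical shape)."""
    source: str       # e.g. "ghtorrent" or "webhook"
    origin_url: str


@dataclass(frozen=True)
class OneshotTask:
    """A one-off request to update an origin (hypothetical shape)."""
    origin_url: str
    priority: str


def events_to_tasks(events: Iterable[Event], priority: str = "normal") -> List[OneshotTask]:
    """Deduplicate events by origin URL and emit one oneshot task per origin."""
    seen = set()
    tasks = []
    for event in events:
        if event.origin_url in seen:
            continue  # several events for the same origin collapse into one task
        seen.add(event.origin_url)
        tasks.append(OneshotTask(event.origin_url, priority))
    return tasks


# Three events, two distinct origins (made-up URLs).
events = [
    Event("ghtorrent", "https://example.org/a.git"),
    Event("webhook", "https://example.org/a.git"),
    Event("ghtorrent", "https://example.org/b.git"),
]
tasks = events_to_tasks(events)
```

Keying on the origin URL alone means that events from different sources for the same origin produce a single update task.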

What remains

The periodic scheduling mechanism for known origins (with a longer period).

Board discussion: F3071474

Event Timeline

ardumont triaged this task as Normal priority. Apr 18 2018, 10:00 AM
ardumont created this task.

As per the latest discussion:

  • The scheduler's backend API for task creation should drop tasks that already exist. This should account for sources with a higher oneshot task creation rate (ghtorrent, for example).
  • The priority of newly created tasks for a given source of events should be a single value, e.g. normal. No need for a fancy computation there (at first I did one for ghtorrent, depending on the event rate on a per-origin basis).
  • When reading tasks with priority, any empty priority slots should be filled within a single loop of the scheduler runner.
  • NOT YET - When creating a oneshot task, try to detect the equivalent recurring task to delay it (no association exists between them as of today). This is to account for cases where a oneshot task would happen almost at the same time as a recurring one (for the same origin).
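
The first and third points could look roughly like the sketch below. It is one possible reading, not the scheduler's real backend: the `SchedulerSketch` class, its method names, the `"load-git"` task type, and the priority names `high`/`normal`/`low` are assumptions made for illustration. "Filling empty slots" is interpreted here as handing unused per-priority quota to whatever else is waiting, within the same call.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

TaskKey = Tuple[str, str]  # (task type, origin URL), hypothetical identity of a task


class SchedulerSketch:
    """In-memory stand-in for a scheduler backend (illustrative only)."""

    def __init__(self) -> None:
        self._pending: Set[TaskKey] = set()
        self._queues: Dict[str, Deque[TaskKey]] = {
            "high": deque(), "normal": deque(), "low": deque(),
        }

    def create_oneshot(self, task_type: str, origin_url: str,
                       priority: str = "normal") -> bool:
        """Create a oneshot task; drop it if an identical one is already pending."""
        key = (task_type, origin_url)
        if key in self._pending:
            return False  # duplicate dropped
        self._pending.add(key)
        self._queues[priority].append(key)
        return True

    def grab_ready(self, slots: Dict[str, int]) -> List[TaskKey]:
        """Take up to slots[priority] tasks per priority, then reuse spare slots."""
        picked: List[TaskKey] = []
        spare = 0
        for priority, quota in slots.items():
            queue = self._queues[priority]
            take = min(quota, len(queue))
            picked.extend(queue.popleft() for _ in range(take))
            spare += quota - take  # slots this priority could not fill
        # Same call: give unused slots to remaining tasks, highest priority first.
        for queue in self._queues.values():
            while spare and queue:
                picked.append(queue.popleft())
                spare -= 1
        for key in picked:
            self._pending.discard(key)
        return picked


scheduler = SchedulerSketch()
created = [scheduler.create_oneshot("load-git", url) for url in ["a", "a", "b", "c"]]
scheduler.create_oneshot("load-git", "d", priority="high")
batch = scheduler.grab_ready({"high": 2, "normal": 1, "low": 1})
```

In this run the second request for origin `a` is dropped, and the two slots left unused by `high` and `low` go to the waiting `normal` tasks, so all four pending tasks are picked up in one pass.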
gitlab-migration changed the status of subtask T1035: scheduler: Schedule tasks with priorities from Resolved to Migrated.
gitlab-migration changed the status of subtask T1051: Consume ghtorrent events as oneshot tasks with priority normal from Resolved to Migrated.