Simplify indexer design: move away from the pipeline approach
Closed, Migrated

Description

The current pipeline approach does not scale well with regard to scheduling.
We cannot easily schedule the indexer. It is currently scheduled with a fork of the main scheduler, but that setup is incomplete: the input still comes from a db extract.

By moving towards a range approach, we will be able to schedule a finite set of ranges (for content at least).
Adding a new indexer will then just be a matter of adding another task type and the same number of finite ranges.
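
To illustrate the idea, here is a minimal sketch of how a finite set of ranges could be derived and turned into tasks. It assumes content ids are 20-byte sha1 hashes; the function names, the task dictionary layout and the task type names are hypothetical and are not taken from the actual scheduler.

```python
from typing import Dict, Iterator, List, Tuple

SHA1_BITS = 160  # assumption: content ids are 20-byte sha1 hashes


def split_id_space(n_ranges: int, bits: int = SHA1_BITS) -> Iterator[Tuple[bytes, bytes]]:
    """Yield n_ranges contiguous, non-overlapping (start, end) ranges.

    Both bounds are inclusive and together the ranges cover the whole id space.
    """
    if n_ranges < 1:
        raise ValueError("n_ranges must be at least 1")
    total = 1 << bits
    width = bits // 8
    for i in range(n_ranges):
        start = i * total // n_ranges
        end = (i + 1) * total // n_ranges - 1
        yield start.to_bytes(width, "big"), end.to_bytes(width, "big")


def make_tasks(task_types: List[str], n_ranges: int) -> List[Dict]:
    """Build one task per (task type, range) pair.

    The dictionary layout is hypothetical, chosen only for illustration.
    """
    return [
        {"type": task_type, "arguments": {"start": start.hex(), "end": end.hex()}}
        for task_type in task_types
        for start, end in split_id_space(n_ranges)
    ]


# Hypothetical task type names: a new indexer only adds one more entry here.
tasks = make_tasks(["indexer_mimetype_range", "indexer_license_range"], n_ranges=4)
```

With this layout, the number of tasks is simply the number of task types times the number of ranges, so the scheduling workload stays finite and predictable.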

That means, though:

  • changing the indexer's input from an arbitrary list of ids to a range of ids (T991)
  • removing the orchestrator approach
  • moving some logic into the indexers (for example, the language, ctags, and license indexers will need to filter for textual content themselves).
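
The points above could be sketched roughly as follows: an indexer that takes a range of ids as input and decides by itself which contents to keep, with no orchestrator in front of it. All class and method names are hypothetical, the in-memory list stands in for the real storage, and the sketch assumes the mimetype of each content is already known.

```python
from typing import Dict, Iterator, List, NamedTuple


class Content(NamedTuple):
    sha1: bytes
    mimetype: str  # assumption: already computed, e.g. by an earlier mimetype indexer


class RangeIndexer:
    """Hypothetical base class: indexes contents whose id lies in [start, end]."""

    def __init__(self, contents: List[Content]):
        # In-memory stand-in for the real storage, sorted by id.
        self.contents = sorted(contents)

    def contents_in_range(self, start: bytes, end: bytes) -> Iterator[Content]:
        for content in self.contents:
            if start <= content.sha1 <= end:
                yield content

    def keep(self, content: Content) -> bool:
        """Filtering hook; by default every content in the range is indexed."""
        return True

    def index(self, content: Content) -> Dict:
        raise NotImplementedError

    def run(self, start: bytes, end: bytes) -> List[Dict]:
        return [
            self.index(content)
            for content in self.contents_in_range(start, end)
            if self.keep(content)
        ]


class TextOnlyIndexer(RangeIndexer):
    """Stand-in for a language, ctags or license indexer."""

    def keep(self, content: Content) -> bool:
        # The filtering previously done upstream now lives in the indexer.
        return content.mimetype.startswith("text/")

    def index(self, content: Content) -> Dict:
        return {"id": content.sha1.hex(), "indexed": True}


indexer = TextOnlyIndexer([
    Content(b"\x01", "text/plain"),
    Content(b"\x02", "application/octet-stream"),
    Content(b"\x90", "text/x-python"),
])
results = indexer.run(b"\x00", b"\x7f")
```

Here only the first content is indexed: the second is in range but not textual, and the third is textual but outside the requested range.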

Event Timeline

ardumont triaged this task as Normal priority. Nov 7 2018, 10:10 AM
ardumont created this task.