Efficient reindex when adding a metadata mapping
Closed, Migrated

Description

When adding a new metadata mapping, there should be a way to run metadata indexers only on affected origins.

Event Timeline

vlorentz created this task.

A naive solution is to add a new indexer that pre-fetches the snapshot, revision, and root directory of each origin and writes the list of files in that root directory to the indexer db. We can then read that list to find which origins have a file matching a given file name pattern.

Or, as the OriginMetadataIndexer already fetches this data anyway, it could write it to the indexer db when it's done.

It's somewhat outside its current scope, but it would spare a heavy load on the graph storage and skip a round-trip to the scheduler.
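
To make the idea concrete, here is a minimal sketch of the lookup described above: store each origin's root file names, then select the origins that match a file name pattern. It uses an in-memory SQLite database as a stand-in for the indexer db. The table name `origin_root_file` and the functions `create_index`, `record_root_files`, and `origins_matching` are all hypothetical and not part of the existing indexer code. File names are treated as plain strings for simplicity.

```python
import sqlite3
from typing import Iterable, List


def create_index(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table mapping origins to root file names."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS origin_root_file (
            origin_url TEXT NOT NULL,
            file_name  TEXT NOT NULL,
            PRIMARY KEY (origin_url, file_name)
        )
        """
    )
    # Secondary index so exact file name lookups do not scan the whole table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS origin_root_file_name "
        "ON origin_root_file (file_name)"
    )


def record_root_files(
    conn: sqlite3.Connection, origin_url: str, file_names: Iterable[str]
) -> None:
    """Replace the stored root file list of one origin.

    This is what an indexer could call once it has already fetched the
    origin's root directory.
    """
    conn.execute(
        "DELETE FROM origin_root_file WHERE origin_url = ?", (origin_url,)
    )
    conn.executemany(
        "INSERT OR IGNORE INTO origin_root_file (origin_url, file_name) "
        "VALUES (?, ?)",
        [(origin_url, name) for name in file_names],
    )
    conn.commit()


def origins_matching(conn: sqlite3.Connection, pattern: str) -> List[str]:
    """Return origins whose root directory has a file matching a glob pattern.

    Uses SQLite's GLOB operator, which is case-sensitive and supports
    `*` and `?` wildcards.
    """
    rows = conn.execute(
        "SELECT DISTINCT origin_url FROM origin_root_file "
        "WHERE file_name GLOB ? ORDER BY origin_url",
        (pattern,),
    )
    return [row[0] for row in rows]
```

With such a table in place, adding a mapping for, say, `*.gemspec` files would only require calling `origins_matching(conn, "*.gemspec")` and scheduling the metadata indexers on the returned origins, rather than on every origin.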