
scrubber: keep a state file for postgresql datastores
Open, High, Public


The postgresql scrubber entry point is a command-line tool that sequentially processes a potentially fairly large range of objects (currently, in production, a quarter of all objects at a time).

When the scrubber crashes, either because of a bug or for external reasons, it restarts from the beginning. This biases its coverage towards the start of each processed range.

It would be nicer to have the option to keep a state file recording the most recent successfully processed object range, so that the next run of the scrubber could resume from there instead of starting over. This would make crash recovery more useful.
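
To make the proposal concrete, here is a minimal sketch of what such a state file could look like. It is not the scrubber's actual code: the function names (`load_checkpoint`, `save_checkpoint`, `scrub_range`), the JSON layout, and the use of plain integers as a stand-in for object ranges are all assumptions made for illustration. The idea is to process the range in batches, record the end of each successfully processed batch, and write the file atomically so that a crash mid-write cannot leave a corrupt state behind.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


def load_checkpoint(state_path: Path, start: int, end: int) -> int:
    """Return the position to resume from, or `start` if no usable state exists."""
    try:
        state = json.loads(state_path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return start
    if not isinstance(state, dict):
        return start
    # Ignore a state file that was written for a different range.
    if state.get("start") != start or state.get("end") != end:
        return start
    next_pos = state.get("next")
    if not isinstance(next_pos, int):
        return start
    return max(start, min(next_pos, end))


def save_checkpoint(state_path: Path, start: int, end: int, next_pos: int) -> None:
    """Atomically record that everything in [start, next_pos) was processed."""
    # Write to a temporary file in the same directory, then rename over the
    # real state file, so readers never see a partially written file.
    fd, tmp_name = tempfile.mkstemp(dir=state_path.parent, prefix=state_path.name)
    with os.fdopen(fd, "w") as tmp:
        json.dump({"start": start, "end": end, "next": next_pos}, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    os.replace(tmp_name, state_path)


def scrub_range(
    start: int,
    end: int,
    batch_size: int,
    state_path: Path,
    check_batch: Callable[[int, int], None],
) -> None:
    """Process [start, end) in batches, resuming from the state file if present.

    `check_batch` is a hypothetical callback standing in for the real
    per-batch scrubbing work; if it raises, the checkpoint is not advanced.
    """
    pos = load_checkpoint(state_path, start, end)
    while pos < end:
        batch_end = min(pos + batch_size, end)
        check_batch(pos, batch_end)
        save_checkpoint(state_path, start, end, batch_end)
        pos = batch_end
    # The whole range is done: drop the state so the next run starts afresh.
    state_path.unlink(missing_ok=True)
```

With this layout, a crash loses at most one batch of work, and a state file left over from a different range is ignored rather than trusted. Whether the state should be deleted once a range completes, or kept to drive the next full pass, is a design choice this sketch does not settle.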

Event Timeline

olasd triaged this task as High priority. Fri, Sep 9, 6:49 PM
olasd created this task.