
scrubber: keep a state file for postgresql datastores
Closed, Migrated

Description

The PostgreSQL scrubber entry point is a command-line tool that sequentially processes a potentially large range of objects (currently, in production, a quarter of all objects at a time).

When the scrubber crashes, whether because of a bug or for external reasons, it restarts from the beginning of its range. This biases checking towards the start of each processed range, while objects near the end are checked less often, or not at all.

It would be nicer to have the option to keep a state file containing the most recent successfully processed object range, so that the next run of the scrubber could resume from there. This would make crash recovery more useful, since a restart would no longer redo work that was already done.
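
As an illustration of the state-file idea, here is a minimal Python sketch. It assumes the range is split into consecutive integer chunks `[start, end)` and that a hypothetical `check_chunk` callback scrubs one chunk; `load_state`, `save_state`, `scrub` and the JSON layout are all made up for this example and are not part of the scrubber's code. The checkpoint is written atomically (temporary file, then `os.replace`) so a crash mid-write cannot leave a corrupt state file.

```python
import json
import os
import tempfile
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical helpers illustrating the state-file idea; not the scrubber's API.


def load_state(path: str) -> Optional[int]:
    """Return the end of the last successfully processed chunk, or None."""
    try:
        with open(path) as f:
            return json.load(f)["last_processed_end"]
    except FileNotFoundError:
        return None  # no state file yet: start from the beginning


def save_state(path: str, last_processed_end: int) -> None:
    """Write the checkpoint atomically so a crash never leaves a torn file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"last_processed_end": last_processed_end}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename over the previous state file


def scrub(
    chunks: Iterable[Tuple[int, int]],
    check_chunk: Callable[[int, int], None],
    state_path: str,
) -> None:
    """Check each [start, end) chunk in order, resuming after the last checkpoint."""
    resume_from = load_state(state_path)
    for start, end in chunks:
        if resume_from is not None and end <= resume_from:
            continue  # already checked before the previous run stopped
        check_chunk(start, end)  # may raise; the checkpoint is then not advanced
        save_state(state_path, end)
    # The whole range is done: drop the state so the next run starts from scratch.
    if os.path.exists(state_path):
        os.remove(state_path)
```

If `check_chunk` raises on some chunk, the state file still points at the end of the previous chunk, so the next invocation of `scrub` skips everything already checked instead of restarting from the beginning.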

Event Timeline

olasd triaged this task as High priority. Sep 9 2022, 6:49 PM
olasd created this task.
vlorentz added a subscriber: vlorentz.

I think I'll implement it as a table in the scrubber's DB; this will make it easier to query the current scrubbing status and add it to the Grafana dashboard.

Yeah, that's a good idea!
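
The table-based variant suggested above could look roughly like the following Python sketch. It uses `sqlite3` only so that it runs standalone; the `INSERT ... ON CONFLICT ... DO UPDATE` statement is also valid PostgreSQL. The table name `checked_range`, its columns and the helper functions are hypothetical, not the scrubber's actual schema. Each completed range is upserted with a timestamp, the resume point is the end of the most recently checked range, and a simple count over recent timestamps is the kind of query a Grafana panel could use.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Hypothetical schema; the real scrubber database may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS checked_range (
    datastore    TEXT    NOT NULL,
    range_start  INTEGER NOT NULL,
    range_end    INTEGER NOT NULL,
    last_checked TEXT    NOT NULL,  -- fixed-width ISO 8601 UTC timestamp
    PRIMARY KEY (datastore, range_start, range_end)
)
"""


def _ts(dt: datetime) -> str:
    # Fixed width (always microseconds) keeps string order equal to time order.
    return dt.astimezone(timezone.utc).isoformat(timespec="microseconds")


def record_checked_range(
    conn: sqlite3.Connection,
    datastore: str,
    start: int,
    end: int,
    checked_at: Optional[datetime] = None,
) -> None:
    """Upsert a successfully checked [start, end) range with its check time."""
    checked_at = checked_at or datetime.now(timezone.utc)
    conn.execute(
        """
        INSERT INTO checked_range (datastore, range_start, range_end, last_checked)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (datastore, range_start, range_end)
        DO UPDATE SET last_checked = excluded.last_checked
        """,
        (datastore, start, end, _ts(checked_at)),
    )
    conn.commit()


def resume_point(conn: sqlite3.Connection, datastore: str) -> Optional[int]:
    """End of the most recently checked range, or None if nothing was checked."""
    row = conn.execute(
        "SELECT range_end FROM checked_range WHERE datastore = ? "
        "ORDER BY last_checked DESC LIMIT 1",
        (datastore,),
    ).fetchone()
    return row[0] if row else None


def ranges_checked_since(
    conn: sqlite3.Connection, datastore: str, since: datetime
) -> int:
    """Number of ranges checked at or after `since` (e.g. for a dashboard panel)."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM checked_range "
        "WHERE datastore = ? AND last_checked >= ?",
        (datastore, _ts(since)),
    ).fetchone()
    return count
```

Keeping one row per range, rather than a single "current position" row, means that the same table records when each part of the object space was last checked, which is what a status dashboard would want to show.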