
regularly scrub all the data stores of swh
Open, Normal, Public

Description

Make sure we have background jobs that regularly or continuously check data integrity in all the SWH data stores:

  • check hashes stored in the main PostgreSQL storage (and replicas?)
  • check objects stored in Kafka
  • check blob hashes for objects stored in all the objstorages (saam, Azure, S3)

For example, while doing mirroring tests, I found several blob objects in S3 that appear to be corrupted (the original copies in the main objstorage are fine).
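
A minimal sketch of what the blob check could look like, not an actual swh implementation. It assumes each blob is keyed by the hex SHA-1 of its content and uses a plain dict as a stand-in for an objstorage backend; the function name `find_corrupted_blobs` is hypothetical.

```python
import hashlib
from typing import Iterator, Mapping


def find_corrupted_blobs(store: Mapping[str, bytes]) -> Iterator[str]:
    """Yield the ids of blobs whose content no longer matches their id.

    Assumption: the id of a blob is the hex SHA-1 of its content.
    `store` is a hypothetical stand-in for an objstorage backend.
    """
    for obj_id, data in store.items():
        # Recompute the hash from the stored bytes and compare it to the key.
        if hashlib.sha1(data).hexdigest() != obj_id:
            yield obj_id


if __name__ == "__main__":
    intact = b"intact content"
    store = {
        hashlib.sha1(intact).hexdigest(): intact,
        # Stored bytes differ from the bytes the id was computed from.
        hashlib.sha1(b"original content").hexdigest(): b"corrupted content",
    }
    for obj_id in find_corrupted_blobs(store):
        print("corrupted:", obj_id)
```

A blob flagged in one backend (for example S3) could then be restored from a backend where the same id still verifies, as in the case described above.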

Event Timeline

douardda renamed this task from "regularly scrub all the data sources of swh" to "regularly scrub all the data stores of swh". Tue, Jan 11, 12:31 PM
douardda triaged this task as Normal priority.
douardda created this task.
douardda removed a project: Roadmap 2021.