Consider backfilling mistakenly rejected large objects from PostgreSQL
Closed, Migrated

Description

We should decide whether to backfill, from PostgreSQL, the objects that are missing from the Kafka cluster on Azure.

We'd need to:

  • make a list of objects currently in Kafka
  • make a list of objects in PostgreSQL
  • backfill the missing objects "by hand"
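
The comparison step above can be sketched as a streaming diff. This is only a minimal illustration, assuming both lists have been exported as object identifiers sorted in ascending order; the function name `missing_ids` and the sample identifiers are hypothetical, and the sketch does not cover how the lists are extracted from Kafka or PostgreSQL, nor the backfill itself.

```python
from typing import Iterable, Iterator


def missing_ids(postgres_ids: Iterable[str], kafka_ids: Iterable[str]) -> Iterator[str]:
    """Yield ids found in postgres_ids but not in kafka_ids.

    Assumption: both inputs are sorted in ascending order, so a single
    pass over each is enough and neither list has to fit in memory.
    """
    kafka_iter = iter(kafka_ids)
    current = next(kafka_iter, None)
    for obj_id in postgres_ids:
        # Skip Kafka ids that sort before the PostgreSQL id under inspection.
        while current is not None and current < obj_id:
            current = next(kafka_iter, None)
        # No match on the Kafka side: this id would need backfilling.
        if current != obj_id:
            yield obj_id


if __name__ == "__main__":
    # Hypothetical sample data; real inputs would be sorted id dumps.
    in_postgres = ["a1", "b2", "d4"]
    in_kafka = ["b2", "c3"]
    for obj_id in missing_ids(in_postgres, in_kafka):
        print(obj_id)
```

Working on sorted inputs keeps memory use constant, which matters when the lists are too large to load as in-memory sets.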

As we're going to bootstrap a new Kafka cluster in Rocquencourt soon, we should evaluate whether we really need to do this beforehand, or whether we can simply wait and backfill the new cluster from scratch instead.

Event Timeline

olasd triaged this task as Normal priority. Apr 6 2020, 10:35 PM
olasd created this task.
olasd changed the task status from Open to Work in Progress. Apr 15 2020, 10:26 AM
olasd added a subscriber: seirl.

I've pulled the list of objects from Kafka using @seirl's graph export. I'm now working on the diff between PostgreSQL and that list of objects.

Backfilled objects:

  • snapshot
  • release

Need double-checking:

  • revision

Currently sorting the objects that exist in Kafka:

  • directory

TODO:

  • content

olasd claimed this task.

Snapshots, releases, revisions and directories have now been completely backfilled, and no objects of these types are (known to be) missing from the Kafka cluster on Azure.