
Clean striped object storages of objects they should not contain
Closed, Migrated

Description

To reclaim the space used by objects stored redundantly on both uffizi and banco, we need a process to remove the redundant copies.

I propose the following policy:

For each object in the archiver database:

  1. ensure the object exists on Azure; check its integrity
  2. if storage space allows on Ceph: ensure the object exists on Ceph; check its integrity
  3. ensure the object exists on its target striped storage; check its integrity
  4. remove the object from the other striped storage

This ensures that at least 2, and ideally 3, verified copies remain for each content object we remove.
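
A minimal sketch of the policy above, for illustration only. It assumes each storage behaves like a mapping from object id to bytes and that the object id is the SHA-1 hex digest of the content; `has_valid_copy`, `clean_object`, and the `ceph_has_space` flag are hypothetical names, not the archiver's actual API. It also assumes that a missing or corrupt copy simply blocks the removal, whereas a real process would probably try to restore the copy first.

```python
import hashlib


def has_valid_copy(storage, obj_id):
    """Return True if `storage` holds `obj_id` and the content hashes back to it.

    Assumption: `storage` is a dict-like mapping of id -> bytes, and ids are
    SHA-1 hex digests of the content.
    """
    data = storage.get(obj_id)
    return data is not None and hashlib.sha1(data).hexdigest() == obj_id


def clean_object(obj_id, azure, ceph, target, other, ceph_has_space=True):
    """Apply the four-step policy to one object.

    Returns True if the redundant copy was removed from `other`,
    False if any check failed and nothing was touched.
    """
    # Step 1: a verified copy must exist on azure.
    if not has_valid_copy(azure, obj_id):
        return False
    # Step 2: the ceph copy is only required when ceph has room for it.
    if ceph_has_space and not has_valid_copy(ceph, obj_id):
        return False
    # Step 3: a verified copy must exist on the target striped storage.
    if not has_valid_copy(target, obj_id):
        return False
    # Step 4: only now drop the copy from the other striped storage.
    other.pop(obj_id, None)
    return True
```

Running the removal last, and only after every check passes, is what keeps the redundancy guarantee: a failed check leaves the other striped storage untouched.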

Event Timeline

olasd triaged this task as High priority. May 3 2018, 4:53 PM
olasd created this task.

FWIW, the proposed policy looks good to me. Green light!

olasd changed the task status from Open to Work in Progress. Jul 17 2018, 2:06 PM

This has been running for a while. It's quite taxing on uffizi even with nice and ionice set very high: load has been consistently around 45.

Maybe it's time to move uffizi to a container instead of a VM?

Given that uffizi already uses 64 GB of RAM (more than some physical machines), this should be a no-brainer.
I am not sure if this would really improve I/O performance, though.

We've moved on to other filesystems and we're not really planning on wiping the old ones anymore.