
Investigate using zfs snapshots for incremental database backups, instead of rsync
Closed, Migrated

Description

The current database backup system is based on barman.

It uses a combination of continuous WAL archival and weekly base backups using rsync to achieve point-in-time recovery over a window of several weeks.

This system is well integrated and works without many issues.

The main problems with this system are the following:

  • It uses rsync for incremental backups. This forces the system to look at every database file and to transfer each changed file in full. Now that we're using zfs, we could be much more efficient by using zfs snapshots, which transfer only the changed blocks without ever touching the filesystem level. We're also unable to resume an interrupted backup, so we need to run it again.
  • It only does a weekly base backup. This forces us to replay, in the worst case, more than a week of WALs to do a recovery. In our tests, this took a lot of time. Doing more frequent backups is not really tractable with rsync, as a full backup now takes almost 2 days.
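To make the "more than a week" figure concrete, here is a rough back-of-the-envelope sketch. It assumes that a base backup only becomes usable once it has finished, and that WAL replay starts from the point where that backup began. The 2-day duration is the approximate figure quoted above, and the function name is made up for illustration.

```python
from datetime import timedelta


def worst_case_replay(interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Upper bound on the span of WAL to replay for a recovery.

    Assumption: a recovery target just before the newest base backup
    completes has to start from the previous base backup, so the span
    covers one full interval plus the time the backup itself takes.
    """
    return interval + backup_duration


# Weekly base backups that take about 2 days each (figures from this task).
weekly = worst_case_replay(timedelta(days=7), timedelta(days=2))
print(f"worst case: about {weekly.days} days of WAL to replay")
```

Under the same assumption, shortening both the interval and the backup duration shrinks the replay window directly, which is the motivation for a faster backup method.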

Unfortunately, I haven't been able to find a postgres backup system with proper zfs integration, which means we would need to design our own, combining snapshots transferred with zfs send/receive and WAL archival using pg_receivexlog (renamed pg_receivewal in PostgreSQL 10).
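As a starting point for discussion, the snapshot half of such a design could look roughly like the sketch below. It only builds the command lines and does not run anything. The dataset names, snapshot names and host name are placeholders, not our actual layout, and the helper functions are hypothetical. It also assumes the whole postgres data directory, WAL included, lives in a single dataset so that one snapshot is crash-consistent; otherwise the snapshot would need to be coordinated with postgres.

```python
import shlex
from typing import List


def snapshot_cmd(dataset: str, name: str) -> List[str]:
    """Command to take a snapshot of the dataset."""
    return ["zfs", "snapshot", f"{dataset}@{name}"]


def incremental_send_cmd(dataset: str, prev: str, cur: str) -> List[str]:
    """Command to stream only the blocks changed between two snapshots."""
    return ["zfs", "send", "-i", f"{dataset}@{prev}", f"{dataset}@{cur}"]


def receive_cmd(target: str) -> List[str]:
    """Command to apply the stream on the backup side.

    -s keeps the partially received state if the transfer is interrupted,
    so it can be resumed with `zfs send -t <token>` instead of restarted.
    """
    return ["zfs", "receive", "-s", target]


def pipeline(dataset: str, prev: str, cur: str, host: str, target: str) -> str:
    """Shell pipeline sending an incremental snapshot to a remote host over ssh."""
    send = shlex.join(incremental_send_cmd(dataset, prev, cur))
    recv = shlex.join(["ssh", host, shlex.join(receive_cmd(target))])
    return f"{send} | {recv}"


# Placeholder names, for illustration only.
print(shlex.join(snapshot_cmd("data/postgres", "base-2")))
print(pipeline("data/postgres", "base-1", "base-2",
               "backup.example.org", "backup/postgres"))
```

The backup server must already hold the previous snapshot (base-1 here) for the incremental stream to apply. WAL archival would run alongside this, unchanged in principle from what we do today.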

We'd also need to use zfs on the backup server, which doesn't feel like too much of an issue, except that the current backup server doesn't have much wiggle room.

Event Timeline

olasd triaged this task as Wishlist priority. Dec 2 2020, 10:35 AM
olasd created this task.

Can storage be easily added to the backup server?

banco and the storage array attached to it are both five years old and coming up on the end of their warranty period. The warranty can probably be extended, but I don't know if we want to do that or to replace banco with another server.