The current database backup system is based on barman.
It combines continuous WAL archival with weekly base backups taken over rsync to provide point-in-time recovery over a window of several weeks.
This system is well integrated and works without many issues.
The main problems with this system are the following:
- uses rsync for incremental backups; this forces the system to look at every database file and transfer it in full if it has changed. Now that we're using zfs, we could be much more efficient by using zfs snapshots, which transfer only the changed blocks without ever touching the filesystem level. An interrupted backup also cannot be resumed, so it has to be run again.
- only does a weekly base backup; in the worst case, this forces us to replay more than a week of WALs to do a recovery. In our tests, this took a lot of time. Doing more frequent backups is not really tractable with rsync, as a full backup now takes almost 2 days.
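
To make the "more than a week" figure concrete, here is a rough sketch of the worst-case replay window. It assumes a simple model that is not spelled out above: a base backup only becomes usable once it has completed, and WAL replay starts from the point where that backup began. The 2-day duration rounds the "almost 2 days" figure up, and the daily/one-hour numbers are purely hypothetical.

```python
from datetime import timedelta


def worst_case_replay_window(backup_interval, backup_duration):
    """Longest span of WAL that may need replaying under the model above.

    Just before a new base backup completes, the most recent usable backup
    is the previous one, which started `backup_interval + backup_duration`
    earlier.
    """
    return backup_interval + backup_duration


# Current setup: weekly base backup that takes about 2 days.
weekly = worst_case_replay_window(timedelta(days=7), timedelta(days=2))
print(weekly)

# Hypothetical alternative: daily backup completing within an hour.
daily = worst_case_replay_window(timedelta(days=1), timedelta(hours=1))
print(daily)
```

Under these assumptions the replay window shrinks with both the backup interval and the backup duration, which is why a faster backup method matters as much as a more frequent one.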
Unfortunately, I haven't been able to find a postgres backup system with proper zfs integration, which means we would need to design our own, combining snapshots transferred with zfs send/receive and WAL archival using pg_receivexlog (renamed pg_receivewal in PostgreSQL 10).
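
As a starting point for discussion, the pieces of such a design could be sketched as below. This is a dry run that only builds and prints the command lines; it is not a tested backup tool. The dataset, snapshot, host, user and directory names are all hypothetical placeholders, and it assumes the previous snapshot already exists on both sides.

```python
import shlex


def snapshot_cmd(dataset, name):
    """Command that takes an atomic snapshot of the database dataset."""
    return ["zfs", "snapshot", f"{dataset}@{name}"]


def incremental_send_pipeline(dataset, prev, new, backup_host, target):
    """Sender and receiver commands for an incremental transfer.

    `zfs send -i` streams only the blocks changed between the two snapshots;
    the stream is meant to be piped into `zfs receive` on the backup server.
    """
    send = ["zfs", "send", "-i", f"{dataset}@{prev}", f"{dataset}@{new}"]
    receive = ["ssh", backup_host, "zfs", "receive", target]
    return send, receive


def wal_receiver_cmd(host, user, wal_dir):
    """Command that streams WAL continuously into wal_dir."""
    return ["pg_receivexlog", "-h", host, "-U", user, "-D", wal_dir]


# Hypothetical names, for illustration only.
DATASET = "tank/pgdata"
send, receive = incremental_send_pipeline(
    DATASET, "base-1", "base-2", "backup.example.org", "backup/pgdata"
)

print(shlex.join(snapshot_cmd(DATASET, "base-2")))
print(shlex.join(send) + " | " + shlex.join(receive))
print(shlex.join(wal_receiver_cmd("db.example.org", "streaming", "/srv/wal")))
```

A real implementation would run the send and receive halves as a pipe (for instance with `subprocess.Popen`), handle errors, and prune old snapshots. OpenZFS also has resumable streams (`zfs receive -s` together with `zfs send -t`), which may be worth evaluating for the interrupted-backup problem.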
We'd also need to use zfs on the backup server, which doesn't feel like too much of an issue, except for the fact that the current backup server doesn't have much wiggle room.