I had to fix a disk full issue on tate:/ this morning.
It was caused by the presence of dar files in /srv/backups/
Status | Assigned | Task
---|---|---
Migrated | gitlab-migration | T1282 Revisit backups
Migrated | gitlab-migration | T1164 Dar backups fill up disk space on client machines
Migrated | gitlab-migration | T1165 Fix lack of disk space on louvre:/
Our backups are stored on a remote filesystem provided by SESI (filer-backup nfs mount on louvre).
The way our setup for dar works currently is:
I think the issue is that if the remote copy fails (for instance if DNS resolution is fubar), the old backups will accumulate on each host.
A quick fix for this issue would be to adapt the local backup script to remove old backups before starting again (and warn by mail that a backup wasn't cleaned up, which means the copy failed somehow).
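A rough sketch of what that quick fix could look like, in Python. The actual backup script is not shown in this task, so everything here is an assumption: the `*.dar` file pattern, the function names, and the mail addresses are hypothetical placeholders, and mail is assumed to go through a local SMTP relay.

```python
from email.message import EmailMessage
from pathlib import Path
import smtplib


def clean_stale_backups(backup_dir, pattern="*.dar"):
    """Delete leftover archives in backup_dir and return the removed paths.

    Any file still present when a new run starts was not cleaned up by the
    previous run, which suggests the remote copy failed.
    """
    removed = []
    for path in sorted(Path(backup_dir).glob(pattern)):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed


def build_warning(host, removed,
                  sender="backup@example.org",       # hypothetical address
                  recipient="sysadm@example.org"):   # hypothetical address
    """Build the warning mail listing the archives that had to be removed."""
    msg = EmailMessage()
    msg["Subject"] = f"[backup] stale dar files removed on {host}"
    msg["From"] = sender
    msg["To"] = recipient
    listing = "\n".join(str(p) for p in removed)
    msg.set_content(
        "The previous backup was not cleaned up, so the remote copy "
        "probably failed.\nRemoved files:\n" + listing
    )
    return msg


def send_warning(msg, smtp_host="localhost"):
    """Send the warning through a local SMTP relay (assumed to exist)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

The local script would call `clean_stale_backups("/srv/backups")` before creating a new archive, and send the warning only when the returned list is not empty.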
dar backups have now been replaced with a setup around borg-backup and borgmatic, which only needs a small cache on the machines that are being backed up. borg is fast enough and its dedup is efficient enough that we're able to run backups every hour now.
The admin documentation in https://intranet.softwareheritage.org/wiki/Backups has been updated to reflect the new setup.
All the dar setup, crontabs, etc. have been cleaned up from all hosts.