
backup01 disk is not large enough to handle the dali backups
Closed, Migrated

Description

The backup of dali is stopped because the backup01 disk is not large enough to handle 2 months* of history.

  * Only one month of history is kept, but two monthly snapshots can temporarily be present at the same time.
root@backup01:~# zfs list -t all
NAME                                                                 USED  AVAIL     REFER  MOUNTPOINT
data                                                                 188G  4.65G       96K  none
data/sync                                                            188G  4.65G       96K  none
data/sync/dali                                                       188G  4.65G       96K  none
data/sync/dali/postgresql                                            188G  4.65G     79.6G  none
data/sync/dali/postgresql@autosnap_2022-04-01_00:00:01_monthly      22.4G      -     73.3G  -
data/sync/dali/postgresql@autosnap_2022-04-10_00:00:00_daily        2.86G      -     73.3G  -
data/sync/dali/postgresql@autosnap_2022-04-11_00:00:00_daily        2.39G      -     73.3G  -
data/sync/dali/postgresql@autosnap_2022-04-12_00:00:00_daily        2.83G      -     73.3G  -
data/sync/dali/postgresql@autosnap_2022-04-13_00:00:02_daily        2.49G      -     73.3G  -
data/sync/dali/postgresql@autosnap_2022-04-14_00:00:00_daily        3.39G      -     73.6G  -
data/sync/dali/postgresql@autosnap_2022-04-15_00:00:01_daily        9.60G      -     75.4G  -
data/sync/dali/postgresql@autosnap_2022-04-16_00:00:00_daily        2.38G      -     77.6G  -
data/sync/dali/postgresql@autosnap_2022-04-17_00:00:01_daily        2.27G      -     77.9G  -
data/sync/dali/postgresql@autosnap_2022-04-18_00:00:02_daily        2.25G      -     78.1G  -
data/sync/dali/postgresql@autosnap_2022-04-19_00:00:00_daily        2.31G      -     78.4G  -
data/sync/dali/postgresql@autosnap_2022-04-20_00:00:00_daily        2.89G      -     78.7G  -
data/sync/dali/postgresql@autosnap_2022-04-21_00:00:01_daily        2.29G      -     78.9G  -
data/sync/dali/postgresql@autosnap_2022-04-22_00:00:02_daily        2.31G      -     79.1G  -
data/sync/dali/postgresql@autosnap_2022-04-23_00:00:01_daily        2.25G      -     79.4G  -
data/sync/dali/postgresql@autosnap_2022-04-24_00:00:01_daily           0B      -     79.6G  -
data/sync/dali/postgresql/wal                                       1.57G  4.65G      116M  none
data/sync/dali/postgresql/wal@autosnap_2022-04-01_00:00:01_monthly  41.9M      -     58.0M  -
data/sync/dali/postgresql/wal@autosnap_2022-04-11_00:00:00_daily    66.8M      -     82.9M  -
data/sync/dali/postgresql/wal@autosnap_2022-04-12_00:00:00_daily    58.5M      -     74.6M  -
data/sync/dali/postgresql/wal@autosnap_2022-04-13_00:00:02_daily     103M      -      119M  -
data/sync/dali/postgresql/wal@autosnap_2022-04-14_00:00:00_daily    77.7M      -     93.8M  -
data/sync/dali/postgresql/wal@autosnap_2022-04-15_00:00:01_daily     446M      -      478M  -
data/sync/dali/postgresql/wal@autosnap_2022-04-16_00:00:00_daily    61.7M      -     77.7M  -
data/sync/dali/postgresql/wal@autosnap_2022-04-17_00:00:01_daily    59.5M      -     91.6M  -
data/sync/dali/postgresql/wal@autosnap_2022-04-18_00:00:02_daily    59.8M      -     91.9M  -
data/sync/dali/postgresql/wal@autosnap_2022-04-19_00:00:00_daily    72.9M      -     89.0M  -
data/sync/dali/postgresql/wal@autosnap_2022-04-20_00:00:00_daily    56.9M      -      105M  -
data/sync/dali/postgresql/wal@autosnap_2022-04-21_00:00:01_daily    74.3M      -      122M  -
data/sync/dali/postgresql/wal@autosnap_2022-04-22_00:00:02_daily    66.0M      -      114M  -
data/sync/dali/postgresql/wal@autosnap_2022-04-23_00:00:01_daily    50.6M      -     98.8M  -
data/sync/dali/postgresql/wal@autosnap_2022-04-24_00:00:01_daily    65.3M      -      113M  -
data/sync/dali/postgresql/wal@autosnap_2022-04-25_00:00:00_daily       0B      -      116M  -

We should also add a monitoring alert for when the ZFS pool's available space becomes low.
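As a starting point for such an alert, here is a minimal sketch of a threshold check. It assumes the input comes from `zfs list -Hp -o name,avail` (tab-separated, sizes in bytes); the function name `check_avail` and the 20 GiB threshold are hypothetical, and the example uses canned input rather than a live pool.

```shell
#!/bin/sh
# Sketch: warn when a ZFS dataset's available space drops below a threshold.
# check_avail reads "name<TAB>avail_bytes" lines on stdin, the format produced by
#   zfs list -Hp -o name,avail
# and prints a WARNING line for each dataset under the threshold.
# Exit status is 1 if any dataset is low, 0 otherwise.
check_avail() {
    min_bytes=$1
    awk -F '\t' -v min="$min_bytes" '
        $2 + 0 < min + 0 {
            printf "WARNING: %s has only %s bytes available\n", $1, $2
            low = 1
        }
        END { exit low }
    '
}

# Example with canned input (illustrative value, roughly the 4.65G seen above);
# 21474836480 bytes is a hypothetical 20 GiB threshold.
printf 'data\t5000000000\n' | check_avail 21474836480 || echo "low space detected"
```

On the real host, the input would be `zfs list -Hp -o name,avail data | check_avail 21474836480`, and the non-zero exit status could be wired into whatever monitoring system is in use.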

Event Timeline

vsellier changed the task status from Open to Work in Progress. May 4 2022, 11:25 AM
vsellier triaged this task as High priority.
vsellier created this task.

The disk is sized at 200G. According to the Azure portal, it can be resized to 256G at no additional cost:

Enter the size of the disk you would like to create. You will be charged the same rate for your provisioned disk, regardless of how much of the disk space is being used. For example, a 200 GiB disk is provisioned on a 256 GiB disk, so you would be billed for the 256 GiB provisioned.

If that is not enough, the next step is 512 GiB.
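The billing rule quoted above can be sketched as a small rounding helper. This assumes the tiers are powers of two, which matches the 256 and 512 steps mentioned here but is not taken from Azure documentation; the function name `billed_tier_gib` is hypothetical.

```shell
#!/bin/sh
# Sketch: round a requested disk size (GiB) up to the next billing tier.
# ASSUMPTION: tiers are powers of two (..., 128, 256, 512, ...).
billed_tier_gib() {
    requested=$1
    tier=1
    # Double the tier until it is large enough to hold the requested size.
    while [ "$tier" -lt "$requested" ]; do
        tier=$((tier * 2))
    done
    echo "$tier"
}

billed_tier_gib 200   # prints 256
billed_tier_gib 257   # prints 512
```

Under that assumption, any size from 129 to 256 GiB is billed the same, so sizing the disk at the full 256 GiB costs nothing extra.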

backup01 needs to be stopped before the disk can be resized.

The disk had to be detached and reattached in order to be resized.
ZFS apparently did not detect the pool after the reboot.
Reimporting the pool fixed it:

root@backup01:~# zpool import data
root@backup01:~# zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
data   255G   188G  66.9G        -         -     9%    73%  1.00x    ONLINE  -

(The pool is still correctly detected after a reboot)

Now let's restart the synchronization:

root@backup01:~# zfs destroy -r data/sync/dali/postgresql
root@backup01:~# systemctl reset-failed syncoid-dali-postgresql.service
root@backup01:~# systemctl restart syncoid-dali-postgresql.service

and the same for data/sync/dali/postgresql/wal
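Since the same three commands are repeated per dataset, they could be generated by a small helper. The sketch below only prints the commands (a dry run) and executes nothing; the function name `resync_cmds` is hypothetical, and the dataset and service names must be supplied by the operator.

```shell
#!/bin/sh
# Sketch: print (dry run) the commands used to reset one syncoid target.
# Nothing is executed; review the output before running it by hand,
# since "zfs destroy -r" removes the dataset and all of its snapshots.
resync_cmds() {
    dataset=$1
    service=$2
    printf 'zfs destroy -r %s\n' "$dataset"
    printf 'systemctl reset-failed %s\n' "$service"
    printf 'systemctl restart %s\n' "$service"
}

# Example using the dataset and service shown in the session above.
resync_cmds data/sync/dali/postgresql syncoid-dali-postgresql.service
```

Printing instead of executing keeps the destructive step under manual control, which seems prudent on a backup host.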

vsellier moved this task from in-progress to done on the System administration board.

The backup is in sync; everything is back to normal.