
Sort out disk space on beaubourg / somerset
Started, Work in Progress, High, Public

Description

beaubourg has run out of disk space on its PostgreSQL partition. This broke logical replication between belvedere and somerset, and has probably affected the main indexer database too.

Ideally, now that the VM storage has been moved to ceph on dedicated SSDs, we should migrate this local postgres storage to ZFS.

One way we could do this:

  • file-level copy of the data to a dedicated ceph storage space
  • tear down the current lvm / raid pile
  • initialize a new zfs pool
  • copy the data back
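The four steps above can be sketched as a dry-run plan that only prints commands. This is a minimal sketch, assuming the PostgreSQL cluster is stopped during both copies. The helper `zfs_migration_plan`, the staging path, the pool name, the vdev layout and `ashift=12` are hypothetical placeholders, and the LVM / RAID teardown is left as a manual step because its commands depend on the host's layout.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    description: str
    argv: Optional[List[str]]  # None marks a manual, host-specific step


def zfs_migration_plan(data_dir: str, staging_dir: str,
                       pool: str, vdevs: List[str]) -> List[Step]:
    """Hypothetical dry-run plan for the four migration steps; runs nothing."""
    # Archive mode, keep hard links, ACLs and xattrs, stay on one filesystem.
    copy = ["rsync", "-aHAXx", "--info=progress2"]
    return [
        Step("file-level copy to the ceph-backed staging space",
             copy + [f"{data_dir}/", f"{staging_dir}/"]),
        Step("tear down the LVM / RAID stack (manual)", None),
        # -m mounts the pool's root dataset back at the original data path.
        Step("initialize a new ZFS pool",
             ["zpool", "create", "-o", "ashift=12", "-m", data_dir, pool, *vdevs]),
        Step("copy the data back",
             copy + [f"{staging_dir}/", f"{data_dir}/"]),
    ]


if __name__ == "__main__":
    # Hypothetical paths, pool name and devices, for illustration only.
    for step in zfs_migration_plan("/srv/softwareheritage/postgres",
                                   "/mnt/ceph-staging/postgres",
                                   "data", ["mirror", "/dev/sdb", "/dev/sdc"]):
        print(step.description)
        print("   ", shlex.join(step.argv) if step.argv else "(manual step)")
```

Keeping the plan as data makes it easy to review every command before running any of them by hand.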

We might also want to migrate the main indexer database to belvedere, as it still has some spare SSDs (T2440).

Event Timeline

olasd triaged this task as High priority. Sep 3 2020, 4:27 PM
olasd created this task.
olasd added a comment. Sep 4 2020, 3:19 PM

Current disk usage on somerset:

--- /srv/softwareheritage/postgres/11 -----------------------------------------
                         /..
    9.4 TiB [##########] /replica
    2.3 TiB [##        ] /indexer
  904.2 GiB [          ] /testdedup
  232.2 GiB [          ] /occurrences

belvedere has 5.7 TB free before compression, and still has a bunch of unused disks, so we can definitely move the indexer database there.
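The sizing argument can be double-checked by parsing the ncdu figures above and comparing the indexer cluster with the free space on belvedere. This sketch assumes ncdu's binary units (GiB, TiB) and assumes the "5.7TB" figure means decimal terabytes; the `parse_ncdu` helper is hypothetical.

```python
import re

# ncdu prints sizes with binary prefixes.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# Matches lines such as "    9.4 TiB [##########] /replica".
NCDU_LINE = re.compile(r"^\s*([\d.]+)\s+(B|KiB|MiB|GiB|TiB)\s+\[[# ]*\]\s+/(\S+)")

SOMERSET = """\
    9.4 TiB [##########] /replica
    2.3 TiB [##        ] /indexer
  904.2 GiB [          ] /testdedup
  232.2 GiB [          ] /occurrences
"""


def parse_ncdu(text: str) -> dict:
    """Map each directory name to its size in bytes; other lines are ignored."""
    sizes = {}
    for line in text.splitlines():
        match = NCDU_LINE.match(line)
        if match:
            value, unit, name = match.groups()
            sizes[name] = float(value) * UNITS[unit]
    return sizes


sizes = parse_ncdu(SOMERSET)
belvedere_free = 5.7e12  # assumption: "5.7TB" means decimal terabytes
indexer = sizes["indexer"]
print(f"total on somerset: {sum(sizes.values()) / 2**40:.2f} TiB")
print(f"indexer: {indexer / 1e12:.2f} TB, fits: {indexer < belvedere_free}")
```

With these figures, somerset holds about 12.81 TiB in total, and the indexer cluster (2.3 TiB, about 2.53 TB) leaves roughly 3.2 TB of headroom on belvedere before any lz4 compression gains.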

olasd added a subscriber: seirl. Sep 8 2020, 2:54 PM

I've agreed with Guillaume to turn the occurrences db into a static dump.

I need @seirl to confirm what to do with the testdedup cluster.

I've started looking at T2440 again, because disk space on belvedere is going to be quite tight: we need more room there before we can migrate the data of the indexer cluster (around 4 TB of WALs have piled up in the main cluster).

olasd added a comment. Sep 8 2020, 5:34 PM

I've started migrating the data of the indexer cluster now that disk space on belvedere has been sorted out.

On belvedere:

zfs create -o mountpoint=/srv/softwareheritage/postgres/11/indexer -o compression=lz4 -o atime=off -o primarycache=metadata -o recordsize=16k data/postgres-indexer-11
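The properties set on the new dataset can be annotated as follows. `zfs_create_argv` is a hypothetical helper that only builds the argument list (e.g. for `subprocess.run`); the rationale in the comments reflects common PostgreSQL-on-ZFS practice and is not stated in this task.

```python
from typing import Dict, List


def zfs_create_argv(dataset: str, props: Dict[str, str]) -> List[str]:
    """Build a `zfs create` command line; zfs-create(8) puts -o options before the name."""
    argv = ["zfs", "create"]
    for key, value in props.items():
        argv += ["-o", f"{key}={value}"]
    return argv + [dataset]


INDEXER_PROPS = {
    # Mount the dataset where the indexer cluster's data directory lives.
    "mountpoint": "/srv/softwareheritage/postgres/11/indexer",
    # Cheap transparent LZ4 compression of data blocks.
    "compression": "lz4",
    # Don't update access times (and write metadata) on every read.
    "atime": "off",
    # Cache only metadata in the ARC (presumably leaving data caching to shared_buffers).
    "primarycache": "metadata",
    # 16 KiB records hold two default-sized (8 kB) PostgreSQL pages.
    "recordsize": "16k",
}

print(" ".join(zfs_create_argv("data/postgres-indexer-11", INDEXER_PROPS)))
```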

On somerset:

root@somerset:/srv/softwareheritage/postgres/11# rsync -PaHAXx --info=progress2 --append indexer/ belvedere:/srv/softwareheritage/postgres/11/indexer/
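For reference, the rsync flags used for the copy break down as below. The `rsync_argv` helper is hypothetical and only assembles the command. Note that `--append` assumes the data already on belvedere is an unchanged prefix of each source file, so it is only safe for resuming a copy of files that were not rewritten in between (this sketch assumes the indexer cluster was stopped).

```python
from typing import List


def rsync_argv(src: str, dest: str, resume: bool = True) -> List[str]:
    """Assemble the rsync command used for the indexer copy (hypothetical helper)."""
    argv = [
        "rsync",
        "-P",                # --partial --progress: keep partial files, show progress
        "-a",                # archive mode, i.e. -rlptgoD
        "-H",                # preserve hard links
        "-A",                # preserve ACLs
        "-X",                # preserve extended attributes
        "-x",                # don't cross filesystem boundaries
        "--info=progress2",  # overall progress instead of per-file
    ]
    if resume:
        # Append only to receiver files shorter than the sender's copy;
        # files that are the same size or longer are skipped.
        argv.append("--append")
    # Trailing slash on src: copy the directory's contents, not the directory.
    return argv + [src.rstrip("/") + "/", dest]
```

Run from `/srv/softwareheritage/postgres/11` on somerset, `rsync_argv("indexer", "belvedere:/srv/softwareheritage/postgres/11/indexer/")` reproduces the command above.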
olasd changed the task status from Open to Work in Progress. Sep 9 2020, 11:54 AM
olasd claimed this task.

The indexer database data has been migrated to belvedere.

The replication setup to dbreplica1.euwest.azure has been updated:

  • On dbreplica1:
    • update /srv/softwareheritage/postgres/11/replica/recovery.conf to point at the new server
    • update /var/lib/postgresql/.pgpass to record credentials of the new server
    • update /var/lib/postgresql/.ssh/authorized_keys for keys present on belvedere
  • On belvedere:
    • ensure /var/lib/postgresql/.ssh/config uses a proper ssh key for the postgresql archive_command.
    • check that the archival works:
      • select pg_switch_wal(); to force WAL rotation
      • select * from pg_stat_archiver; to make sure the last wal archival was successful.
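Two of the checks above lend themselves to small helpers. This sketch assumes the standard `.pgpass` format (`hostname:port:database:username:password`, with `:` and `\` escaped by a backslash) and the `last_archived_time` / `last_failed_time` columns of `pg_stat_archiver`; the function names and the `replicator` role used in the example are hypothetical.

```python
from datetime import datetime
from typing import Optional

# Run after `select pg_switch_wal();`, then pass the two timestamps to last_archival_ok.
ARCHIVER_QUERY = (
    "select last_archived_wal, last_archived_time, "
    "last_failed_wal, last_failed_time from pg_stat_archiver;"
)


def pgpass_line(host: str, port: int, database: str, user: str, password: str) -> str:
    """Format one ~/.pgpass entry, escaping backslashes and colons."""
    def esc(field: object) -> str:
        # Escape backslashes first so the colon escapes are not doubled.
        return str(field).replace("\\", "\\\\").replace(":", "\\:")
    return ":".join(esc(f) for f in (host, port, database, user, password))


def last_archival_ok(last_archived_time: Optional[datetime],
                     last_failed_time: Optional[datetime]) -> bool:
    """True if the most recent archiver event recorded is a success."""
    if last_archived_time is None:
        return False
    return last_failed_time is None or last_archived_time > last_failed_time


print(pgpass_line("belvedere", 5432, "*", "replicator", "secret"))
```

On Unix, libpq ignores a `.pgpass` file whose permissions allow group or world access, so it should be mode 0600.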

The indexer db configuration has also been updated in puppet.