Paste P158

resize main db disk - problem analysis

Authored by ardumont on Apr 28 2017, 1:55 PM.
The partition holding the main db on prado is full:
#+BEGIN_SRC shell
ardumont@prado:~% df -h /srv/softwareheritage/postgres
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/ssd-prado--postgres-part1 9.0T 9.0T 7.8G 100% /srv/softwareheritage/postgres
#+END_SRC
As a result, postgres's main db (the one used for worker injection) is down:
#+BEGIN_SRC shell
ardumont@prado:~% sudo systemctl status postgresql@9.6-main.service
[sudo] password for ardumont:
● postgresql@9.6-main.service - PostgreSQL Cluster 9.6-main
Loaded: loaded (/lib/systemd/system/postgresql@.service; disabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/postgresql@.service.d
└─timeout.conf
Active: failed (Result: exit-code) since Fri 2017-04-28 07:00:01 UTC; 4h 11min ago
Process: 56350 ExecStop=/usr/bin/pg_ctlcluster --skip-systemctl-redirect -m fast %i stop (code=exited, status=2)
Process: 67321 ExecStart=postgresql@%i --skip-systemctl-redirect %i start (code=exited, status=1/FAILURE)
Main PID: 119263 (code=exited, status=1/FAILURE)
Apr 28 07:00:00 prado systemd[1]: Starting PostgreSQL Cluster 9.6-main...
Apr 28 07:00:01 prado postgresql@9.6-main[67321]: The PostgreSQL server failed to start. Please check the log output:
Apr 28 07:00:01 prado postgresql@9.6-main[67321]: 2017-04-28 07:00:00 UTC [67327]: [1-1] FATAL: could not write lock file "postmaster.pid": No space left on device
Apr 28 07:00:01 prado systemd[1]: postgresql@9.6-main.service: Control process exited, code=exited status=1
Apr 28 07:00:01 prado systemd[1]: Failed to start PostgreSQL Cluster 9.6-main.
Apr 28 07:00:01 prado systemd[1]: postgresql@9.6-main.service: Unit entered failed state.
Apr 28 07:00:01 prado systemd[1]: postgresql@9.6-main.service: Failed with result 'exit-code'.
#+END_SRC
prado runs as an lxc container on the louvre hypervisor.
Since it is an lxc container (the only one on louvre), it does not show
up in the standard tool 'qm', which only lists virtual machines, so
'pct' it is.
#+BEGIN_SRC shell
root@louvre:~# pct list
VMID Status Lock Name
108 running prado
#+END_SRC
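The vmid lookup can also be scripted. A minimal sketch, assuming the column layout of the `pct list` output shown above; the function name `vmid_for` is hypothetical:
#+BEGIN_SRC shell
# Print the VMID of the container whose name matches $1.
# Reads `pct list` output on stdin. The Lock column may be empty,
# so the name is taken as the last field rather than the fourth.
vmid_for() {
    awk -v name="$1" 'NR > 1 && $NF == name { print $1 }'
}

# Usage on louvre (not run here):
#   pct list | vmid_for prado
#+END_SRC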
Its vmid is 108.
To retrieve the configuration for that container:
#+BEGIN_SRC shell
root@louvre:~# pct config 108
arch: amd64
cores: 24
hostname: prado
memory: 294912
mp0: /srv/containers/prado/postgres,mp=/srv/softwareheritage/postgres
mp1: /srv/containers/prado/postgres-hdd,mp=/srv/softwareheritage/postgres-hdd
mp2: /srv/containers/prado/space,mp=/srv/storage/space
mp3: /srv/containers/prado/remote-backups,mp=/srv/remote-backups
nameserver: 192.168.100.29
net0: name=eth0,bridge=vmbr0,gw=192.168.100.1,hwaddr=BE:38:0A:92:7B:A6,ip=192.168.100.100/24,type=veth
onboot: 0
ostype: debian
rootfs: /srv/containers/prado/root
searchdomain: internal.softwareheritage.org
startup: order=3
swap: 0
#+END_SRC
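The host path behind a given container mount point can be read off that output. A minimal sketch, assuming the `mpN: <host path>,mp=<container path>` layout shown above (storage-backed volumes would print a volume id instead of a path); the function name `host_path_for` is hypothetical:
#+BEGIN_SRC shell
# Print the host-side path backing a container mount point.
# Reads `pct config <vmid>` output on stdin; $1 is the path inside the container.
host_path_for() {
    awk -v target="$1" '
        /^mp[0-9]+:/ {
            # Drop the "mpN: " key, then split "<host path>,mp=<container path>[,...]"
            sub(/^mp[0-9]+: */, "")
            n = split($0, fields, ",")
            for (i = 2; i <= n; i++)
                if (fields[i] == "mp=" target) print fields[1]
        }'
}

# Usage on louvre (not run here):
#   pct config 108 | host_path_for /srv/softwareheritage/postgres
#+END_SRC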
The mount point we are interested in is mp0, backed by the
/srv/containers/prado/postgres folder on louvre, which is indeed full:
#+BEGIN_SRC shell
root@louvre:~# df -h /srv/containers/prado/postgres
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/ssd-prado--postgres-part1 9.0T 9.0T 7.8G 100% /srv/containers/prado/postgres
#+END_SRC
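The same check could be scripted to catch the problem before postgres goes down. A rough sketch, assuming the POSIX `df -P` output layout and a filesystem name without spaces; the function names and the 95% threshold are hypothetical:
#+BEGIN_SRC shell
# Print the usage percentage (without the % sign) of the filesystem holding $1.
# Relies on the POSIX `df -P` layout, where the 5th column is the capacity.
use_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage of $1 is at or above the threshold $2 (in percent).
nearly_full() {
    [ "$(use_percent "$1")" -ge "$2" ]
}

# Usage on louvre (not run here):
#   nearly_full /srv/containers/prado/postgres 95 && echo "postgres partition almost full"
#+END_SRC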
The disk behind that mount point therefore needs to grow.
With proxmox, the resize command would be something like:
#+BEGIN_SRC shell
root@louvre:~# pct resize 108 mp0 +100G
#+END_SRC
However, I believe it makes more sense to extend the actual disk on
the hypervisor louvre itself.
I'm uneasy with that part though.
I believe I must do the following:
- use fdisk to enlarge the partition (recreating it with exactly the
same starting cylinder, otherwise the filesystem is no longer found)
- run resize2fs on the volume to actually grow the filesystem
- execute the `pct resize` command mentioned above
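The steps above can be sketched as a dry run that only prints the commands. This is an untested outline, not a validated procedure: `DISK` is a guess obtained by stripping `-part1` from the device name in the df output, the filesystem is assumed to be ext2/3/4 (since resize2fs is the tool), and the `run` and `resize_plan` helpers are hypothetical.
#+BEGIN_SRC shell
# Dry-run sketch of the plan above: commands are only printed unless DRY_RUN=0.
# DISK and PART are assumptions derived from the df output, not verified.
DISK=${DISK:-/dev/mapper/ssd-prado--postgres}
PART=${PART:-/dev/mapper/ssd-prado--postgres-part1}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

resize_plan() {
    # 1. Record the current layout so the start of the partition can be reused exactly.
    run fdisk -l "$DISK"
    # 2. Interactive step: in fdisk, delete and recreate partition 1 with the
    #    same start and a larger end, then write the table. The kernel may
    #    need to re-read the partition table before the next step.
    run fdisk "$DISK"
    # 3. Grow the filesystem to fill the enlarged partition (ext2/3/4 only).
    run resize2fs "$PART"
    # 4. Resize on the proxmox side, as mentioned earlier.
    run pct resize 108 mp0 +100G
}

resize_plan
#+END_SRC
Keeping the default `DRY_RUN=1` makes it possible to review the exact commands on louvre before anything touches the disk.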