resize main db disk - problem analysis

Authored by ardumont on Apr 28 2017, 1:55 PM.

	The partition holding the main db on prado is full:
	#+BEGIN_SRC shell
	ardumont@prado:~% df -h /srv/softwareheritage/postgres
	Filesystem Size Used Avail Use% Mounted on
	/dev/mapper/ssd-prado--postgres-part1 9.0T 9.0T 7.8G 100% /srv/softwareheritage/postgres
	#+END_SRC
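
To script the same check, the Use% figure can be pulled out of the df output. This is a minimal sketch, assuming POSIX-format output (`df -P`, which keeps each filesystem on one line); the helper name `use_pct` is made up for illustration and is not part of the original notes.

```shell
# use_pct: read `df -P <path>` output on stdin and print the Use% figure
# without the trailing % sign. Hypothetical helper, for illustration only.
use_pct() {
    # Line 1 is the header; the capacity is the 5th field of line 2.
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Sample copied from the df output above.
sample='Filesystem Size Used Avail Use% Mounted on
/dev/mapper/ssd-prado--postgres-part1 9.0T 9.0T 7.8G 100% /srv/softwareheritage/postgres'

printf '%s\n' "$sample" | use_pct
```

On the real machine this would be fed with `df -P /srv/softwareheritage/postgres | use_pct`.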

	As a result, postgres's main db (the one used for worker injection) is down:
	#+BEGIN_SRC shell
	ardumont@prado:~% sudo systemctl status postgresql@9.6-main.service
	[sudo] password for ardumont:
	● postgresql@9.6-main.service - PostgreSQL Cluster 9.6-main
	Loaded: loaded (/lib/systemd/system/postgresql@.service; disabled; vendor preset: enabled)
	Drop-In: /etc/systemd/system/postgresql@.service.d
	└─timeout.conf
	Active: failed (Result: exit-code) since Fri 2017-04-28 07:00:01 UTC; 4h 11min ago
	Process: 56350 ExecStop=/usr/bin/pg_ctlcluster --skip-systemctl-redirect -m fast %i stop (code=exited, status=2)
	Process: 67321 ExecStart=postgresql@%i --skip-systemctl-redirect %i start (code=exited, status=1/FAILURE)
	Main PID: 119263 (code=exited, status=1/FAILURE)

	Apr 28 07:00:00 prado systemd[1]: Starting PostgreSQL Cluster 9.6-main...
	Apr 28 07:00:01 prado postgresql@9.6-main[67321]: The PostgreSQL server failed to start. Please check the log output:
	Apr 28 07:00:01 prado postgresql@9.6-main[67321]: 2017-04-28 07:00:00 UTC [67327]: [1-1] FATAL: could not write lock file "postmaster.pid": No space left on device
	Apr 28 07:00:01 prado systemd[1]: postgresql@9.6-main.service: Control process exited, code=exited status=1
	Apr 28 07:00:01 prado systemd[1]: Failed to start PostgreSQL Cluster 9.6-main.
	Apr 28 07:00:01 prado systemd[1]: postgresql@9.6-main.service: Unit entered failed state.
	Apr 28 07:00:01 prado systemd[1]: postgresql@9.6-main.service: Failed with result 'exit-code'.
	#+END_SRC

	prado runs as an lxc container on the louvre hypervisor.

	It is the only lxc container there, so it does not show up in 'qm',
	the usual tool, which only lists KVM virtual machines. 'pct' is the
	equivalent for containers.

	#+BEGIN_SRC shell
	root@louvre:~# pct list
	VMID Status Lock Name
	108 running prado
	#+END_SRC
	Its vmid is 108.
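
If the vmid has to be looked up from a script, the listing can be filtered by container name. This is a small sketch, assuming the `pct list` layout shown above (header line first, name in the last column); `vmid_for` is a hypothetical helper name.

```shell
# vmid_for: read `pct list` output on stdin and print the VMID of the
# container whose name (last column) equals $1. Hypothetical helper.
vmid_for() {
    # Skip the header line, then match on the last field.
    awk -v name="$1" 'NR > 1 && $NF == name { print $1 }'
}

# Sample copied from the pct list output above.
sample='VMID Status Lock Name
108 running prado'

printf '%s\n' "$sample" | vmid_for prado
```

On louvre this would be `pct list | vmid_for prado`.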

	To retrieve the configuration of that lxc container:
	#+BEGIN_SRC shell
	root@louvre:~# pct config 108
	arch: amd64
	cores: 24
	hostname: prado
	memory: 294912
	mp0: /srv/containers/prado/postgres,mp=/srv/softwareheritage/postgres
	mp1: /srv/containers/prado/postgres-hdd,mp=/srv/softwareheritage/postgres-hdd
	mp2: /srv/containers/prado/space,mp=/srv/storage/space
	mp3: /srv/containers/prado/remote-backups,mp=/srv/remote-backups
	nameserver: 192.168.100.29
	net0: name=eth0,bridge=vmbr0,gw=192.168.100.1,hwaddr=BE:38:0A:92:7B:A6,ip=192.168.100.100/24,type=veth
	onboot: 0
	ostype: debian
	rootfs: /srv/containers/prado/root
	searchdomain: internal.softwareheritage.org
	startup: order=3
	swap: 0
	#+END_SRC
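
To map a path inside the container back to the folder on the hypervisor, the `mpN` lines can be parsed. This is a sketch written against the output format shown above only (host path first, then `,mp=` and the container path, with no further options); `host_path_for` is a hypothetical helper name.

```shell
# host_path_for: read `pct config <vmid>` output on stdin and print the
# host-side path of the mount point whose container-side path equals $1.
# Hypothetical helper; assumes lines of the form
#   mpN: <host path>,mp=<container path>
host_path_for() {
    awk -v target="$1" '
        /^mp[0-9]+:/ {
            # $2 looks like "<host path>,mp=<container path>"
            split($2, parts, ",mp=")
            if (parts[2] == target) print parts[1]
        }'
}

# Sample lines copied from the pct config output above.
sample='mp0: /srv/containers/prado/postgres,mp=/srv/softwareheritage/postgres
mp1: /srv/containers/prado/postgres-hdd,mp=/srv/softwareheritage/postgres-hdd'

printf '%s\n' "$sample" | host_path_for /srv/softwareheritage/postgres
```

On louvre this would be `pct config 108 | host_path_for /srv/softwareheritage/postgres`.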

	The mount point we are interested in is mp0, the
	/srv/containers/prado/postgres folder, which is indeed full:
	#+BEGIN_SRC shell
	root@louvre:~# df -h /srv/containers/prado/postgres
	Filesystem Size Used Avail Use% Mounted on
	/dev/mapper/ssd-prado--postgres-part1 9.0T 9.0T 7.8G 100% /srv/containers/prado/postgres
	#+END_SRC

	That means we need to increase the size of that disk.

	The command to resize with proxmox would be something like:
	#+BEGIN_SRC shell
	root@louvre:~# pct resize 108 mp0 +100G
	#+END_SRC

	But I believe it would make sense to actually extend the real disk on
	the hypervisor louvre.

	I'm uneasy with that part though.

	I believe I must do the following:
	- fdisk the partition to increase its size (recreating it with
	  exactly the same starting cylinder)
	- resize2fs the volume to actually grow the filesystem
	- execute the `pct resize` command mentioned above
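
The steps above can be laid out as a dry run that only prints the commands and executes nothing. This is a sketch under loud assumptions: the parent device name is guessed from the `-part1` suffix, the `sfdisk -d` backup step is my addition so the original start offset can be recovered, and the backup file path is hypothetical. The fdisk step stays interactive.

```shell
#!/bin/sh
# Dry-run sketch of the plan: prints the commands, runs none of them.
PARENT=/dev/mapper/ssd-prado--postgres      # ASSUMPTION: device carrying the partition table
PART=/dev/mapper/ssd-prado--postgres-part1  # from the df output
VMID=108                                    # from pct list
BACKUP=/root/prado-postgres.parttable       # hypothetical backup file

plan() {
    # Added safety step (not in the original list): save the partition table.
    echo "sfdisk -d $PARENT > $BACKUP"
    # Interactive: delete and recreate part1 with the same start and a larger end.
    echo "fdisk $PARENT"
    # Grow the filesystem to fill the enlarged partition.
    echo "resize2fs $PART"
    # Last step of the list, as written in the notes.
    echo "pct resize $VMID mp0 +100G"
}

plan
```

Printing first makes it possible to review the exact device names with someone else before touching the disk.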

Event Timeline

ardumont created this paste. Apr 28 2017, 1:55 PM
