uffizi disk's full makes some workers fail
Closed, Resolved, Public

Description

Uffizi's / disk regularly fills up.
As a result, workers (at least the indexers) fail with OSError... no more disk space (see P197 for a sample extracted from our logstash instance).

It's possibly related to T789 (quoting the first line of that issue):

We currently let the systemd journal take up as much space as it needs on the root filesystem of our machines.

And indeed, /var/log/ (which lives on the / filesystem) is quite bloated when this happens.
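
To see what is actually taking up the space, something like the following can help. It is only a sketch: the function name largest_entries and its defaults are made up for illustration, and it is not what was run on uffizi.

```shell
#!/bin/sh
# List the N largest entries directly under DIR (sizes in KiB, biggest first).
# -x keeps du on one filesystem, -s prints one total per entry.
largest_entries() {
    dir="${1:-/var/log}"
    count="${2:-10}"
    du -xsk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# The systemd journal reports its own footprint with:
#   journalctl --disk-usage
```

Running largest_entries /var/log 5 as root prints the five biggest files or directories under /var/log.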

The workaround so far, for my part, was to:

  • reclaim some space in /var/cache/apt/archives
  • which leaves enough room to force a logrotate run that compresses the logs:
sudo logrotate -f /etc/logrotate.conf
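
The workaround above could be scripted roughly as follows. This is a sketch under assumptions: the 90% threshold and the function names are invented here, and apt-get clean is one way (not necessarily the one used) to empty /var/cache/apt/archives. Whether the rotated logs end up compressed depends on the logrotate configuration.

```shell
#!/bin/sh
# Print the usage of the filesystem holding PATH as an integer percentage.
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

# Free space on / once usage reaches the threshold (must be run as root).
# The default threshold of 90 is an assumption, not a value from this task.
free_root_space() {
    threshold="${1:-90}"
    if [ "$(usage_pct /)" -ge "$threshold" ]; then
        apt-get clean                       # removes downloaded .deb files from /var/cache/apt/archives
        logrotate -f /etc/logrotate.conf    # force a rotation of all configured logs
    fi
}
```

Nothing is executed until free_root_space is called, for example from a root shell with free_root_space 85.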

A better workaround (for lack of a proper solution) would be to increase uffizi's disk size.

Event Timeline

ardumont created this task. Dec 1 2017, 1:54 PM
ardumont renamed this task from "uffizi disk full make workers failing" to "uffizi disk's full makes some workers fail".
ardumont updated the task description. Dec 1 2017, 1:57 PM
ardumont updated the task description. Dec 1 2017, 2:01 PM
ardumont updated the task description.
zack raised the priority of this task from Normal to Unbreak Now!. Dec 1 2017, 2:05 PM
ftigeot claimed this task. Dec 12 2017, 4:32 PM
ftigeot closed this task as Resolved.

Resolved with help from @olasd.

Steps taken:

  • Cap most log files to 100MB:
/etc/logrotate.d/rsyslog
...
+maxsize 100M
...
  • Resize base disk image using Proxmox or low-level dm commands
  • Resize / partition
apt-get install parted
parted /dev/vda
  resizepart 1
  100%
  • Resize / filesystem
resize2fs /dev/vda1

Base disk image size was increased from 10 GB to 20 GB.
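
For reference, the partition and filesystem steps above can be collected into one function. This is a sketch only: the run and grow_root names and the DRY_RUN switch are hypothetical, the defaults mirror the /dev/vda and partition 1 used above, and it assumes the base disk image has already been enlarged. By default it only prints the commands.

```shell
#!/bin/sh
# Print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Grow partition PARTNUM of DEVICE to the end of the disk, then grow its
# filesystem. Assumes names like /dev/vda1 (device name + partition number).
grow_root() {
    device="${1:-/dev/vda}"
    partnum="${2:-1}"
    # parted may ask for confirmation when the partition is in use
    run parted "$device" resizepart "$partnum" 100%
    run resize2fs "${device}${partnum}"
    run df -h /    # check the new size of /
}
```

Calling grow_root prints the three commands for review. Running it as root with DRY_RUN=0 executes them.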