Split up pergamon to smaller VMs
Started, Work in Progress, Normal, Public

Description

Pergamon is a VM running on the louvre hypervisor.

For about a week, it has been experiencing various performance issues, including higher load, increased I/O wait time, and completely locked-up virtual CPUs:

kernel:[259911.781535] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 134s! [systemd-cgroups:666]
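
To gauge how often such lockups occur, log lines of this shape can be extracted and counted. The following is a minimal sketch, assuming the message format matches the line quoted above; the function name `parse_soft_lockup` and the returned field names are hypothetical, chosen for illustration only.

```python
import re

# Pattern modelled on the soft lockup line quoted above (an assumption:
# other kernel versions may word the message differently).
SOFT_LOCKUP_RE = re.compile(
    r"NMI watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<seconds>\d+)s! "
    r"\[(?P<task>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line):
    """Return a dict describing the lockup, or None if the line does not match."""
    match = SOFT_LOCKUP_RE.search(line)
    if match is None:
        return None
    return {
        "cpu": int(match.group("cpu")),
        "seconds": int(match.group("seconds")),
        "task": match.group("task"),
        "pid": int(match.group("pid")),
    }
```

Applied to each line of the kernel log, this gives the affected CPU, the stall duration and the task that was running, which can then be grouped by CPU or by task.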

This VM is running a large number of services. Among them:

  • A DNS server for the internal softwareheritage networks
  • Icinga monitoring
  • Prometheus monitoring
  • Munin
  • Puppet
  • A Debian repository

This is a bit much for a single VM. Creating smaller VMs, each dedicated to one or two services at most, would isolate the services and keep them from degrading each other's performance.
If a service required more hardware resources than the others, its VM could be migrated to a less loaded or more powerful hypervisor.

Event Timeline

ftigeot created this task. Aug 24 2018, 2:48 PM
ftigeot triaged this task as Normal priority.
ftigeot renamed this task from Split up pergamon in smaller VMs to Split up pergamon to smaller VMs. Aug 24 2018, 3:01 PM
ftigeot changed the task status from Open to Work in Progress. Sep 4 2018, 12:03 PM

An Apache instance on pergamon provides HTTP and/or HTTPS services for the following hosts:

  • annex.softwareheritage.org_non-ssl
  • debian.softwareheritage.org
  • docs.softwareheritage.org
  • grafana.softwareheritage.org
  • icinga.softwareheritage.org
  • pergamon:8140 (puppet)