
dbreplica1 2018-06-30 event postmortem
Closed, Resolved (Public)

Description

dbreplica1.euwest.azure.internal.softwareheritage.org stopped responding around 04:00 UTC on 2018-06-30.

Monitoring data show an increase in running threads, processes, and slab cache usage, as well as an I/O wait peak, immediately before that time.

Most likely, the VM was not appropriately sized and could not handle a load spike.

Event Timeline

ftigeot created this task. Jul 3 2018, 4:39 PM
ftigeot triaged this task as Normal priority.

dbreplica1.euwest.azure.internal.softwareheritage.org runs a slave instance of the softwareheritage-indexer database, with its master running on somerset.internal.softwareheritage.org.

somerset is a VM with 28 GB of RAM, whereas dbreplica1 has only 7 GB of RAM.
Historical data show that the two virtual CPU cores on dbreplica1 are currently sufficient.

Possible Azure VM types with 14-16 GB of RAM available in the Euwest zone:

  • a2mv2 16GB RAM $90.52/month
  • E2v3 16GB RAM $116.80/month
  • E2sv3 16GB RAM $116.80/month
  • D11v2 14GB RAM $138.70/month
  • DS11v2 14GB RAM $138.70/month

The D11 and DS11 VM sizes are not very interesting: they cost more and offer less RAM than the other options.
Among the rest, the existing VM can only be resized to an e2s_v3 model.
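
As a rough illustration of the comparison above, the following Python sketch ranks the listed sizes by monthly cost per GB of RAM. The names and figures are copied from the list in this task; the helper functions `cost_per_gb` and `rank_by_cost_per_gb` are hypothetical and only serve this example. It ignores everything except RAM and price, so it is not a full sizing analysis.

```python
# Candidate sizes copied from the list in this task:
# (name, RAM in GB, price in USD per month).
CANDIDATES = [
    ("a2mv2", 16, 90.52),
    ("E2v3", 16, 116.80),
    ("E2sv3", 16, 116.80),
    ("D11v2", 14, 138.70),
    ("DS11v2", 14, 138.70),
]


def cost_per_gb(ram_gb, usd_per_month):
    """Return the monthly price of one GB of RAM for a given size."""
    return usd_per_month / ram_gb


def rank_by_cost_per_gb(candidates):
    """Return the candidates sorted from cheapest to most expensive per GB."""
    return sorted(candidates, key=lambda c: cost_per_gb(c[1], c[2]))


if __name__ == "__main__":
    for name, ram_gb, price in rank_by_cost_per_gb(CANDIDATES):
        per_gb = cost_per_gb(ram_gb, price)
        print(f"{name:8s} {ram_gb:3d} GB  ${price:7.2f}/month  ${per_gb:.2f}/GB")
```

On these figures the D11 and DS11 sizes come out as the most expensive per GB. Price alone does not decide the outcome, though, since the existing VM can only be resized to some of the models.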

The VM has thus been resized to an e2s_v3 model today.

ftigeot closed this task as Resolved. Jul 31 2018, 4:03 PM

No new problems noticed for about a month; closing.