
Enable NUMA and PCID options on all VMs
Closed, Resolved (Public)

Description

Most Proxmox VMs running on louvre and beaubourg do NOT have the NUMA option enabled.
All hypervisors are NUMA machines, with at least 2 NUMA nodes each.

It is important to enable the Proxmox NUMA option in order to keep virtual CPU cores pinned on real hardware cores and reduce inter-die coherency traffic as much as possible.

The new PCID Proxmox flag also needs to be enabled for all VMs. Without it, the PCID CPU feature is not exposed to guests, so they cannot use Process Context IDentifiers as described in http://linuxeco.com/?p=77
This feature was previously of little interest but is now critical for performance when mitigations for Intel hardware bugs (Meltdown, etc.) are enabled.
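
For illustration only, here is a minimal Python sketch that builds the `qm set` command lines for enabling both options. It only prints the commands and runs nothing. The VM IDs and the `kvm64` CPU type are assumptions, not values taken from this task; a VM has to be stopped and started again before the new settings apply.

```python
import shlex


def qm_set_commands(vmids, cpu_type="kvm64"):
    """Return one `qm set` command line per VM id (dry run, nothing is executed).

    cpu_type is an assumption: use whatever CPU type the VM already has.
    """
    commands = []
    for vmid in vmids:
        argv = [
            "qm", "set", str(vmid),
            "--numa", "1",                                # expose NUMA topology to the guest
            "--cpu", f"cputype={cpu_type},flags=+pcid",   # pass the PCID CPU flag through
        ]
        commands.append(shlex.join(argv))
    return commands


# Hypothetical VM ids, for illustration only.
for command in qm_set_commands([101, 102]):
    print(command)
```

Printing the commands instead of executing them makes it possible to review the list before touching any VM.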

Event Timeline

ftigeot created this task. Sep 5 2018, 11:01 AM
ftigeot triaged this task as Unbreak Now! priority.

numastat output on louvre, for reference:

                           node0           node1           node2           node3
numa_hit             13497023257     14081211989     14852512306     17957276918
numa_miss             8599494310      7372048640      2126471863      2510901890
numa_foreign          4832232163      2059616197      4052656329      9664412014
interleave_hit             21033           20998           21022           20991
local_node           13497008321     14081146133     14852428849     17957202370
other_node            8599509246      7372114496      2126555320      2510976437

numastat output on beaubourg, for reference:

                           node0           node1
numa_hit             24194141993     34805632693
numa_miss             6528825760       313114704
numa_foreign           313114704      6528825760
interleave_hit             44068           43188
local_node           24194499550     34805370119
other_node            6528468203       313377278

ftigeot added a comment. Edited Sep 5 2018, 11:49 AM

numastat output on orsay, for reference:

                           node0           node1           node2           node3
numa_hit               154258622       106196783       173789251       218914560
numa_miss                      0               0               0               0
numa_foreign                   0               0               0               0
interleave_hit              6864            6821            6872            6826
local_node             154248017       106178817       173773345       218903410
other_node                 10605           17966           15906           11150

                           node4           node5           node6           node7
numa_hit               262959516       114920739       238787141       239766195
numa_miss                      0               0               0               0
numa_foreign                   0               0               0               0
interleave_hit              6866            6814            6865            6821
local_node             262942456       114899166       238772174       239752207
other_node                 17060           21573           14967           13988

This machine is not running anything at the moment, which explains the optimal ratio of NUMA hits to NUMA misses on every node (numa_miss is zero everywhere).
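
To compare hosts, the miss ratio per node can be computed from the numastat output. The following is a rough Python sketch, assuming the plain text layout shown in this task (a header line of node names followed by one row per counter). The function names are made up for this example, and the sample data is the beaubourg output pasted above.

```python
BEAUBOURG = """\
                           node0           node1
numa_hit             24194141993     34805632693
numa_miss             6528825760       313114704
numa_foreign           313114704      6528825760
interleave_hit             44068           43188
local_node           24194499550     34805370119
other_node            6528468203       313377278
"""


def parse_numastat(text):
    """Parse numastat text into {node: {counter: value}}.

    A new header line (node0 node1 ...) starts a new block, so output
    split over several blocks of nodes is handled as well.
    """
    nodes = []
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("node"):
            nodes = fields
            continue
        counter, values = fields[0], fields[1:]
        for node, value in zip(nodes, values):
            stats.setdefault(node, {})[counter] = int(value)
    return stats


def miss_ratio(stats):
    """Return numa_miss / (numa_hit + numa_miss) for each node."""
    ratios = {}
    for node, counters in stats.items():
        total = counters["numa_hit"] + counters["numa_miss"]
        if total:
            ratios[node] = counters["numa_miss"] / total
    return ratios


for node, ratio in miss_ratio(parse_numastat(BEAUBOURG)).items():
    print(f"{node}: {ratio:.1%} of allocations missed the intended node")
```

On the beaubourg numbers, roughly one allocation in five on node0 missed its intended node, against under 1% on node1.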

ftigeot renamed this task from Enable NUMA option on all VMs to Enable NUMA and PCID options on all VMs. Sep 5 2018, 2:39 PM
ftigeot updated the task description.

All worker VMs on louvre restarted with NUMA and PCID flags.
They were resized from 16 to 12 GB of RAM and from 4 to 3 CPU cores in order to waste fewer hypervisor resources.

ftigeot changed the task status from Open to Work in Progress. Sep 5 2018, 5:02 PM

All worker VMs on beaubourg restarted with the same settings.

zack lowered the priority of this task from Unbreak Now! to High. Sep 6 2018, 12:00 PM
ftigeot closed this task as Resolved. Sep 10 2018, 4:33 PM

All VMs restarted with PCID and NUMA flags.
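
One way to confirm from inside a Linux guest that the PCID flag is visible is to look for `pcid` on the `flags` lines of `/proc/cpuinfo`. The Python sketch below assumes an x86 Linux guest; the function names are hypothetical, and this is not necessarily how the flags were checked for this task.

```python
def has_cpu_flag(cpuinfo_text, flag="pcid"):
    """Return True if every `flags` line in the cpuinfo text lists the flag."""
    found_flags_line = False
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            found_flags_line = True
            if flag not in value.split():
                return False
    # No flags line at all means nothing could be verified.
    return found_flags_line


def guest_has_pcid(path="/proc/cpuinfo"):
    """Check the running guest; the path is the standard Linux location."""
    with open(path) as cpuinfo:
        return has_cpu_flag(cpuinfo.read(), "pcid")
```

Calling `guest_has_pcid()` inside a VM returns False if any virtual CPU lacks the flag, which would point to a VM that has not yet been restarted with the new setting.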