Performance Best Practice #2: NUMA Alignment
NUMA is an . . . approach that links several small, cost-effective nodes using a high-performance connection. Each node contains processors and memory, much like a small SMP system. However, an advanced memory controller allows a node to use memory on all other nodes, creating a single system image. When a processor accesses memory that does not lie within its own node (remote memory), the data must be transferred over the NUMA connection, which is slower than accessing local memory. Memory access times are not uniform and depend on the location of the memory and the node from which it is accessed, as the technology’s name implies.
— Using NUMA Systems with ESXi
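To make "not uniform" concrete, here is a toy model of average memory access time. The 100ns local and 170ns remote figures are invented for illustration; real latencies and ratios vary by platform:

```python
# Toy NUMA latency model (latency numbers are hypothetical, for illustration only).
LOCAL_NS = 100    # assumed local-memory access latency
REMOTE_NS = 170   # assumed remote-memory latency across the NUMA interconnect

def avg_access_ns(remote_fraction: float) -> float:
    """Average access time when `remote_fraction` of memory accesses
    land on another NUMA node."""
    return (1 - remote_fraction) * LOCAL_NS + remote_fraction * REMOTE_NS

print(avg_access_ns(0.0))   # VM fits its node: all accesses local -> 100.0
print(avg_access_ns(0.5))   # VM spills across nodes: half remote -> 135.0
```

The point of alignment is simply to keep that remote fraction as close to zero as possible.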
Larger VMs are typically the ones that break the NUMA node boundary, and since those machines are usually also our most critical, latency-sensitive workloads, NUMA alignment becomes important to us as virtualization admins.
So let's consider four scenarios and see how each fits, or fails to fit, within NUMA boundaries. Assume, for example, an older dual-socket, dual-core host with HyperThreading and 8GB of RAM per socket, so each NUMA node offers 2 physical cores (4 logical processors with HT) and 8GB of RAM.
Scenario 1: Exchange VM with 4 vCPUs | Not Ideal
Scenario 2: Exchange VM with 12GB of RAM | Not Ideal
Scenario 3: Exchange VM with 4 vCPUs and HT for NUMA enabled | Ideal
Scenario 4: Exchange VM with 8GB of RAM | Ideal
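The fit check behind those verdicts can be sketched in a few lines. This assumes a hypothetical NUMA node of 2 physical cores (4 logical with HT) and 8GB of RAM; substitute your own node size:

```python
# Sketch: does a VM fit inside a single NUMA node?
# Node size below is hypothetical: 2 cores / 4 logical threads, 8GB of RAM.
NODE_CORES = 2
NODE_THREADS = 4   # logical processors if HT is counted toward the node size
NODE_RAM_GB = 8

def fits_numa_node(vcpus: int, ram_gb: int, count_ht: bool = False) -> bool:
    """True only if both the vCPU count and RAM fit within one NUMA node."""
    cpu_limit = NODE_THREADS if count_ht else NODE_CORES
    return vcpus <= cpu_limit and ram_gb <= NODE_RAM_GB

print(fits_numa_node(4, 8))                  # Scenario 1: False (4 vCPUs > 2 cores)
print(fits_numa_node(2, 12))                 # Scenario 2: False (12GB > 8GB node)
print(fits_numa_node(4, 8, count_ht=True))   # Scenario 3: True (HT counted)
print(fits_numa_node(2, 8))                  # Scenario 4: True
```

Note that CPU and RAM are checked independently: a VM that fits on cores but spills on memory (Scenario 2) still crosses the node boundary.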
- Know your NUMA size. That's the first step. Especially consider this when buying new servers. You may need to increase your core count or RAM size in order to accommodate your workloads.
- CPU. Those with newer hardware will have fewer issues with NUMA boundaries; those of us on older hardware, especially dual-socket, dual-core systems with HyperThreading, need to be more aware, as many of our VMs today have 4 or more vCPUs.
- RAM. Those running many smaller servers will likewise run into NUMA boundaries more often, as applications demand ever more RAM while each node has comparatively little.
- Size your VMs, wherever possible, to divide evenly into the NUMA node: for example, on a hex-core (6C) node, stick to 2, 3, or 6 vCPUs.
- If you enable HyperThreading, configure ESXi and/or the VM to count logical processors toward the NUMA node size, and also tell the VM to keep multiple virtual cores together (see below).
- Enabling CPU Hot Add will disable vNUMA (the feature that exposes the underlying NUMA architecture directly to the guest OS); see VMware KB 2040375.
- Just because your NUMA node size is 16 CPUs and 128GB of RAM doesn't mean you should build a monster VM; right-size your VMs to avoid unnecessary kernel NUMA scheduling.
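For the HT and Hot Add points above, the relevant per-VM advanced settings look roughly like the following .vmx fragment. Treat this as a sketch: verify the option names and behavior against your ESXi version and the VMware KB articles before applying them.

```ini
numa.vcpu.preferHT = "TRUE"
vcpu.hotadd = "FALSE"
```

The first line tells the scheduler to count HT logical processors toward the NUMA node size (Scenario 3 above); the second keeps CPU Hot Add off so that vNUMA stays available (per KB 2040375).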