Best Practice #2: ESXTOP
- System Worlds: The worlds needed to perform various system services. These include one idle world per physical CPU, which runs when there is nothing else to run on that CPU, helper worlds for performing asynchronous tasks, and driver worlds.
- Kernel Worlds: The world for the vmkernel. It always runs on physical CPU0.
- Virtual Machine Worlds: The world for each virtual machine. This is usually where we look when troubleshooting performance, though the other two types of worlds might offer a clue to host-wide performance issues.
What To Check
Check Guest OS processes for high utilization; check to see if enough vCPUs are allotted.
Check to see if too many vCPUs are allotted.
Check VM CPU limit.
100% of a World = %RUN + %RDY + %CSTP + %WAIT
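As a quick sanity check on that accounting, here is a minimal Python sketch that adds up the four states for a single world and confirms they cover roughly all of its time. The `WorldCpu` class, the sample numbers, and the 2-point tolerance are hypothetical and only there for illustration; they do not come from esxtop itself.

```python
from dataclasses import dataclass


@dataclass
class WorldCpu:
    """CPU state percentages for one world, as read off the esxtop CPU screen."""
    run: float   # %RUN
    rdy: float   # %RDY
    cstp: float  # %CSTP
    wait: float  # %WAIT

    def total(self) -> float:
        # 100% of a World = %RUN + %RDY + %CSTP + %WAIT
        return self.run + self.rdy + self.cstp + self.wait


def accounts_for_all_time(world: WorldCpu, tolerance: float = 2.0) -> bool:
    """Return True if the four states add up to about 100%.

    The tolerance is an assumed allowance for rounding and sampling jitter.
    """
    return abs(world.total() - 100.0) <= tolerance


# Hypothetical sample: a mostly idle single-vCPU world.
sample = WorldCpu(run=12.5, rdy=1.5, cstp=0.0, wait=86.0)
print(sample.total())
print(accounts_for_all_time(sample))
```

If the numbers you copied down are far from 100%, the likely culprit is reading a group row (which sums several worlds) rather than a single world.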
%RUN is the world Run time. It is the percentage of time the world was scheduled to run and actually ran.
%RDY is the world Ready time. It is the percentage of time the world was queued and ready to run but could not, because it was waiting for CPU resources. Many things can cause this, but typically the host does not have enough CPU resources for the worlds competing for them.
%CSTP is the world Co-descheduled State time (applies to vSMP VMs only). It is the percentage of time the world was descheduled because one vCPU was being used far more than the other vCPUs. It usually indicates one of two things: 1) the VM has CPU affinity set and other VMs on one of those pCPUs are starving it of resources, or 2) the workload on the VM (i.e., the application) does not handle multithreading very well and is really only using one vCPU anyway.
%WAIT is the world Wait time. It is the percentage of time the world was waiting to be scheduled for a task, and it is normally VERY high because it includes the %IDLE metric for a group of five additional worlds:
For a VM, there are other worlds besides the VCPUs, such as a mks world and a VMX world. Most of time, the other worlds are waiting for events. So, you will see ~100% %WAIT for those worlds. — Interpreting ESXTOP Statistics
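Because %WAIT includes %IDLE, it can help to subtract the idle portion to see how much of the waiting is something other than idling. The sketch below does only that subtraction; the function name and the sample values are hypothetical.

```python
def non_idle_wait(wait_pct: float, idle_pct: float) -> float:
    """Return the part of %WAIT that is not %IDLE.

    Clamped at zero, since two columns sampled moments apart
    can disagree by a small amount.
    """
    return max(wait_pct - idle_pct, 0.0)


# Hypothetical vCPU world: mostly idle, a little time blocked on something else.
print(non_idle_wait(95.0, 90.0))
```

A world whose %WAIT is almost entirely %IDLE is simply not busy; a large remainder is the part worth investigating.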
%SYS is the world System time. It is the percentage of time the VMkernel spent working on behalf of the world, handling interrupts and other system activity such as I/O processing. A VM with a high %SYS when compared to %RUN is usually doing a lot of I/O, so the storage and network paths are the next place to look.
%MLMTD is the world Maximum Limited time. It is the percentage of time the world was ready to run but was deliberately not scheduled by the VMkernel, because running it would violate the configured CPU resource limit. Unless the limit is there by design, you should not see any non-zero values in this column.
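To tie the metrics back to the "What To Check" list, here is a rough triage sketch in Python. It is only an illustration of the reasoning above: the `triage` function, the dictionary keys, and especially the threshold defaults (10 for %RDY, 3 for %CSTP) are assumptions, not values taken from this article or from esxtop, so tune them to your own environment.

```python
def triage(metrics, rdy_limit=10.0, cstp_limit=3.0):
    """Map one world's CPU metrics to the checks suggested in this article.

    `metrics` is a dict keyed by esxtop column name (e.g. "%RDY").
    Both limits are assumed rule-of-thumb values, not official thresholds.
    """
    hints = []
    if metrics.get("%RDY", 0.0) > rdy_limit:
        hints.append("%RDY high: world is queued for CPU; check host CPU "
                     "contention and whether too many vCPUs are allotted")
    if metrics.get("%CSTP", 0.0) > cstp_limit:
        hints.append("%CSTP high: check CPU affinity and whether the "
                     "application actually uses all of its vCPUs")
    if metrics.get("%MLMTD", 0.0) > 0.0:
        hints.append("%MLMTD non-zero: check the VM CPU limit")
    # "High %SYS compared to %RUN" is interpreted here as %SYS > %RUN (assumption).
    if metrics.get("%SYS", 0.0) > metrics.get("%RUN", 0.0):
        hints.append("%SYS high relative to %RUN: check I/O")
    return hints


# Hypothetical readings for one VM world.
vm = {"%RUN": 40.0, "%RDY": 15.0, "%CSTP": 0.0, "%SYS": 2.0, "%MLMTD": 5.0}
for hint in triage(vm):
    print(hint)
```

With these made-up readings, the sketch flags the ready time and the CPU limit, which are exactly the two items you would then go verify on the VM's settings.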
Note: This is just a small ESXi host with a couple of VMs on it, a few vSMP and a few configured for only a single vCPU.
Well, that's it! Happy troubleshooting, and stay tuned for the next installment where we will look at ESXTOP and Memory metrics.