Best Practice #5: ESXTOP (Disk Adapter)
If you missed my first post, ESXTOP (CPU), it covers how to get started if you are new to the utility. I have also already covered ESXTOP (Memory) and ESXTOP (Network).
Navigating ESXTOP into Disk Adapter Mode
Press d to switch ESXTOP into Disk Adapter mode.
Using ESXTOP to Monitor Disk Adapter Metrics
Metric | Threshold (ms) | What to Check
DAVG/cmd | >20 | Storage processor/array performance for bottleneck
KAVG/cmd | >1 | Kernel driver firmware and adapter queue length
GAVG/cmd | >20 | DAVG/KAVG metrics, and Guest OS performance
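As a rough illustration, the thresholds in the table above can be encoded as a small lookup and checked against values you read off the disk adapter screen. This is a minimal sketch: the function name `flag_metrics` and the sample readings are hypothetical, and the thresholds are the rule-of-thumb values from the table, not hard limits.

```python
# Rule-of-thumb thresholds (ms) from the table; tune these for your workload.
THRESHOLDS_MS = {
    "DAVG/cmd": 20.0,
    "KAVG/cmd": 1.0,
    "GAVG/cmd": 20.0,
}

# What to check when each metric exceeds its threshold.
WHAT_TO_CHECK = {
    "DAVG/cmd": "Storage processor/array performance for bottleneck",
    "KAVG/cmd": "Kernel driver firmware and adapter queue length",
    "GAVG/cmd": "DAVG/KAVG metrics, and Guest OS performance",
}


def flag_metrics(readings: dict[str, float]) -> dict[str, str]:
    """Return the metrics whose reading exceeds its threshold, mapped to what to check.

    `readings` maps metric names (e.g. "DAVG/cmd") to latency in milliseconds,
    as read off the ESXTOP disk adapter screen (hypothetical input format).
    """
    flagged = {}
    for metric, limit in THRESHOLDS_MS.items():
        value = readings.get(metric)
        if value is not None and value > limit:
            flagged[metric] = WHAT_TO_CHECK[metric]
    return flagged


# Hypothetical readings for one adapter.
sample = {"DAVG/cmd": 25.3, "KAVG/cmd": 0.2, "GAVG/cmd": 25.5}
for metric, advice in flag_metrics(sample).items():
    print(f"{metric} over threshold: check {advice}")
```

With these sample readings, DAVG/cmd and GAVG/cmd are flagged while KAVG/cmd stays under its 1 ms threshold.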
Tip: press s and type 2 to set the delay between screen updates to 2 seconds.
GAVG/cmd = KAVG/cmd + DAVG/cmd
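Because GAVG/cmd is the sum of the other two, you can work out how much of the latency a guest sees is spent out on the device path versus in the kernel. A minimal sketch with hypothetical readings; `latency_breakdown` is an illustrative helper, not part of ESXTOP.

```python
def latency_breakdown(davg_ms: float, kavg_ms: float) -> dict[str, float]:
    """Compute GAVG/cmd from DAVG/cmd and KAVG/cmd, plus each component's share of it."""
    gavg_ms = davg_ms + kavg_ms  # GAVG/cmd = KAVG/cmd + DAVG/cmd
    return {
        "GAVG/cmd": gavg_ms,
        # Fraction of guest-visible latency spent between the HBA and the array.
        "device_share": davg_ms / gavg_ms if gavg_ms else 0.0,
        # Fraction spent inside the VMkernel.
        "kernel_share": kavg_ms / gavg_ms if gavg_ms else 0.0,
    }


# Hypothetical readings: 18 ms on the device path, 2 ms in the VMkernel.
print(latency_breakdown(18.0, 2.0))
# {'GAVG/cmd': 20.0, 'device_share': 0.9, 'kernel_share': 0.1}
```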
DAVG/cmd is the adapter device Driver Average Latency per Command. This is the round-trip time, in milliseconds, from the HBA to the storage array and back with the acknowledgement. Most admins like to see around 20 ms or less, though the acceptable value can vary significantly depending on your workload and its sensitivity to latency.
Keep in mind that this is latency along the entire path, so there are a lot of things you will want to check. Does your storage switch (Fibre Channel or Ethernet) show any latency? How are the storage processors performing; are they maxed out? What about the disk array itself; are any of the LUNs presented along that path maxed out as well?
A high DAVG/cmd is a good indicator that you need to start your investigation outside of ESX, at the fabric and storage array levels.
KAVG/cmd is the adapter device VMkernel Average Latency per Command. This is the average time an I/O spends in the VMkernel between the HBA receiving the data from the storage fabric and passing it along to the Guest OS, or vice versa; basically, the round-trip time in the kernel itself. It should therefore be a very low value, meaning the I/O operation spends as little time as possible in the kernel (zero or near-zero is ideal).
The KAVG value should be very small in comparison to the DAVG value and should be close to zero. When there is a lot of queuing in ESX, KAVG can be as high, or even higher than DAVG. If this happens, please check the queue statistics.
— Interpreting ESXTOP Statistics
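The quoted guidance, combined with the 1 ms KAVG/cmd threshold from the table, can be sketched as a quick check. This is an illustration only: `kavg_status` and its three labels are hypothetical, and treating "KAVG at or above DAVG" as the severe case is a direct reading of the quote rather than a documented VMware rule.

```python
KAVG_THRESHOLD_MS = 1.0  # KAVG/cmd threshold from the table


def kavg_status(davg_ms: float, kavg_ms: float) -> str:
    """Classify KAVG/cmd for one adapter (hypothetical helper, not an ESXTOP feature).

    "ok"       - KAVG is at or below the threshold, i.e. close to zero
    "elevated" - KAVG is over the threshold but still below DAVG
    "severe"   - KAVG is over the threshold and as high as or higher than DAVG,
                 which points to heavy queuing in the VMkernel
    """
    if kavg_ms <= KAVG_THRESHOLD_MS:
        return "ok"
    if kavg_ms >= davg_ms:
        return "severe"
    return "elevated"


# Hypothetical readings in ms: (DAVG, KAVG)
for davg, kavg in [(15.0, 0.3), (15.0, 3.0), (10.0, 12.0)]:
    print(f"DAVG={davg} KAVG={kavg}: {kavg_status(davg, kavg)}")
```

In both the "elevated" and "severe" cases, the next stop is the queue statistics, as the quote recommends.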
KAVG/cmd is most useful when compared with the other measurements. If DAVG and KAVG are both high, a storage fabric/array bottleneck is backing I/Os up into the kernel. If GAVG and KAVG are high, the guest is having trouble processing I/Os quickly enough and is backing them up into the kernel.
GAVG/cmd is the adapter device Guest OS Average Latency per Command. This is the round-trip time, in milliseconds, as the Guest OS sees it: through the HBA to the storage array and back. That is why this value is the sum of DAVG/cmd + KAVG/cmd. If DAVG and KAVG are within normal thresholds but GAVG/cmd is high, this typically indicates that at least one of the VMs on that adapter is constrained by another resource and needs more ESXi resources to process I/Os more quickly. In my experience, however, a high GAVG/cmd is usually accompanied by a high value in either DAVG or KAVG.
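The rules of thumb in the last two paragraphs can be summarized as a small decision function. This is a minimal sketch using the thresholds from the table; the function name `triage` and the wording of its results are hypothetical, and a real investigation should still look at all three metrics together.

```python
# Rule-of-thumb thresholds (ms) from the table at the top of this post.
DAVG_MS, KAVG_MS, GAVG_MS = 20.0, 1.0, 20.0


def triage(davg: float, kavg: float, gavg: float) -> str:
    """Suggest where to look first, based on which latencies (ms) are over threshold."""
    d_high, k_high, g_high = davg > DAVG_MS, kavg > KAVG_MS, gavg > GAVG_MS
    # A high DAVG also pushes GAVG up, so check the DAVG+KAVG case first.
    if d_high and k_high:
        return "fabric/array bottleneck backing up into the kernel"
    if g_high and k_high:
        return "guest is slow to process I/Os and is backing up into the kernel"
    if d_high:
        return "investigate outside ESX: fabric, storage processors, LUNs"
    if k_high:
        return "check queue statistics, driver firmware, adapter queue length"
    if g_high:
        return "VM likely constrained by another resource"
    return "within thresholds"


# Hypothetical readings in ms: DAVG, KAVG, GAVG
print(triage(30.0, 5.0, 35.0))
print(triage(19.5, 0.9, 20.4))
```

The order of the checks matters: because GAVG includes DAVG, a high DAVG with a high KAVG would also satisfy the GAVG and KAVG rule, so the fabric/array case is tested first.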