During the duration of the APD condition and after, the array still responds to ping and the netcat tests are also successful. There is no evidence to indicate a physical network or a NFS storage array issue. — VMware KB 2076392
For those of you who have been keeping up with VMware, here is a status update on the NFS All Paths Down (APD) condition: it's fixed!
If you are unfamiliar with the vSphere 5.5 Update 1 NFS APD issue: basically, some customers (not all, mind you) across different storage vendors ran into an error condition where vSphere would report all NFS paths to be down, and thus the NFS mount and any VMs or RDMs on it would become inaccessible—even though the paths were perfectly fine.
If you would like to read further, the issue is described in detail in the above VMware KB 2076392.
Continuing my ongoing recap of my recent vSphere 5.5 technical deep-dive, I now shift to Best Practices. This is installment twenty-one in this series. To view all the posts in this series, click Geeknicks in the categories list.
Best Practice #6: Using ESXTOP (Disk Devices)
A very powerful troubleshooting tool is included straight out of the box (so to speak) with ESXi: ESXTOP. There are a ton of features to it, so we won't cover them all here; rather I will refer you to a great post by Duncan Epping, ESXTOP master.
If you missed my first post on ESXTOP (CPU), it includes how to get started if you are new to the utility. I have also already covered ESXTOP (Memory), ESXTOP (Network), and ESXTOP (Disk Adapters).
Navigating ESXTOP into Disk Device Mode
When you start ESXTOP, press u to enter the Disk Device section of the utility.
I know what you are thinking: pressing u for the disk device menu doesn't make a lot of sense, since the other letters all match their respective modes: c for CPU, m for Memory, n for Network, d for Disk Adapter, and v for Disk (VM).*
* This shortcut is case-sensitive. Pressing v takes you into Disk (VM) Mode, while pressing V filters the current screen mode to only show virtual machine worlds.
So, I don't know what to tell you, but u it is. As with CPU and Memory, you will likely want to customize the screen with a couple of extra Disk Device metrics, so press f and choose, for example, Qstats, ErrStats, and ResvStats.
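If you want to study these metrics offline rather than watching them live, esxtop also supports batch mode, which dumps every counter to CSV for analysis in perfmon, Excel, or similar. A minimal sketch, run from the ESXi shell (the delay, iteration count, and output path are just example choices):

```shell
# Capture all esxtop counters every 15 seconds for 40 iterations (10 minutes),
# compressed because batch-mode CSVs grow large quickly
esxtop -b -d 15 -n 40 | gzip -9c > /tmp/esxtop-capture.csv.gz
```

Pull the file off the host afterwards and filter down to the disk-device columns you care about; capturing everything first keeps the collection step simple.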
Using ESXTOP to Monitor Disk Device Metrics
Most of my metric explanations are from Interpreting ESXTOP Statistics, though sometimes in my own terms. I have also relied on Duncan Epping's ESXTOP Forum for many of the metrics measurements (as I will do in future posts), though in some places I choose different thresholds for one reason or another.
One thing that you will notice in the above screenshot is that as of ESX 4.0U2, NFS mounts are now included in ESXTOP Disk Device metrics. Nice!
For an explanation of DAVG/cmd, KAVG/cmd, and GAVG/cmd latency metrics, see my previous post on Disk Adapter metrics.
Now for further explanation, along with a couple of additional metrics.
DQLEN is the configured Device Queue Length. This is really a reference point to make sure you have configured your devices correctly. A quick glance, as in the screenshot above, and you might notice one queue misconfigured.
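A quick way to cross-check DQLEN outside of ESXTOP is esxcli; the "Device Max Queue Depth" field in the device listing should match what ESXTOP reports. A sketch from the ESXi shell (the grep pattern just trims the output down to device names and queue depths):

```shell
# Show each device name followed by its configured queue depth;
# compare these values against DQLEN in ESXTOP Disk Device mode
esxcli storage core device list | grep -E "^naa|Queue Depth"
```

This makes it easy to spot one misconfigured device in a long list without scrolling through the full ESXTOP screen.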
BLKSZ is the configured Device Block Size. This is another reference point to ensure that you have the correct block size for the type of workload you are running.
RESETS/s is the number of Device SCSI Reset Commands per Second. A SCSI reset command is issued when the SCSI operation fails to reach the target, and in a SAN environment it is usually indicative of a path-down or multipathing issue—i.e., ESXi thinks a path is fine but in reality it is faulty. This is commonly seen on Cisco Nexus fabrics as CRC errors on a port, for example.
ABRTS/s is the number of Device SCSI Abort Commands per Second. A SCSI abort command is issued from the Guest OS when the command times out waiting for a response acknowledgement. In Windows 2008 and later, this is 60 seconds by default. Typically if you are encountering a large number of aborts, the storage fabric/array is causing a bottleneck and is the place to begin your investigation.
If you are using something such as a NetApp FAS, be sure that you run the GOS Timeout Script on your VM or VM template to make sure you have the proper timeout values (login required) set in order to prevent a SCSI abort during a path failover or path problem.
QUED is the current Device Commands Queued in the VMkernel. As I explained previously, this number should be at zero or near zero, otherwise it is indicating that something in the kernel is throttling the IO throughput between the Guest OS and the HBA/storage fabric/array. Check firmware versions for correct revisions and other performance tuning options within ESXi, especially vendor recommendations.
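One of the kernel throttles worth checking when QUED stays above zero is the per-device limit on outstanding requests, which in vSphere 5.5 can be inspected and set per device via esxcli (the naa ID below is a placeholder for one of your own devices, and 64 is only an example value—follow your array vendor's recommendation):

```shell
# Inspect the device's current settings, including
# "No of outstanding IOs with competing worlds"
esxcli storage core device list -d naa.60a98000486e2f65642b4d7931394171

# New in 5.5: adjust the outstanding-request limit per device
# rather than host-wide (placeholder device ID, example value)
esxcli storage core device set -d naa.60a98000486e2f65642b4d7931394171 -O 64
```

Note that raising this value only helps if the downstream HBA and array queues can absorb the extra outstanding IO; otherwise you just move the queuing elsewhere.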
RESV/s is the Device SCSI Reservations per Second. SCSI reservations are commonplace; that's how SCSI commands work. This value is only important as it relates to CONS/s.
CONS/s is the Device SCSI Reservation Conflicts per Second. If this value is greater than RESV/s, it indicates that other ESXi hosts are holding reservations on this particular path that conflict with reservations currently held by this particular host. A very high value could be felt as sluggishness in the storage subsystem, due to the kernel constantly requesting SCSI locks, being denied, and consequently retrying.
Troubleshooting SCSI reservation conflicts can be challenging. Some helpful information can be found in this VMware KB deep-dive article on Troubleshooting SCSI Reservation Conflicts, as well as in VMware KB 1005009 and VMware KB 1002293.
Continuing my ongoing recap of my recent vSphere 5.5 technical deep-dive, I now shift to Best Practices. This is installment fifteen in this series. To view all the posts in this series, click Geeknicks in the categories list.
Best Practice #11: Configure NetApp (Storage) Integration
One of the best moves by VMware was to move away from the C# Legacy (Desktop) Client to a browser-based version (though don't get me started on Java compatibility and Adobe Flash). Why? Because it is going to make future integration much better and easier for vendors such as NetApp to integrate their plugins through standard APIs and SDKs. One such plugin example is the NetApp Virtual Storage Console, or VSC.
Disclaimer: I am a NetApp advocate, so I use NetApp as the example here, though certainly other storage vendors have their own plugins and APIs for VMware.
NetApp Virtual Storage Console
So what is the NetApp VSC? And why would you want to use it? Well, to take a quote from the solutions brief:
NetApp Virtual Storage Console (VSC) for VMware vSphere® is a vSphere client plug-in that fully integrates with vCenter and enables central administration of VMware vSphere environments. . . . It combines several key technologies that deliver comprehensive, centralized management of NetApp storage operations in both SAN- and NAS-based VMware virtual server and desktop infrastructures. These operations include discovery, health monitoring, capacity management, provisioning, cloning, backup, recovery, and VM optimization.
In short, it makes the life of a virtual or storage admin much easier, by automating a lot of processes for you and making sure that your configuration adheres to many vSphere and NetApp best practices. Download the VSC (login required).
But if you want a more concise list, here are my top five reasons that you should install and regularly use the NetApp VSC.
Note: VSC 5.0 is compatible with the vSphere 5.5 Web Client only! Legacy client users must stay at 4.2.1 for now (and likely for the future, since all vSphere products will now be developed for the Web Client only).
NetApp VASA for VAAI
In addition to the VSC, you can also install the VASA Provider for Clustered DataONTAP. VASA is the VMware® term for vStorage APIs for Storage Awareness. VASA 5.0 is new, and it is downloaded as a standard virtual appliance (OVA). It depends on VSC 5.0 as its management console, so that must already be up and functioning.
VASA Provider acts as an information pipeline that provides information to the vCenter Server about NetApp® storage systems associated with VSC. Sharing this information with vCenter Server enables you to make more intelligent virtual machine provisioning decisions and be notified when certain storage conditions might affect your VMware environment.
What VASA really does is provide more information about the underlying storage hardware and configuration characteristics to vSphere itself. In essence, you can work better as a virtual admin to map storage profiles to the appropriate datastores (i.e., software-defined storage policies) and receive alerts when those profiles are performing out-of-compliance with their capabilities.
Note: VASA 5.0 is for Clustered DataONTAP only; 7-mode customers will have to stick with version 1.0.1, and as with the VSC, this will probably be the last version.
Continuing my ongoing recap of my recent vSphere 5.5 technical deep-dive, I now shift to Best Practices. This is installment twelve in this series. To view all the posts in this series, click Geeknicks in the categories list.
Best Practice #7: NFS Storage Optimization
The Network File System, or NFS, has, in my opinion, gotten a bad rap. In its earlier implementations (and I mean a long time ago) it did have some issues—but what protocol hasn't? If you are deploying Ethernet-based storage networks, or if you are considering changing to one or setting one up from scratch, you should seriously consider NFS.
Why NFS on vSphere 5.5?
So why should you consider it? First, one should always consider all available options when designing a system; design is based on which technologies best meet the requirements, not based on which ones an engineer or architect might prefer. Second, well, VMware says you should consider it—ha!
The capabilities of VMware vSphere® on NFS are very similar to those of vSphere on block-based storage. VMware offers support for all vSphere features and functions on NFS, as it does for vSphere on block storage. Running vSphere on NFS is a viable option for many virtualization deployments, because it offers strong performance and stability when configured correctly. — vSphere NFS Best Practices
Here's a great list of some of the enhanced features of NFS on vSphere 5.5 as compared to other protocols.
**There is no technical reason why a VMDK on NFS can't be used for Exchange datastores, as many experts have pointed out!
*By unlimited, I mean that NFS, by the nature of its protocol, is limited only by the vendor's array configuration. NetApp, for example, offers the FAS8060, which supports a 364TB NFS volume. Further, with Clustered DataONTAP, NetApp introduced the Infinite Volume (TR-4037), which allows expansion far beyond that of a single array.
NFS: It's Easy
First and foremost, NFS is very easy to implement. That's why so many people love it—besides the fact that it uses regular hardware that comes standard in every server. 10GbE adapters are still generally cheaper than their 8GbFC counterparts—let alone 16GbFC! And with the rise of 40GbE, we will likely see NFS supported at that speed in the near future as well, though certainly server processing is going to need to bump up in order to take advantage of it (so we will probably see it within blade enclosures first).
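To illustrate just how easy: mounting an NFS export as a datastore is a single command from the ESXi shell (the server IP, export path, and datastore name below are hypothetical):

```shell
# Mount an NFS export as a datastore (placeholder IP, export, and name)
esxcli storage nfs add --host 192.168.10.50 --share /vol/vmware_ds1 --volume-name NFS_DS1

# Confirm the mount and its state
esxcli storage nfs list
```

No zoning, no LUN masking, no rescan of HBAs—the export shows up as a datastore as soon as the mount succeeds.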
NFS: Density and Scalability
Because NFS is not limited by the same issues of block storage, it is ideal for large enterprise content repositories and other significantly-sized datastores:
A key element of VMFS is the SCSI architecture that includes a command queue limit or a limit to the number of commands that can be addressed simultaneously by the LUN. In general this LUN and HBA queue limit is the limit to VMFS scaling. While VMFS VM to Datastore density can in theory match NFS from a VM per datastore scale, it requires advanced configurations that allow increased LUN queues.
Additional Kernel Settings
Of course, you should follow the same general Ethernet best practices as for iSCSI: dedicated VLAN or switches, consider using Jumbo Frames for 1Gbps networks, etc.
As Cormac Hogan, Storage Architect for VMware on the R&D side, notes, changing the default values for most advanced NFS settings is not generally recommended. However, there are a few settings related to ESXi memory that should be changed when adding more than the default maximum of eight NFS datastores. Make sure you perform this across all hosts—use host profiles!
~ # esxcfg-advcfg -s 32 /Net/TcpipHeapSize
These settings will ensure that as you increase the number of datastores towards the max of 256, you will have enough memory reserved in ESXi to handle the increased network heap requirements.
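For reference, the TcpipHeapSize command above is one of a small set that is commonly cited together for vSphere 5.5; confirm the exact values against VMware's current guidance for your version before applying, and note that the heap changes require a host reboot to take effect:

```shell
# Commonly cited vSphere 5.5 values for scaling toward 256 NFS datastores
esxcfg-advcfg -s 32 /Net/TcpipHeapSize    # initial network heap (MB)
esxcfg-advcfg -s 512 /Net/TcpipHeapMax    # maximum network heap (MB)
esxcfg-advcfg -s 256 /NFS/MaxVolumes      # maximum NFS mounts per host

# After rebooting the host, verify the settings took
esxcfg-advcfg -g /Net/TcpipHeapMax
```

Since these must match across every host that mounts the datastores, this is exactly the kind of change to push through host profiles rather than by hand.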
Guest OS Considerations
One last thing about NFS. You may want to consider changing one registry setting in the guest operating system. Because NFS is more tolerant of latency, the default NFS datastore operation timeouts are higher:
However, most of our guest operating systems will have timed out by then—doh! The default Windows timeout for disk operations is 60 seconds, set in the Windows registry at HKLM\System\CurrentControlSet\Services\Disk using the TimeOutValue key. You will want to consider editing this (requires a reboot) to a value of 125 or greater so that your VMs do not time out—that is, if latency is an issue in your network.
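If you prefer scripting the change rather than editing the registry by hand, a one-liner from an elevated command prompt inside the guest does it (190 is shown as an example value; as always, a reboot is required for the new timeout to take effect):

```shell
:: Run from an elevated command prompt inside the Windows guest;
:: sets the disk operation timeout to 190 seconds (example value)
reg add HKLM\SYSTEM\CurrentControlSet\Services\Disk /v TimeOutValue /t REG_DWORD /d 190 /f
```

Bake this into your VM template (or a GPO) so every new guest comes up with the right value rather than fixing machines one at a time.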
NetApp provides Guest OS timeout scripts as part of the Virtual Storage Console (VSC)—which, by the way, can also configure all of the above. One script changes the setting to 190; the other reverts it to the default.
Continuing my ongoing recap of my recent vSphere 5.5 technical deep-dive, I now shift to Best Practices. This is installment nine in this section of the series. To view all the posts in this series, click Geeknicks in the categories list.
Best Practice #4: Enable Storage I/O Control
Storage I/O Control or SIOC is a mechanism that is extremely useful in today's unpredictable workloads. If you have ever run a terminal server or Citrix server you know of what I write: you have one user doing something in, say, Microsoft Publisher (notorious for this), and it brings everyone else on that server to a halt because it is consuming all the resources. This phenomenon in computing terms is called the "noisy neighbor."
What is Storage I/O Control and Why Should I Use It?
Storage is especially sensitive to the noisy neighbor. Unlike the scenario above, and in light of today's increasingly large datastores, the noisy neighbor could impact dozens of machines across a good many hosts—maybe even more, depending on how your underlying storage array is set up.
In official technical terms, VMware vSphere® Storage I/O Control is used to
provide I/O prioritization for virtual machines running on a group of VMware vSphere® hosts that have access to a shared storage pool. . . . It increases administrator productivity by reducing active performance management. Storage I/O Control can trigger device-latency monitoring that hosts observe when communicating with that datastore. When latency exceeds a set threshold, the feature engages to relieve congestion. Each virtual machine that accesses that datastore is then allocated I/O resources in proportion to their shares. — VMware
The great thing about SIOC is that it is datastore wide—it is only configured once per datastore, and from then on it monitors everything for you. It is really as close to a "set it and forget it" setting as you can have in VMware.
I have always found this older image, courtesy of the annals of the Internet, to be particularly helpful in illustrating to customers the benefits of SIOC. Depicted in it, you have your four VMs running. The fifth VM is your "noisy neighbor," the resource hog. Notice what happens in the purple without SIOC: the noisy neighbor brings them all to a crawl (relatively speaking) by introducing contention, which in turn produces latency—and all suffer together.
But with SIOC turned on, only the noisy neighbor suffers, as it should be. If you have resource hog VMs in your environment, strongly consider using SIOC.
What Else Should I Know?
Well, there are several things you should know if you are running SIOC. First, the differences between 5.5 and 5.1.
For vSphere 5.1, SIOC defaults to a 30ms average latency threshold before throttling IO from the noisy neighbor. For some shops, 20ms is the standard, and you will want to adjust this accordingly.
But with all the differing performance characteristics of disks and disk tiers these days, it is much more difficult to deterministically set a threshold. As such, the process has been dynamicized (I just made that up) in 5.5:
The latency threshold is set to the value determined by the I/O injector (a part of Storage I/O Control). When the I/O injector calculates the peak throughput, it then finds the 90 percent throughput value and measures the latency at that point to determine the threshold. vSphere administrators can change this set throughput value to another percent value or they can continue to input a millisecond value. — VMware
For me, this is a huge reason to upgrade to 5.5, because each of my datastores may have differing characteristics, and it is much easier (and probably a bit more accurate) to allow the VMkernel to figure out what the peak throughput is and then start monitoring for latency accordingly.
As part of this new feature, vSphere 5.5 also introduced a new metric to provide a more guestOS-like measurement of this auto-calculated threshold, and it's called VM Observed Latency.
Before vSphere 5.5, latency was only measured inside the host, but now it is measured more dynamically. How exactly? Frank Denneman has a great article on that. Suffice it to say that the old metrics in ESXTOP—GAVG, DAVG, and KAVG—are no longer related to the SIOC threshold (though they can still provide valuable troubleshooting insight within the host). Since SIOC is datastore-wide, it needs a datastore-wide kind of metric.
In essence, you could say that SIOC does some fancy calculations based on a normalized average of latency, IOPS, and IO request size, so that it intelligently understands the variety of workloads and IO operations—not all IO ops are created equal, after all.
No, SIOC Is Not Cluster-Based
I have literally been saying this for years, since SIOC came out in vSphere 4.1. Because it is a datastore-wide metric, it is not limited to only the hosts in a certain cluster. In fact, as you will see below, each host interacts with SIOC on its own. This means you can have datastores connected to multiple clusters, multiple standalone hosts—whatever—and it will have no effect on its performance; it will continue to do its job.
A common misconception about SIOC is that it’s compute cluster based. The process of determining the datastore-wide average latency really reveals the key denominator – hosts connected to the datastore – (sic). All hosts connected to the datastore write to the IORMSTATS.SF file, regardless of cluster membership. Other than enabling SIOC, vCenter is not necessary for normal operations. Each connected host reads the IORMSTATS.SF file each 4 seconds and locally computes the datastore-wide average to use for managing the I/O stream. Therefor cluster membership is irrelevant.
I Don't Have an Option for SIOC—What Gives?
The first time I ran across this, I wondered myself. But a quick bit of research showed that while the system I was working on at the time (4.1) supported SIOC for block storage, it did not support it for NFS (which is what I wanted). The requirements are straightforward:
For block storage you would need to use vSphere version 4.1 or later. For NAS storage, you would need to use vSphere version 5.0 or later. — Debunking the Myths of Storage I/O Control
So How Do I Enable Storage I/O Control?
It's quite easy really. Remember this most important thing, however: all new features in 5.5 and forward are only available in the web client.
So navigate to Home | Storage and select the datastore for which you would like to enable SIOC. Then click Manage | Settings and select the Edit button in the Storage I/O Control section, and you will be presented with the options as shown above (default) and below (SIOC enabled).
That's it! Happy storage controlling!