NetApp All-Flash FAS for VMware Horizon 6
VMworld is a time for big announcements. Not just by VMware, either, but by many major vendor partners. NetApp, being a Platinum sponsor since the inception of VMworld, has often used it (in conjunction with their own Insight conference) to launch new products, configurations, updates, and the like. And VMworld 2014 was no exception.
NetApp announced not only the All-Flash FAS series of arrays, but also support (and benchmarks) for VMware's next-generation virtual desktop infrastructure product, Horizon 6. I have already written a bit about the new features of Horizon 6, but here is a recap of the highlights:
- Cloud Pod Architecture: true active-active data centers with failover and/or load balancing.
And just as good: NetApp has an All-Flash FAS offering that can deliver all of this goodness (from a storage perspective, at least) for as low as $55 per desktop! And I think they can go substantially lower than that. (They have a $35/desktop reference architecture as well, but it is not all-flash.)
Unlike the EF Series, which has its own awesome power, DataONTAP allows AFF to scale in a way never before possible. This is one of the key differentiators and why NetApp introduced a second all-flash offering to its portfolio.
The AFF option is available in bundles or custom builds for both the FAS2500 and FAS8000 series controllers and disk shelves.
Proven Reliability and Supportability. Clustered DataONTAP is the number one storage operating system on the planet, according to Q2 2014 IDC research. With that kind of install base, you can bet it is tried and true beyond belief. In fact, NetApp was awarded best-in-class quality awards for both the Enterprise and Midsize markets, beating EMC and Dell, respectively (source: IDC Worldwide Quarterly Disk Storage Systems Tracker Q2 2014, September 2014). NetApp's AutoSupport aggregated statistics show why:
[Clustered DataONTAP] also delivers greater than 99.999% availability as reported by AutoSupport™, NetApp’s global remote automation and predictive guidance solution which leverages an intelligent data analytics engine to identify at risk (sic) systems. — NetApp
Additionally, NVRAM is used, as it is by many other vendors, to accelerate write acknowledgements to the host initiator and to coalesce writes into full stripes, eliminating write fragments and reducing the number of times writes need to be destaged to disk. The net result is greatly improved write performance. But NetApp does all this in a special way:
No matter how big a write cache is or how it is used, eventually data has to be written to disk. DataONTAP divides its nonvolatile memory into two separate buffers. When one buffer is full, that triggers disk write activity to flush all the cached writes to disk and create a consistency point. Meanwhile, the second buffer continues to collect incoming writes until it is full, and then the process reverts to the first buffer.
This approach to caching writes—in combination with the WAFL® system . . . in
effect turns a set of random writes into sequential writes. — NetApp WP 7193
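The dual-buffer scheme described above can be sketched in a few lines. This is a toy model, not NetApp code: writes are acknowledged as soon as they land in the active buffer, and a full buffer triggers a flush (a "consistency point") while the other buffer keeps absorbing new writes. The buffer size of 4 blocks is an arbitrary assumption for the demo.

```python
# Illustrative sketch (not DataONTAP itself): a double-buffered write journal.

class DoubleBufferedJournal:
    def __init__(self, buffer_size: int):
        self.buffer_size = buffer_size
        self.buffers = [[], []]
        self.active = 0           # index of the buffer taking new writes
        self.consistency_points = 0
        self.flushed = []         # stand-in for "blocks written to disk"

    def write(self, block) -> str:
        buf = self.buffers[self.active]
        buf.append(block)
        if len(buf) == self.buffer_size:
            self._flush()         # full buffer triggers a consistency point
        return "ack"              # the host sees the ack immediately

    def _flush(self):
        full = self.active
        self.active = 1 - full    # swap: new writes go to the other buffer
        self.flushed.extend(self.buffers[full])  # one sequential flush
        self.buffers[full] = []
        self.consistency_points += 1

journal = DoubleBufferedJournal(buffer_size=4)
for i in range(10):
    journal.write(f"block-{i}")

print(journal.consistency_points)  # 2 consistency points so far
print(len(journal.flushed))        # 8 blocks on disk, 2 still cached
```

The point of the swap is that flushing never blocks incoming writes: the host is always talking to a buffer that has room.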
Now, lest you say that NetApp's RAID-DP, their proprietary implementation of the commonly used RAID 6, slows things down:
NetApp RAID-DP performs so well that it is the default option for NetApp storage systems and has been used regularly for performance benchmark submissions since its release. Tests show that random write performance declines only 2% versus the NetApp RAID 4 implementation. By comparison, another major storage vendor’s RAID 6 random write performance decreases by 33% relative to RAID 5 on the same system [and RAID 5 is a single-parity scheme]. — NetApp: Optimized For...
NetApp mitigates flash wear in several ways. First, it extends the same write-anywhere WAFL capabilities from spinning disk to All-Flash FAS. That means AFF doesn't rely on the standard in-place program/erase procedure the way other arrays do; it simply writes to another location on the media, just as it does with traditional hard drives. Second, it employs the same NVRAM write-coalescing scheme that optimizes traditional FAS arrays. When a write enters the controller, NVRAM acknowledges it straight away, which, incidentally, lets the guest move on without waiting on any disk access time. The write is then grouped with previous and subsequent writes, coalesced into 4KB blocks, and destaged from NVRAM to the disk groups as a full stripe. Additionally, metadata is stored in the same location as file data, resulting in fewer commits to disk.
By doing all of this, not only is the number of writes reduced dramatically, but the writes land in a way that minimizes garbage-collection performance penalties, produces more predictable latency, and extends the wear life of the flash media itself.
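The write-anywhere coalescing idea can be shown in miniature. This sketch is my own illustration, not WAFL: incoming writes target scattered logical block addresses, but the destage step lays them down contiguously at the current write frontier, turning eight random writes into two sequential full-stripe writes. The 4-block stripe width and the sample addresses are assumptions for the demo.

```python
# Illustrative sketch (not WAFL): coalescing random logical writes into
# sequential full-stripe writes at a moving write frontier.

STRIPE_BLOCKS = 4  # blocks per full stripe (assumed value)

def destage(pending, frontier):
    """Lay pending (lba, data) writes down as full stripes at `frontier`.
    Returns (stripe_writes, new_frontier); each stripe write is one
    sequential I/O covering STRIPE_BLOCKS contiguous physical blocks."""
    stripes = []
    while len(pending) >= STRIPE_BLOCKS:
        chunk, pending = pending[:STRIPE_BLOCKS], pending[STRIPE_BLOCKS:]
        # map scattered logical blocks to contiguous physical blocks
        placement = {lba: frontier + i for i, (lba, _) in enumerate(chunk)}
        stripes.append(placement)
        frontier += STRIPE_BLOCKS
    return stripes, frontier

# random-looking logical block addresses from the host
pending = [(lba, b"data") for lba in (901, 17, 5502, 44, 12, 733, 8, 2048)]
stripes, frontier = destage(pending, frontier=0)

print(len(stripes))  # 2 sequential stripe writes instead of 8 random ones
print(stripes[0])    # {901: 0, 17: 1, 5502: 2, 44: 3}
```

Because the physical destination is always "the next free stripe," the flash never has to erase-in-place on the write path, which is exactly why garbage collection stays off the critical path.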
Let's sum it up this way: NetApp has been optimizing writes longer than any other storage manufacturer. This is why NetApp can offer a 5-year SSD warranty:
Because of these optimizations, NetApp is able to offer an SSD warranty that is not tied to any wear-leveling limits. — NetApp AFF Deep-Dive
First, what is the difference? Put simply, the controllers and boards are the same; the difference is in the NAND media itself, and how many writes it can accept before it starts to "wear out", that is, degrade in its performance. TechTarget explains it well:
Individual consumer MLC cells can only provide 3,000 to 10,000 write cycles, while enterprise MLC cells can handle 20,000 to 30,000 write cycles. In the enterprise, eMLC can serve as a compromise between inexpensive MLC flash and very expensive single-level cell (SLC) flash. — TechTarget
Pure Storage says I am wrong. In fact, they call me a "perpetrator" of misinformation.
Perpetrators: Competitors who use eMLC or SLC flash in their arrays, typically found in legacy disk arrays which were retrofit with flash instead of being designed from scratch for flash. — Pure Storage
Secondly, Pure is not really being straightforward here. Pure says "Counter-intuitively . . . consumer-grade SSDs actually have a lower annual failure rate than eMLC and SLC SSDs, and can be the building block of a much more reliable flash array." But this is a bit of marketing spin. Is it true that consumer-grade SSDs have a lower failure rate? Sure. The question is why.
And the answer is math. Consumer-grade SSDs: 1) perform fewer writes over their average life, because consumers write far less than a typical storage array does; and 2) are retired (i.e., replaced along with the laptop) more often than SAN SSDs.
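That math is easy to work through. Using the P/E cycle figures from the TechTarget quote above, and drive-writes-per-day (DWPD) figures that are my own assumed illustrations (not from either vendor), the wear-out horizon is just rated cycles divided by daily full-drive writes:

```python
# Back-of-the-envelope wear math. The DWPD workload figures below are
# assumed for illustration; the P/E cycle counts come from the TechTarget
# quote above. The takeaway: consumer SSDs "fail less" largely because
# they absorb far fewer writes, not because the media is more durable.

def wear_out_years(pe_cycles: float, drive_writes_per_day: float) -> float:
    """Years until the rated program/erase cycle budget is exhausted."""
    return pe_cycles / (drive_writes_per_day * 365)

# light laptop workload vs. heavy array workload (assumed DWPD values)
consumer_mlc = wear_out_years(pe_cycles=3_000, drive_writes_per_day=0.1)
enterprise_emlc = wear_out_years(pe_cycles=20_000, drive_writes_per_day=3.0)

print(round(consumer_mlc))     # ~82 years at the light workload
print(round(enterprise_emlc))  # ~18 years under the heavy workload
```

Even with a fraction of the endurance rating, the lightly written consumer drive never comes close to its cycle budget before it is retired, which is all the "lower failure rate" really tells you.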
- 147,147 Peak IOPS @ 5.2GBps throughput
- 8.1s average "Monday Morning" login time @ 0.557ms average latency
This scenario would be the most extreme login workload, in which all 2,000 desktops log in to a storage controller during a 30-minute period and copy 800MB of data for a total of 1.6TB. Even given the extreme workload, the storage latency was under 1ms, and the average CPU utilization was 56%. As these excellent performance numbers and fair login time indicate, more work could still be performed on the storage controller. — NetApp AFF Solution For Nonpersistent Desktops with VMware Horizon
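A quick sanity check on those login-storm numbers, using only the figures quoted above (2,000 desktops, 800MB each, a 30-minute window), shows why the array had so much headroom:

```python
# Verify the quoted login-storm arithmetic: 2,000 desktops x 800 MB
# inside a 30-minute window, against the 5.2 GB/s peak throughput.

desktops = 2_000
mb_per_login = 800
window_seconds = 30 * 60

total_mb = desktops * mb_per_login          # aggregate data copied
avg_mb_per_s = total_mb / window_seconds    # sustained average rate

print(total_mb / 1_000_000)  # 1.6 TB in aggregate, as quoted
print(round(avg_mb_per_s))   # ~889 MB/s average, far below the 5.2 GB/s peak
```

The sustained average is well under a fifth of the measured peak, which matches NetApp's observation that "more work could still be performed on the storage controller."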
Oh, and by the way, the performance numbers above are for dynamic desktops based on linked clones. Some would say that persistent desktops wouldn't perform as well on the same system.
Unfortunately, with today’s prevailing storage technologies (disk or flash) the Persistent Desktop model is very hard to deploy as it places high loads on storage performance and capacity. Storage vendors publish many VDI reference architectures, but often guide toward the “Linked Clone” or non-persistent desktop model, which presents limitations [to] end-users in their desktop flexibility as well as to administrators, who must learn complex new tools and procedures to manage non-persistent desktops. — XtremIO: Hidden VDI Assumptions
According to the Pure Storage Reference Architecture for VMware Horizon, the list price for a VDI solution results in a cost-value of $100/desktop on an FA-320 with 11.1TB of raw SSD storage (MLC).
As tested, the 11TB FlashArray FA-320 delivered best-in-class VDI performance at a cost of $100/desktop for 2,000 desktops. Since the FlashArray was significantly under-utilized throughout the testing on both a capacity and performance basis, the array could have supported 1,000s more desktops, or a smaller array could have been used, either of which would have reduced the $/desktop cost even further. — VMware.com
NetApp comes in with the best value, though. NetApp advertises this system at $55/desktop. Running the pricing myself at list price (the highest possible price you would pay, with zero discount), the test configuration comes to ~$60/desktop. So I would say $55/desktop is very conservative. And for that $55/desktop you get:
- 14.4TB of enterprise-grade SSDs (eMLC) — Best in class (3TB more than Pure)
- Any storage protocol (Pure and XtremIO are limited to FC and iSCSI) — Best in class
- Most tried & true storage operating system on the planet — Best in class
- Available as a Cisco Validated Design — Best in class
- $55 per virtual desktop — Best in class
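One last bit of arithmetic, using only the figures above, backs up the "very conservative" claim: getting from my ~$60/desktop list-price calculation down to the advertised $55/desktop requires only a token discount.

```python
# Discount needed to move from my ~$60/desktop list-price figure to
# NetApp's advertised $55/desktop (both figures from the text above).

list_per_desktop = 60.0
advertised_per_desktop = 55.0

discount_needed = 1 - advertised_per_desktop / list_per_desktop
print(f"{discount_needed:.1%}")  # 8.3% -- a small discount reaches $55
```

Since nobody pays full list on enterprise storage, real-world pricing should land comfortably below the advertised number.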