I’m currently working on an eBook, “Storage Basics for vSphere”. As it’s nearly finished, I thought I’d put up a couple of extracts over the next week or two, prior to release.
This part covers the basics of benchmarking storage using IOMeter, which is available as a free download. Please do post a comment.
Benchmarking Storage with IOMeter
Benchmarking storage can be more complicated than it first seems. Applied consistently, benchmarking is useful in the first instance for system sizing, by measuring existing storage that is known to be stretched, and also for confirming that new implementations are performing as expected and, more importantly, as required.
Performance Metrics
Benchmarking storage for virtualised environments involves three key measures:
- Sequential MB/s, both read and write. This usage pattern is significant for VMDK-based backups and for some file server applications that use large files.
- 8K random IOPS (IOs per second), primarily mixed read and write, but also read and write individually. These metrics give a good relative indication of storage performance for virtualised environments, since competing workloads create random access patterns.
- Latency in ms. Latency must be understood, since there is likely to be a point beyond which increasing throughput causes significant latency, slowing user response times beyond acceptable limits. The queue depth a storage system can support (queue depth being what drives latency) also ultimately determines how many competing workloads it can accommodate.
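The link between queue depth, throughput and latency in the last point can be made concrete with Little’s law, a general queueing result rather than something IOMeter reports directly: in a steady state, the average number of outstanding IOs equals IOPS multiplied by average latency. A minimal Python sketch, assuming steady-state behaviour and using purely hypothetical numbers:

```python
def littles_law_latency_ms(queue_depth: float, iops: float) -> float:
    """Average latency implied by Little's law: outstanding IOs = IOPS x latency."""
    # Multiply by 1000 first to convert seconds to milliseconds.
    return queue_depth * 1000 / iops


def littles_law_iops(queue_depth: float, latency_ms: float) -> float:
    """IOPS needed to keep `queue_depth` IOs in flight at an average of `latency_ms`."""
    return queue_depth * 1000 / latency_ms


# Hypothetical device: 32 outstanding IOs completing at 800 IOPS.
print(littles_law_latency_ms(32, 800))  # 40.0 (ms)
# Doubling the queue depth with no gain in IOPS doubles the average latency.
print(littles_law_latency_ms(64, 800))  # 80.0 (ms)
```

This is why, once a storage system stops delivering more IOPS as queue depth rises, latency climbs in direct proportion to the extra queued IO.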
Performance Tools
The most common tools for assessing storage performance are dd, HD Tune and IOMeter. IOMeter is a favourite since it’s free, easy to use and highly configurable. It runs primarily on Windows (Server 2003 or 2008).
Once downloaded and installed (all in less than a minute), IOMeter presents a simple, clean interface. Note that on Server 2008 it needs to be run as administrator.
How and Where to Test
Benchmarking involves performing the key tests iteratively with varying queue depths – allowing for ramp and run times, this is a time-consuming process.
Queue depth determines how much IO the OS (or application) will ‘allow’ the underlying controller to optimise. By re-ordering and combining commands, the controller can streamline or reduce physical IO. The more commands the controller can process in this way before it needs to report back to the OS, the better the ‘hit-rate’ of such optimisations (and hence the higher the ultimate throughput). The trade-off is latency, which can have a devastating impact on the user experience.
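To illustrate why a deeper queue gives the controller more room to optimise, here is a deliberately crude toy model in Python, not how any real controller works: requests are random block addresses, the controller may only reorder commands that are outstanding together (one batch of `queue_depth` at a time), and it sorts each batch by address to cut down on head movement. The functions `head_travel` and `serve_with_queue_depth` are hypothetical names for this sketch.

```python
import random


def head_travel(positions: list[int], start: int = 0) -> int:
    """Total distance a hypothetical disk head moves serving requests in this order."""
    travel, head = 0, start
    for pos in positions:
        travel += abs(pos - head)
        head = pos
    return travel


def serve_with_queue_depth(requests: list[int], queue_depth: int) -> list[int]:
    """Serve requests in batches of `queue_depth`, sorting each batch by address.

    A crude stand-in for controller reordering: only commands that are
    outstanding at the same time can be reordered. A queue depth of 1 means
    requests are served strictly in arrival order.
    """
    order: list[int] = []
    for i in range(0, len(requests), queue_depth):
        order.extend(sorted(requests[i:i + queue_depth]))
    return order


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    reqs = [rng.randrange(1_000_000) for _ in range(4096)]
    for qd in (1, 4, 16, 64):
        print(qd, head_travel(serve_with_queue_depth(reqs, qd)))
```

Total head travel falls sharply as the batch grows, but each request may now wait behind others in its batch, which is the latency side of the trade-off.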
Generally:
- Benchmark within a guest VM. This is the only way to properly understand storage performance for the virtual environment. Wherever possible, use a host running only the guest being used for benchmarking, since CPU scheduling delays can cause timing inaccuracies that overstate the results.
- The test file size (the ‘sectors’ field, i.e. the area IOMeter uses for testing) must be large enough to avoid any significant percentage of controller cache hits, unless of course cached performance is being tested, to determine network latency for example. A file size of 4 or 8GB is usually sufficient for local storage, and perhaps as much as 30GB when testing NAS devices with GBs of cache.
Each test needs to be repeated with various queue depths to find the maximum throughput at an acceptable latency, a limit of 50ms sometimes being cited.
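Choosing the result from a queue-depth sweep amounts to a small filter: discard runs whose average latency exceeds the ceiling (50 ms by default here, the figure cited above), then take the highest IOPS. A Python sketch; the `Result` record and the sample figures are hypothetical, chosen only to be consistent with Little’s law, and are not measurements:

```python
from typing import NamedTuple, Optional


class Result(NamedTuple):
    queue_depth: int
    iops: float
    avg_latency_ms: float


def best_within_latency(results: list[Result], limit_ms: float = 50.0) -> Optional[Result]:
    """Return the highest-IOPS run whose average latency is within `limit_ms`, if any."""
    acceptable = [r for r in results if r.avg_latency_ms <= limit_ms]
    return max(acceptable, key=lambda r: r.iops, default=None)


# Hypothetical sweep, for illustration only.
sweep = [
    Result(1, 150.0, 6.7),
    Result(4, 420.0, 9.5),
    Result(16, 900.0, 17.8),
    Result(64, 1200.0, 53.3),
]
print(best_within_latency(sweep))  # the queue depth 16 run: QD 64 breaches 50 ms
```

The `default=None` makes the "nothing acceptable" case explicit, which is itself a useful finding: the storage cannot deliver any tested load within the latency limit.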
The test size is set in sectors on IOMeter’s front sheet (8GB being 16,777,216 sectors), along with the queue depth for the test.
Most tests should be run with a ‘ramp’ time, especially write tests, long enough to fill the caches so that the results show sustained throughput. A one-minute ramp and a five-minute run usually give fairly accurate results. Both are set on the ‘Test Setup’ sheet.
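The sector arithmetic, and a rough feel for why the file must dwarf the cache, can be sketched in Python. Two assumptions: “GB” here is binary (1024³ bytes) with 512-byte sectors, which matches the 16,777,216 figure above; and for uniformly random IO over the whole file, the cache-hit fraction is roughly cache size divided by file size. The cache size used below is hypothetical.

```python
SECTOR_BYTES = 512  # assumed sector size; matches 8GB = 16,777,216 sectors
GIB = 1024**3


def gib_to_sectors(size_gib: float) -> int:
    """Test file size in GiB expressed as a count of 512-byte sectors."""
    return int(size_gib * GIB) // SECTOR_BYTES


def approx_cache_hit_fraction(cache_gib: float, file_gib: float) -> float:
    """Rough cache-hit fraction for uniformly random IO over the whole test file.

    Assumes the whole cache holds test-file data; real caches are usually less
    effective than this, so treat it as a rough upper bound.
    """
    return min(1.0, cache_gib / file_gib)


for size in (4, 8, 30):
    print(f"{size} GB = {gib_to_sectors(size):,} sectors")
# A hypothetical 512MB controller cache against an 8GB test file:
print(f"{approx_cache_hit_fraction(0.5, 8):.2%}")  # 6.25%
```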
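To put “time-consuming” into numbers: every workload is run at every queue depth, and each run costs the ramp plus the run time. A Python sketch, assuming the one-minute ramp and five-minute run above, three workloads, and a hypothetical sweep of queue depths doubling from 1 to 64; the time IOMeter spends writing the test file is not included.

```python
def sweep_minutes(num_tests: int, queue_depths: list[int], ramp_min: int = 1, run_min: int = 5) -> int:
    """Minutes to run every test at every queue depth, excluding test-file creation."""
    return num_tests * len(queue_depths) * (ramp_min + run_min)


# Hypothetical sweep: three workloads at queue depths 1, 2, 4, 8, 16, 32 and 64.
queue_depths = [2**i for i in range(7)]
print(sweep_minutes(3, queue_depths))  # 126, i.e. just over two hours
```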
Defining the Workloads
With the basic settings configured, the workloads need to be defined. IOMeter includes many predefined workloads, but three core tests suffice, and they take only a minute to configure:
- 32K Sequential read – 32K, 100% sequential, 100% read
- 32K Sequential write – 32K, 100% sequential, 0% read
- 8K Real Life – 8K, 100% random, 70% read
When testing NFS volumes, or any storage that isn’t 512-byte sector addressable (such as the latest SATA disks), set the alignment to 4K in these test definitions. 4K alignment can be used to test any storage, since file systems work in clusters of at least 4K. The tests are configured on the ‘Access Specifications’ sheet, and creating a test definition is straightforward.
Once the tests are defined, add only one test at a time to the left-hand pane of the Access Specifications sheet.
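Writing the three core definitions down as data makes them easy to record alongside results and to re-enter consistently. A Python sketch; `AccessSpec` is a hypothetical record mirroring the fields discussed above, not an IOMeter file format, and the check that the transfer size is a multiple of the alignment is an assumed sanity check rather than an IOMeter rule.

```python
from dataclasses import dataclass

KIB = 1024


@dataclass(frozen=True)
class AccessSpec:
    """Hypothetical record of the settings entered for one IOMeter access specification."""
    name: str
    transfer_bytes: int
    pct_random: int  # 0 = 100% sequential, 100 = 100% random
    pct_read: int  # 0 = all writes, 100 = all reads
    align_bytes: int = 4 * KIB  # 4K alignment, usable for any storage

    def __post_init__(self) -> None:
        if not (0 <= self.pct_random <= 100 and 0 <= self.pct_read <= 100):
            raise ValueError("percentages must be between 0 and 100")
        # Assumed sanity check: keep each transfer a whole number of aligned units.
        if self.transfer_bytes % self.align_bytes:
            raise ValueError("transfer size should be a multiple of the alignment")


CORE_TESTS = [
    AccessSpec("32K Sequential read", 32 * KIB, pct_random=0, pct_read=100),
    AccessSpec("32K Sequential write", 32 * KIB, pct_random=0, pct_read=0),
    AccessSpec("8K Real Life", 8 * KIB, pct_random=100, pct_read=70),
]

for spec in CORE_TESTS:
    print(spec)
```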
Setting it Running
With everything configured, click the green flag and IOMeter will start the test, first writing out a large test file of the specified size, which is then used for the test. Once that file is written and the ramp time has elapsed, the results sheet gives a real-time view of the storage (drag the slider left to see the results while the test is running).
Interpreting the Results
When testing networked storage with sequential workloads, often the network is the bottleneck and performance will approach n*110 MB/s (with gigabit Ethernet), where n is the number of active load-balanced paths. Since local storage does not have this restriction, it can be significantly faster in these tests.
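A quick check for whether a sequential result is network-bound follows directly from the n × 110 MB/s rule above (gigabit Ethernet carries 125 MB/s raw, less protocol overhead). A Python sketch; the 90% threshold for calling a result network-bound is an assumed heuristic, not a standard.

```python
GBE_USABLE_MB_S = 110  # approximate usable MB/s per gigabit link, per the text


def network_ceiling_mb_s(active_paths: int, per_path_mb_s: float = GBE_USABLE_MB_S) -> float:
    """Approximate ceiling for sequential throughput over load-balanced gigabit paths."""
    return active_paths * per_path_mb_s


def likely_network_bound(measured_mb_s: float, active_paths: int, threshold: float = 0.9) -> bool:
    """Heuristic: results within `threshold` of the ceiling are probably network-limited."""
    return measured_mb_s >= threshold * network_ceiling_mb_s(active_paths)


print(network_ceiling_mb_s(2))  # 220
print(likely_network_bound(105, 1))  # True: close to a single link's limit
```

If a result is well below the ceiling, the network is probably not the limit and the storage itself (or its configuration) is the place to look.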
Sequential metrics are frequently cited, giving seemingly massive numbers that offer no insight into why a server is suffering storage-related performance problems. As noted above, random IOPS are generally much more important for virtualisation, but sequential metrics are often a good indication of the overall ‘health’ of the storage subsystem:
- Any kind of network latency, congestion or packet loss will reduce sequential throughput, potentially significantly.
- Storage subsystem misconfiguration, such as incorrect cache policies, will have a great impact on write performance.
Although there are far too many permutations to list ‘typical’ values, some measured values are given below.
- Single SATA Drive: 60-90 MB/s
- 4x SATA RAID-10: 120-150 MB/s
- 5x SAS 10k RAID-5: 300+ MB/s
For the more important random workload IOPS, problems indicate different issues:
- The impact of queue depth can clearly be measured through the average latency.
- Alignment issues, particularly with NFS storage, will result in greatly reduced random write throughput.
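Why misalignment hurts random writes can be seen by counting how many 4K blocks of the underlying storage a single 8K IO touches. A Python sketch, assuming a 4K underlying block size; the shifted example uses the 63-sector (32,256-byte) partition offset that Windows Server 2003 used by default. A misaligned 8K write spans three blocks, two only partially, which may force the storage to read those blocks before it can write them.

```python
def blocks_touched(offset: int, length: int, block: int = 4096) -> int:
    """Number of `block`-byte units overlapped by an IO of `length` bytes at byte `offset`."""
    first = offset // block
    last = (offset + length - 1) // block
    return last - first + 1


IO = 8 * 1024
print(blocks_touched(0, IO))  # 2: aligned, covers exactly two 4K blocks
print(blocks_touched(63 * 512, IO))  # 3: shifted by a 63-sector offset, straddles three
print(blocks_touched(1024 * 1024, IO))  # 2: a 1MB offset is 4K aligned
```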
Some values measured with the 8K Real Life test defined above are:
- Single SATA Drive: 140 IOPS
- 4x SATA RAID-10: 490 IOPS
- 5x SAS 10k RAID-5: 1,100 IOPS
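Random 8K results look small when expressed in MB/s, because throughput is simply IOPS multiplied by transfer size. A Python sketch converting the figures above, assuming binary units (1 MB = 1024 KB):

```python
def iops_to_mb_s(iops: float, transfer_kib: float = 8) -> float:
    """Throughput in MB/s for a given IOPS and transfer size (binary units)."""
    return iops * transfer_kib / 1024


measured_8k_iops = {
    "Single SATA Drive": 140,
    "4x SATA RAID-10": 490,
    "5x SAS 10k RAID-5": 1100,
}
for system, iops in measured_8k_iops.items():
    print(f"{system}: {iops} IOPS ~ {iops_to_mb_s(iops):.1f} MB/s")
```

Even the fastest system here moves well under 10 MB/s in the random test, which is expected and not a fault.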
About the Forthcoming eBook Storage Basics for vSphere
Whilst virtualisation offers major advantages, for the SME there is a general lack of accessible quality guides to storage, and a wealth of ‘consultants’ all too eager to propose massively over-specified solutions.
The book aims to demystify the storage options for VMware and help the SME administrator avoid both consultant margins and performance and availability surprises further down the line.




James, what a great write-up. This is exactly what I was looking for. While the Iometer guide is clear in some areas, it is also very ambiguous in others, and your article really helped to clear things up for me. I am very interested in reading your e-book. Any time frame for release? Thanks again.
Hi Josh, many thanks for the feedback, I’m glad it was useful. I’ve been wondering what to do with the book and am considering transferring the lot to the wiki, as it is something of a moving target. I’ll have a think and get something out next week though. Cheers, James.
I am also interested in your storage information. I struggle with finding any sort of time to do any sort of benchmarking on the systems I’m quoting for my SMB customers and would love to review any knowledge you can pass along.
James, I’m looking at some results using DAS and I wanted to get your opinion, if you have some time to look at it. I haven’t been able to figure out the alignment and offset for the storage, but using viclient to create the VMFS and Server 2008 R2 as the guest, the storage should be aligned properly. The numbers seem to be somewhat in line with what you state above, but one thing that worried me in the 8K tests was the MB/s.
http://i805.photobucket.com/albums/yy337/jcoen/IO_Tests.png
If you have time, let me know what you think, thanks.
Hi Josh, I sent you a mail direct. Your numbers look OK, although you should see higher if you increase the queue depth to at least twice the number of physical disks in the array. Cheers, James.
Hi James,
I follow your blog and was curious to know whether there was any update on your eBook?
TIA,
-Jason
Hi Jason, thanks for posting. I’m sorry for the extended delay on that – it’s just lack of time. I’m hoping to get some time to work on it soon.
Hi,
I’m trying to benchmark our EqualLogic PS4000VX which I’m having performance issues with when using VMDK disks. I have done the sequential test but I’m only getting 17 MB/s from IOMeter, but I know that I can generate at least 70-80 MB/s when copying files inside the VM.
What could I be doing wrong? Is it the 8K test size?
Hi, test sequential with 32K IOs and a queue depth of 32 IOs. Test file size should probably be about 30GB with these units. HTH
Hi James
Another great post of yours that Google has taken me to! Did you manage to finish that eBook in the end?