The trigger for this post was a related-topic exchange with darkmikey, where he quoted Section 9.9 from the UG:

> With VirtualBox, this type of access is called “raw hard disk access”; it allows a guest operating system to access its virtual hard disk much more quickly than with disk images, since data does not have to pass through two file systems (the one in the guest and the one on the host).

I made a comment that this was misleading and raised a ticket requesting a change to the documentation. Let’s just say there followed a robust dialogue between Klaus, Frank and myself, where we agreed to differ on this point. The substance of the two positions is:
- VB Team: Raw-mapped VMDK partitions are recommended, as they generally give significantly better performance than the equivalent VDI. Whilst in ideal circumstances performance might be comparable, in general the need to pass data through two file systems results in materially reduced performance.
- TerryE: As long as you carry out sensible maintenance of your file systems and VDIs, the performance of VDIs and raw VMDKs is pretty comparable. The ease-of-management benefits of VDIs far outweigh the performance risks of using them compared to raw partitions.
The Benchmark
Since we are discussing I/O performance, I felt it fair to choose a heavy I/O scenario, and I happen to have a good one to hand. I am a system admin for the OpenOffice.org User Forums. I have developed an automated backup process where each night the D/B is dumped and delta’ed, giving a monthly full backup plus weekly and daily incrementals. Any new attachments (one of the advantages of phpBB3 is that users can post attachments) are rolled up into a tarball. The sql.bz2 and tar.bz2 files are then automatically shipped off to another site, which also acts as a Disaster Recovery (DR) backup for the forums.
So much for production, but for development and maintenance I have a standard Ubuntu JeOS Appliance VM (using VirtualBox, of course), which is based on two VDIs. You install it by running a small XP command script which downloads the System Image from the DR server (the entire LAMP system is a 90Mbyte 7zip archive) and uses VBoxManage to create a test VM with this as /dev/sda and a blank dynamic VDI as /dev/sdb. You bootstrap into the VM and run a second script which then interrogates the DR server and downloads all the necessary .tar.bz2 and .sql.bz2 files to do a fully automated build of the Apache hierarchy and PostgreSQL database as at the last overnight backup. The data downloads are currently just under 200 Mbytes of bz2 compressed files, and the whole process takes about 45mins from no VM to a LAMP VM running a working copy of last night’s forums.
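As a rough sketch of what that creation step does (not the actual script — the names and paths here are hypothetical, and VBoxManage subcommand syntax varies between VirtualBox versions), the command sequence looks something like this:

```python
# Hypothetical sketch of the VBoxManage calls that register the test VM
# with the downloaded system image as the first disk and a blank
# dynamically-expanding VDI as the second. Names/paths are illustrative.

def build_vm_commands(vm_name, system_vdi, scratch_vdi, scratch_mb=1920):
    """Return the VBoxManage invocations (as argument lists) needed to
    create and wire up the two-disk test VM described above."""
    return [
        ["VBoxManage", "createvm", "--name", vm_name, "--register"],
        # Blank dynamic VDI that the guest will see as its second disk
        ["VBoxManage", "createhd", "--filename", scratch_vdi,
         "--size", str(scratch_mb), "--format", "VDI"],
        ["VBoxManage", "storagectl", vm_name, "--name", "SATA",
         "--add", "sata"],
        ["VBoxManage", "storageattach", vm_name, "--storagectl", "SATA",
         "--port", "0", "--device", "0", "--type", "hdd",
         "--medium", system_vdi],   # appears as /dev/sda in the guest
        ["VBoxManage", "storageattach", vm_name, "--storagectl", "SATA",
         "--port", "1", "--device", "0", "--type", "hdd",
         "--medium", scratch_vdi],  # appears as /dev/sdb in the guest
    ]

for cmd in build_vm_commands("lamp-test", "F:/VDIs/system.vdi",
                             "F:/VDIs/scratch.vdi"):
    print(" ".join(cmd))
```

The point of scripting it this way is repeatability: the whole VM can be thrown away and rebuilt identically for each benchmark run.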
The biggest chunk of this time is due to my ADSL download bandwidth, and to mitigate this delay I cache the files and check whether each file is already in the cache before downloading it. Since the file naming convention embeds the MD5 digest of each bz2 file, I also validate all files against their MD5 before using them. Thus by doing a dry run I can pre-populate the cache (which is on the system partition) and then run the database and Apache build section against a range of HDD formats. This is heavy I/O: MD5 checking, exploding tar.bz2 archives, integrating sql.bz2 with 2 delta bz2 streams, loading up a 500 Mbyte PostgreSQL database; although due to all the decompression and caching the system does swing from being I/O bound to 100% CPU bound on occasions. Still, the VBox logs tell me that my PC is sustaining about 60 I/Os per second during this process.
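The cache-validation step is simple enough to sketch. This is a minimal illustration rather than my actual script, and the exact naming convention (`<name>-<md5>.tar.bz2`) is assumed here for the example:

```python
# Sketch of the cache check: trust a cached copy only if the MD5 digest
# embedded in its file name matches the actual content.
import hashlib
import os
import re

def md5_of(path, chunk=1 << 20):
    """Incrementally MD5 a file without reading it all into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def cached_copy_is_valid(path):
    """True if the file exists and its content matches the MD5 digest
    embedded in its name; otherwise it must be (re)downloaded."""
    m = re.search(r"-([0-9a-f]{32})\.(?:tar|sql)\.bz2$",
                  os.path.basename(path))
    return bool(m) and os.path.exists(path) and md5_of(path) == m.group(1)
```

A corrupt or truncated download therefore never poisons a benchmark run: it simply fails validation and gets fetched again.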
The Results
I ran four tests using different configurations of VDI and VMDK. The VDIs were loaded into an NTFS partition (F:) that I use for VDIs (and conventional VMDKs). This is about 60% utilised and I routinely defragment it. The raw VMDKs were logical partitions adjacent to F:. All of these partitions were sized at 1920Mbytes.
- Run 1 — Raw VMDK. 15min 37s.
- Run 2 — Static VDI. 15min 05s.
- Run 3 — Another Raw VMDK. 16min 12s.
- Run 4 — Dynamic VDI. 15min 15s.
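A quick bit of arithmetic on the four runs above (averaging the two raw-VMDK runs against the two VDI runs — obviously two samples each is far too few for statistical significance, so treat the percentage as indicative only):

```python
# Average the two raw-VMDK runs and the two VDI runs from the list above,
# then express the gap as a percentage.

def secs(m, s):
    return 60 * m + s

raw_vmdk = [secs(15, 37), secs(16, 12)]   # runs 1 and 3
vdi      = [secs(15, 5), secs(15, 15)]    # runs 2 and 4

avg_raw = sum(raw_vmdk) / len(raw_vmdk)
avg_vdi = sum(vdi) / len(vdi)
gap_pct = 100 * (avg_raw - avg_vdi) / avg_vdi

print(f"raw VMDK avg {avg_raw:.1f}s, VDI avg {avg_vdi:.1f}s, "
      f"raw slower by {gap_pct:.1f}%")
# → raw VMDK avg 954.5s, VDI avg 910.0s, raw slower by 4.9%
```

So in this benchmark the VDIs actually came out roughly 5% *faster* on average than the raw VMDKs, not slower.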
So What is my Explanation?
- If you trace through the VB code paths for raw VMDKs and VDIs, these are almost the same in terms of hand-offs, indirection, I/O fragmentation and mapping. Both ultimately use the standard stdio read and write functions. The only material difference is that the VDI case writes to an offset in a VDI file, whereas the raw VMDK writes to the underlying block device (the file \\.\PhysicalDriveN on Windows).
- The perception that file systems add an overhead is extremely misleading. Yes, of course the management of a file system increases the CPU load on the system, but what the user cares about is the responsiveness of the system, and that comes down to the delays incurred while processing must stop for the HDD to move. The fact that your CPU is at 15% utilisation rather than 12% is irrelevant.
- Developing file systems is extremely complex, and the people who do so go to great lengths to ensure that metadata is cached where practical (and for large file structures such as VDIs the cache hit ratio on such metadata is over 99%). You only need to go to disk for cache misses. Also, most writes use elevator-based delayed-write algorithms to largely eliminate the response impact of writes.
- When you are I/O bound, it is the pattern and frequency of physical transfers that dominate the composite delay. The VDI uses a 1Mbyte allocation tile which is significantly larger than the underlying file-system cluster size (4Kbytes in the case of both NTFS and Ext3); the double mapping therefore introduces little if any increased fragmentation. Both Ext3 and (defragged) NTFS go to some lengths to maintain local proximity of adjacent clusters. Both use elevator sequencing to further optimise I/O seeks. The net result is that these two file systems largely dovetail and do not ‘fight’. (Incidentally, I think that the main reason that the VMDK times were slower is that the two VDIs were stored in the same partition but the VMDK was in an adjacent partition. This increased separation of the sda and sdb regions in the VDI+VMDK case was enough to cause a slight decrease in performance, and one which dominated any small overheads of using a VDI.)
- I cheated in my allocation of my VDIs because I used the magic sizing formula (N Gb – 128 Mb) for my disk / partition sizes. This means that the standard VDI preamble is now a whole number of clusters, so the logical and physical clusters are aligned. (Why the VDI isn’t padded out like this as standard amazes me: a trivial storage overhead to prevent a source of fragmentation in the file system.)
- I also have a consistent and routine housekeeping process for allocating and managing my VDIs which keeps them “lean and mean”. This whole issue of how you should manage your VDIs for optimum performance, especially where you need to maintain and distribute multiple VMs, is not well documented or discussed within the VirtualBox community, and the VirtualBox tool support is unnecessarily weak here; but that is really the subject of a separate article.
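The elevator sequencing mentioned above is easy to illustrate. This is a deliberately simplified toy (real schedulers merge requests, handle priorities, and sweep in both directions), but it shows the core idea: pending transfers are flushed in sector order from the current head position, so the head sweeps rather than thrashing back and forth:

```python
# Toy illustration of elevator (one-directional sweep) scheduling of
# pending disk requests. Real I/O schedulers are far more sophisticated.

def elevator_order(pending, head):
    """Return pending sector numbers in sweep order: everything at or
    above the current head position first (ascending), then wrap to
    the lowest remaining sectors."""
    up   = sorted(s for s in pending if s >= head)
    down = sorted(s for s in pending if s < head)
    return up + down

print(elevator_order([90, 10, 55, 40, 70], head=50))
# → [55, 70, 90, 10, 40]
```

Served in arrival order, those five requests would cost four long seeks; in sweep order the head crosses the disk roughly once.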
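The double-mapping point — a 1Mbyte VDI allocation tile sitting on 4Kbyte host clusters — can be sketched numerically. The layout below is a simplified model (the real VDI block map and on-disk format are more involved), but it shows why the extra translation adds almost no fragmentation: one VDI tile spans 256 contiguous host clusters, so neighbouring guest offsets almost always land on neighbouring host clusters:

```python
# Simplified model of the guest-offset -> VDI-file-offset translation.
VDI_BLOCK = 1 << 20   # 1 MiB VDI allocation tile
CLUSTER   = 4 << 10   # 4 KiB NTFS/Ext3 cluster
# Each tile covers VDI_BLOCK // CLUSTER == 256 contiguous host clusters.

def map_guest_offset(guest_off, block_map, preamble=0):
    """Translate a guest byte offset into (VDI-file byte offset,
    host cluster index). block_map[i] gives the file-relative tile
    index where guest tile i was allocated (hypothetical layout)."""
    tile, within = divmod(guest_off, VDI_BLOCK)
    file_off = preamble + block_map[tile] * VDI_BLOCK + within
    return file_off, file_off // CLUSTER
```

Two guest offsets 4Kbytes apart only cross a tile boundary once per megabyte, so the mapping is effectively linear at the scale the file system cares about.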
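To see why the (N Gb – 128 Mb) sizing trick gives cluster alignment, here is the arithmetic under a simplified model of the VDI preamble — a fixed header plus a 4-byte block-map entry per 1Mbyte tile. The 512-byte header size and map-entry layout are assumptions for illustration (the real format varies by VirtualBox version), but the alignment effect they produce matches the behaviour described above:

```python
# Hedged sketch: why 1920 MiB (= 2 GiB - 128 MiB) aligns the VDI preamble
# on a cluster boundary under an assumed header-plus-block-map layout.
GIB, MIB, KIB = 1 << 30, 1 << 20, 1 << 10
CLUSTER = 4 * KIB

def magic_size_mb(n_gib):
    """Disk size in MiB from the (N Gb - 128 Mb) formula."""
    return n_gib * 1024 - 128

def preamble_bytes(disk_bytes, header=512):
    """Assumed preamble: fixed header + one 4-byte map entry per tile."""
    return header + 4 * (disk_bytes // MIB)

size = magic_size_mb(2) * MIB          # the 1920 MiB partitions above
print(size // MIB, preamble_bytes(size), preamble_bytes(2 * GIB))
# → 1920 8192 8704
```

Under these assumptions, 512 + 4×1920 = 8192 bytes: exactly two 4Kbyte clusters, so guest clusters start on host cluster boundaries. A full 2048Mbyte disk would give 8704 bytes, leaving every subsequent guest cluster straddling two host clusters.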
However, please don't take this strawman post as "gospel" just yet: I would welcome informed feedback and discussion. Since I have gone to the effort of thinking this through and setting up a real benchmark to support my proposition, I would ask that any counterarguments also be informed by some hard [benchmark] evidence, rather than just opinion.