KVM versus VirtualBox raw disk I/O performance benchmark, January 2012
======================================================================

This is a test report of the storage speed of KVM (qemu-kvm) and
VirtualBox, using the following versions:

    qemu-kvm version 1.0 from
    git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git

    VirtualBox version 4.1.8r75467, installed using a .sh installer from
    https://www.virtualbox.org/wiki/Linux_Downloads

(qemu-kvm was compiled using default CFLAGS, i.e. no specific
optimizations)

The host kernel was (vanilla) Linux 3.2.0 with a custom config unrelated
to this benchmark. The Linux distribution was Debian GNU/Linux 6.0.3,
running on:

    CPU: Intel i5-2500K, Sandy Bridge
    RAM: 2x 4GB Kingston HyperX DDR3 CL9 (KHX1333C9D3B1) = 8GB dual channel
    (other hardware is unrelated)

The actual 4 GiB testing drive was placed in tmpfs, therefore all
benchmark-related files / disks were stored in the host's RAM. The host
had no configured swap space.

The environment used for running the virtual machines was Xorg 1.7.7 and
fluxbox 1.1.1, both from the default Debian 6.0 repository. The host
itself was not doing anything else at the time of this test (i.e. no
additional load, pure X+fluxbox, most daemons / kernel modules disabled).
All tools were run as root on the host.

The VM guest was intended to be a minimal custom configuration of
OpenWrt, composed using their "Image Generator" (10.03.1, "x86_generic");
however, virtio support is not available for that particular image, so
the "SystemRescueCD" live distribution (version 2.4.1) was chosen
instead. This live distribution was custom-modified to allow serial
console interaction, to boot the "altker32" (3.1.5) kernel by default and
to start in single-user mode (runlevel 1) to minimize memory usage.

Benchmark background
====================

The goal of this benchmark was to measure both the sequential and random
I/O speeds of qemu-kvm and VirtualBox and compare them. The technologies
in question were IDE, AHCI (the "SATA" columns below), SCSI and virtio.
Shared Folders (vbox) and 9p virtfs (KVM) were not tested (due to
sysrescd kernel headers being unavailable).

The specific goal behind this benchmark was to find out whether the
VirtualBox implementations of storage backends can match the performance
of virtio-blk as used by qemu-kvm (KVM).

Testing specifics
=================

Serial console interaction was used for the actual testing, to avoid
SDL/graphics overhead and to simulate a generic server application.

The guest was tuned for maximum benchmark accuracy, that is - started
with 100M of RAM (to allow initrd extraction), then most of that space
was filled using tmpfs, to minimize guest-side caching during the
benchmark process:

    mkdir fill/
    mount -o size=100% -t tmpfs tmpfs fill/
    dd if=/dev/zero of=fill/ramfill bs=1M count=50

(which leaves about 22MB "free", enough for "dd" with 16MB blocks)

The actual sysrescd (guest OS) ISO file was placed in tmpfs as well, to
eliminate HDD I/O during command/library loads inside the guest. An IDE
interface was chosen for the guest cdrom drive.
General benchmark setup
=======================

For both the sequential and random I/O benchmarks, a 4 GiB zero-filled
file was created in tmpfs using:

    dd if=/dev/zero of=/tmp/bigfile bs=1G count=4
    losetup /dev/loop1 /tmp/bigfile

(the loop device overhead is there because VirtualBox is unable to use
raw files)

For direct disk access from VirtualBox, a disk.vmdk file was created
using:

    VBoxManage internalcommands createrawvmdk \
        -filename disk.vmdk -rawdisk /dev/loop1

Both qemu-kvm and VirtualBox were optimized (using cmdline flags /
settings) for maximum raw performance as much as possible (e.g. "IO APIC"
was disabled in vbox, as a hint there suggests a performance hit). Any
guest-side addons / additions were NOT INSTALLED, since they do not help
raw disk I/O performance. Please note that "Use host I/O cache" was
disabled, as was its qemu-kvm counterpart ("cache=none"), for the sake of
accuracy.

As for the guest parameters, the 100MB of RAM was already mentioned; note
that this is "100MB" as reported by the VirtualBox GUI, which is 93057024
bytes as reported by the guest, while with qemu-kvm and -m 100M it's
93110272 bytes (about a 500KB difference). Only 1 CPU core was used for
the guest.

Arguments used for qemu-kvm were passed in the "old" (non -device) form,
similar to:

    qemu-kvm -enable-kvm -nographic -boot order=d,menu=off \
        -m 100M -serial pty \
        -cdrom /tmp/myrescd.iso \
        -drive if=ide,media=disk,cache=none,aio=native,file=/dev/loop1

And as mentioned earlier, both virtualization tools were run without
graphical output, that is -nographic for qemu-kvm and VBoxHeadless for
vbox.

Specific benchmarking tools / ways
==================================

For the actual timing, the "time" command was used and its "real" time
noted (see the summary below for why "real" time was chosen).

For sequential I/O, the "dd" tool was used as:

    dd if=/dev/sda of=/dev/null bs=16M
    dd if=/dev/zero of=/dev/sda bs=16M

where "/dev/sda" has 4294967296 bytes (4 GiB). The returned "real" time
is the total time of reading / writing 4 GiB of data, and from that, the
"MiB per second" speed was calculated by hand (4096 MiB divided by the
reported time).

For random I/O, a custom small C program was used (a minimal sketch of
such a tool is shown at the end of this section). This tool basically
does:

    1) select a random LBA sector from 0..MAX
    2) seek to it
    3) read (or write) $blocksize bytes
    4) goto 1

until it's interrupted by a pre-set alarm(), which defaults to 60
seconds. Two values were reported: the total number of blocks read and
the average of "blocks read per second" - only the second value was
noted. (MAX = 4 GiB / 512 bytes - 1 = 8388607)

Each test was performed 5 times, doing:

    echo 3 > /proc/sys/vm/drop_caches

before each pass (although that should have had minimal impact, due to
the guest memory already being saturated thanks to "ramfill", see above).
The final value was calculated as the arithmetic mean of those 5 values.
(note: "sync" was not needed, as the guest had almost no RAM for
writeback)

The above echo was executed on the host as well, just before each VM
startup; the VM was re-started for EACH individual series of 5 passes
(i.e. for each test).
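The source of that small C program is not included in this report. Below
is a minimal sketch of what such a tool could look like, assuming a fixed
512-byte block size, the 60-second alarm() and read-only operation; the
file name (seekbench.c) and all identifiers are illustrative, not the
author's actual code:

    /*
     * seekbench.c - minimal sketch of a random-seek benchmark along the
     * lines of the custom tool described above; all names and defaults
     * are illustrative. Read-only variant - the write case would use
     * write() with a prepared buffer instead of read().
     */
    #define _FILE_OFFSET_BITS 64
    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define SECTOR   512
    #define MAX_LBA  (4ULL * 1024 * 1024 * 1024 / SECTOR - 1) /* 8388607 */
    #define SECONDS  60                                  /* alarm() time */

    static volatile sig_atomic_t done;

    static void on_alarm(int sig)
    {
        (void)sig;
        done = 1;    /* tell the main loop to stop */
    }

    int main(int argc, char **argv)
    {
        char buf[SECTOR];
        unsigned long long blocks = 0;
        int fd;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <block device>\n", argv[0]);
            return 1;
        }
        fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        signal(SIGALRM, on_alarm);
        srand(getpid());
        alarm(SECONDS);

        while (!done) {
            /* 1) select a random LBA from 0..MAX, 2) seek to it,
             * 3) read one block, 4) repeat until the alarm fires */
            unsigned long long lba =
                (((unsigned long long)rand() << 16) ^ rand())
                % (MAX_LBA + 1);

            if (lseek(fd, (off_t)(lba * SECTOR), SEEK_SET) == (off_t)-1
                || read(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
                perror("lseek/read");
                break;
            }
            blocks++;
        }

        /* both values, as described above; only the second was noted */
        printf("%llu blocks read, %.1f blocks/second\n",
               blocks, (double)blocks / SECONDS);
        close(fd);
        return 0;
    }

(built with something like "gcc -O2 -o seekbench seekbench.c" and run
inside the guest as "./seekbench /dev/sda")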
Raw benchmark results
=====================

=== SEQUENTIAL I/O ===

reported READ speeds (in MiB / second) in the first part; average CPU
usage on the host, across all 4 cores (percentage), in the second part:

           |  IDE   |  SCSI  | VIRTIO  | "SATA" |  "SAS"
-------------------------------------------------------------
qemu-kvm   | 766.52 | 893.78 | 1097.01 |   -    |   -
VirtualBox | 547.01 | 773.76 |    -    | 681.48 | 783.92
-------------------------------------------------------------
qemu-kvm   |   27   |   27   |   28    |   -    |   -
VirtualBox |   33   |   35   |    -    |   30   |   34
-------------------------------------------------------------

reported WRITE speeds (in MiB / second) in the first part; average CPU
usage on the host, across all 4 cores (percentage), in the second part:

           |  IDE   |  SCSI  | VIRTIO  | "SATA" |  "SAS"
-------------------------------------------------------------
qemu-kvm   | 700.46 | 818.58 | 696.34  |   -    |   -
VirtualBox | 458.27 | 556.40 |    -    | 574.86 | 554.55
-------------------------------------------------------------
qemu-kvm   |   26   |   29   |   19    |   -    |   -
VirtualBox |   32   |   29   |    -    |   29   |   29
-------------------------------------------------------------

=== RANDOM I/O ===

reported READ seek speeds (in seeks / second) in the first part; average
CPU usage on the host, across all 4 cores (percentage), in the second
part:

           |  IDE   |  SCSI   | VIRTIO  | "SATA"  |  "SAS"
-------------------------------------------------------------
qemu-kvm   | 8015.5 | 14608.3 | 26896.7 |    -    |    -
VirtualBox | 6722.9 | 12730.8 |    -    | 11313.4 | 12692.0
-------------------------------------------------------------
qemu-kvm   |   27   |   31    |   31    |    -    |    -
VirtualBox |   29   |   31    |    -    |   30    |   31
-------------------------------------------------------------

reported WRITE seek speeds (in seeks / second) in the first part; average
CPU usage on the host, across all 4 cores (percentage), in the second
part:

           |  IDE   |  SCSI  | VIRTIO  | "SATA" |  "SAS"
-------------------------------------------------------------
qemu-kvm   | 3029.9 | 5478.9 | 10220.2 |    -   |    -
VirtualBox | 2616.6 | 7330.0 |    -    | 4450.9 | 7285.9
-------------------------------------------------------------
qemu-kvm   |   27   |   31   |   33    |    -   |    -
VirtualBox |   28   |   30   |    -    |   29   |   30
-------------------------------------------------------------

Benchmark summary, conclusion
=============================

In all these benchmarks, "real" time was measured instead of "sys",
because the benchmark was intended to express the actual VM speed,
including the CPU time consumed by interface emulation / translation.
CPU usage on the host was mostly "1 core at 100%" (25% of the total),
with the additional load most likely caused by emulation.

Deviations / differences between each of the 5 runs weren't really huge:
about 5-15 MiB/s for sequential I/O and 30-80 seeks/second for random
I/O.

The significantly lower write seek rates might be caused by tmpfs, which
seems to use 4K blocks for files and thus needs to do a read-modify-write
cycle for each (smaller) write().

As for the conclusion, this benchmarker is not going to say "qemu-kvm is
overall faster than VirtualBox", although that would in fact be true.
He's rather going to say "VirtualBox could benefit from virtio". It still
all boils down to specific use cases - VirtualBox is somewhat slower in
raw I/O performance, however it can make up for that in other areas - be
it guest additions, 2D/3D acceleration or something else. Furthermore,
you won't notice the performance difference on a slow magnetic drive.

This benchmark doesn't claim to be 100% accurate, however its author
configured both virtualization tools to the best of his knowledge.