CPU performance issues in guest OS
I'm having some performance issues with a Linux VM (nixos) running on a Solaris 11.4 host. I borrowed this format from another post that I found helpfully organized, thanks!
PC: Dell PowerEdge R530
CPU: Intel Xeon E5-2650 v4 @ 2.20GHz (12 physical cores, 24 threads)
MEM: 16GB
Virtualbox version: 6.1.34 r150636 solaris.amd64
CPUs allocated to VM: 1
MEM allocated to VM: 0.5GB
Host OS: Solaris 11.4.42.111.0 (latest CBE as of today), "SunOS eros 5.11 11.4.42.111.0 i86pc i386 i86pc non-virtualized"
Guest OS: nixos minimal live CD iso (nixos-minimal-21.11.337422.3c5ae9be1f1-x86_64-linux.iso)
The symptoms are:
* The VM seemed slow when I changed the host system hardware from an old i7-920 to the Xeon E5-2650v4. Not slow enough to be useless, but much slower than I expected.
* Simple benchmark: "time dd if=/dev/zero of=/dev/null bs=1M count=10000" takes ~1 second on the host OS and takes ~16 seconds on the guest. I don't expect virtualization overhead to reduce performance by 16x.
* Other reference points for that same benchmark: 0.25s on my Windows 10 i9-9900K (under cygwin) and 0.5s in the same nixos-minimal guest image running as a VirtualBox guest on that (Windows 10) host.
* I tried a more CPU-intensive benchmark: "time dd if=/dev/zero bs=1048576 count=1000 | sha256sum", which takes 17 seconds on the (Xeon Solaris) host and 24 seconds on the (nixos-minimal) guest. That's much better overhead! But on my i9-9900K under Windows/cygwin, the host takes 3.5 seconds and the nixos-minimal guest only 2.8s. Negative virtualization overhead?? No: the sha256sum implementations are very different between platforms, and the same is true between Solaris and Linux. Basically I need a better benchmark, something identical and repeatable across OS platforms. But there's no doubt the guest VM is just sluggish on the Xeon while the host OS is snappy, and that same nixos-minimal guest is super snappy under VirtualBox on my Windows PC. So something's up.
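In case it helps anyone reproduce this, here's a sketch of a slightly more repeatable version of that dd timing. The run count and the smaller transfer size are my choices, and `date +%s` may need /usr/gnu/bin/date on Solaris:

```shell
# Repeat the dd benchmark a few times so run-to-run variation is visible,
# instead of trusting a single measurement. Uses a smaller count than the
# original so each run stays short; scale up as needed.
runs=3
for i in $(seq 1 $runs); do
  start=$(date +%s)
  dd if=/dev/zero of=/dev/null bs=1M count=2000 2>/dev/null
  end=$(date +%s)
  echo "run $i: $((end - start))s"
done
```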
Solutions I've tried:
* Setting the chipset to PIIX3 or ICH9. This change had no effect.
* Modifying the number of CPUs allocated to the VM to 1, 2, 3, 4, 6, 8, 10. No setting had much effect on the benchmark.
* Enabling and disabling PAE/NX, Nested VT-x/AMD-V, and Nested Paging. These changes had no effect.
* Updating all OS versions, VirtualBox version, etc. to the latest.
* Rebooting the host PC.
* Disabling all extra devices: USB, sound card, storage controllers (besides the live CD), network devices, etc.
Any suggestions would be appreciated. VBox.log is attached (zipped) from a "bad" run on the solaris/Xeon host.
Attachments: VBox.zip (26.72 KiB)
Re: CPU performance issues in guest OS
I would give the guest OS a second CPU core and increase graphics RAM to 128MB.
I was also going to suggest enabling 3D graphics acceleration, but I spotted this in the log:
I was also going to suggest enabling 3D graphics acceleration, but I spotted this in the log:
So I guess that's out. You'll need to restrict VM display sizes.00:00:07.425787 VMSVGA3d not available in this build!
Re: CPU performance issues in guest OS
mr_spock wrote: * Modifying the number of CPUs allocated to the VM to 1, 2, 3, 4, 6, 8, 10. No setting had much effect on the benchmark.
Thanks, I did try many settings for core count but I haven't played with graphics RAM. Let me try that. BTW the guest OS just boots to a text terminal, no GUI/graphics mode. It's a "minimal" Linux live CD, but the performance is representative of my other production guest VMs and it's a nice contained test case.
Re: CPU performance issues in guest OS
I think I figured it out! The graphics memory increase to 128MB had no effect. But this did:

Code:
VBoxManage modifyvm speedtest1 --spec-ctrl on

Now the guest OS is only ~2x slower than the host OS, which is very acceptable. I'm now sweeping across the CPU core counts to see how that affects it.
Re: CPU performance issues in guest OS
For anyone wondering about my processor core count results: single-thread performance of the guest was highly stable with a single allocated vCPU, with increasing fluctuation/variability as the core count went up. If overall throughput were the only important metric that would be fine, but the short-term unpredictability is pretty bad. I'm now considering splitting my single VM instance that hosts multiple services into independent VMs, each with a single service. Some of those services do benefit from additional cores, but I can probably keep each one to 2 vCPUs, and the truly single-threaded ones will get a nice boost from running in a single-vCPU instance. Ugh, more workarounds. (I'd then need to maintain multiple OS instances instead of just one, etc.)
Re: CPU performance issues in guest OS
More findings for others who might happen across this post. I wanted more VM performance, so I went looking for how to tweak the Spectre/Meltdown mitigations in the host OS. I found the `sxadm` tool and went from this:

Code:
$ sxadm status
EXTENSION           STATUS                    FLAGS
aslr                enabled (tagged-files)    u-c--
ibpb                enabled                   -kcr-
ibrs                enabled                   -kcr-
if_pschange_mc_no   not supported             -----
kpti                enabled                   -kcr-
l1df                enabled                   -kcr-
md_clear            enabled                   -kcr-
mds_no              not supported             -----
nxheap              enabled (tagged-files)    u-c--
nxstack             enabled (all)             u-c--
rdcl_no             not supported             -----
rsbs                enabled                   -kcr-
smap                enabled                   -kcr-
ssbd                enabled (tagged-files)    u-c--
taa_no              not supported             -----
tsx_disable         not supported             -----
umip                not supported             -----

To this:

Code:
$ sxadm status
EXTENSION           STATUS                    FLAGS
aslr                enabled (tagged-files)    u-c--
ibpb                disabled                  -kcr-
ibrs                disabled                  -kcr-
if_pschange_mc_no   not supported             -----
kpti                disabled                  -kcr-
l1df                disabled                  -kcr-
md_clear            disabled                  -kcr-
mds_no              not supported             -----
nxheap              enabled (tagged-files)    u-c--
nxstack             enabled (all)             u-c--
rdcl_no             not supported             -----
rsbs                disabled                  -kcr-
smap                enabled                   -kcr-
ssbd                disabled                  u-c--
taa_no              not supported             -----
tsx_disable         not supported             -----
umip                not supported             -----

Now the guest OS runs the simple benchmark at nearly the same speed as the host (1.0 seconds host, 1.6 seconds guest), independently of core count. The high performance variability I was getting beyond 2 vCPUs is gone. Also, toggling "VBoxManage modifyvm --spec-ctrl on/off" now has no effect.
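(For completeness, a rough sketch of the disable step. The extension list is taken from the status output above; check sxadm(8) on your build for exact syntax before running, and note that kernel-level extensions only change state after a reboot.)

```shell
# Rough sketch: disable the kernel-level Spectre/Meltdown mitigations
# shown in the sxadm status output above. Verify syntax against sxadm(8);
# a host reboot is required for the changes to take effect.
for ext in ibpb ibrs kpti l1df md_clear rsbs ssbd; do
  sxadm disable "$ext"
done
sxadm status    # verify the new state, then reboot the host
```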
Re: CPU performance issues in guest OS
Thanks for the info! To clarify: did you turn off the Spectre & Meltdown mitigations on the host OS?
Re: CPU performance issues in guest OS
mr_spock wrote: * Simple benchmark: "time dd if=/dev/zero of=/dev/null bs=1M count=10000"
I think we can all agree that's half right, but unfortunately it's the first half.
It's especially not a CPU benchmark, at all, in any way. In fact, short of ping there isn't much you could have used that would be LESS indicative.
If your concern is genuinely CPU performance, and you just accidentally used the wrong term, then when you do actual CPU benchmarks you should generally see ~95% of host performance. If that isn't the case, *then* let us know.
If you don't know of any CPU benchmarks, start with
Code:
openssl speed -evp aes-256-cbc -elapsed
Charitable of you though it may be, 50% is nowhere *near* "very acceptable" - but since you were only measuring file ops and syscall throughput, you happened to stumble into one of the areas *all* VMs "have to" run poorly in. Even then though, yours seem especially poor. Still, one thing at a time.
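To make that point concrete: the dd numbers mostly measure syscall rate, not CPU work. One way to see it (the block sizes are my choice, GNU dd syntax) is to push the same amount of data through very different numbers of syscalls:

```shell
# Same 1 GiB of data, very different numbers of read()/write() syscalls.
# If the 4k run is much slower than the 1M run, the benchmark is dominated
# by per-syscall overhead -- exactly what mitigation-related context-switch
# costs make expensive.
dd if=/dev/zero of=/dev/null bs=1M count=1024      # ~1024 read/write pairs
dd if=/dev/zero of=/dev/null bs=4k count=262144    # ~262144 read/write pairs
```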
Re: CPU performance issues in guest OS
Thanks @arQon, I'm aware how poor that benchmark was. (At least the sha256sum variant was slightly better.) But trust me, these VMs running on this Broadwell/Xeon host felt *super* sluggish until I made the changes I documented (i.e. disabling protection from Spectre/Meltdown).
Thanks @arQon, I tried the openssl command line and the results are below. The guest is now roughly 93% as fast as the host. But this is AFTER all the tweaks I made, and I don't want to take the host OS down for two more reboots to test the original configuration. I'm satisfied with the performance of the system now -- even the "butt in seat" feel of the guest VM is very good. You should have SEEN how sluggish an Ubuntu live CD was before! Now it's just fine. I think you're right: it was syscall/kernel performance I was having trouble with, not "CPU". That makes sense, since those context switches are what the Spectre/Meltdown mitigations affect the most.

Host OS (Solaris):
Code:
$ openssl speed -evp aes-256-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 33763631 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 8902589 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 2256864 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 566102 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 70825 aes-256-cbc's in 3.00s
OpenSSL 1.0.2za 24 Aug 2021
built on: date not available
options:bn(64,64) rc4(16x,int) des(ptr,cisc,16,int) aes(partial) blowfish(ptr)
compiler: information not available
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 180072.70k 189921.90k 192585.73k 193229.48k 193399.47k
Guest OS (nixos linux):

Code:
$ openssl speed -evp aes-256-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 31790173 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 7816610 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 2118291 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 530860 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 66334 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 33095 aes-256-cbc's in 3.01s
OpenSSL 1.1.1k 25 Mar 2021
built on: Thu Mar 25 13:28:38 2021 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256-cbc 169547.59k 166754.35k 180760.83k 181200.21k 181136.04k 180142.35k
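For the record, the ~93% figure comes from comparing the 1024-byte-block rows of the two runs above; a quick check (the awk one-liner is mine):

```shell
# Guest/host throughput ratio from the 1024-byte-block rows above
# (numbers copied from the two openssl speed runs).
host=193229.48    # host aes-256-cbc @ 1024-byte blocks, in 1000s of bytes/s
guest=181200.21   # guest aes-256-cbc @ 1024-byte blocks
echo "$guest $host" | awk '{ printf "guest/host = %.1f%%\n", 100 * $1 / $2 }'
```

which works out to guest/host = 93.8%.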