Multicore performance issue
Posted: 8. Oct 2019, 21:52
As background: we are in the process of replacing some aging CI build machines with new ones running in VMs, but have run into unexpected performance issues that I would like to solve, or at least understand.
The VM host is a dedicated HP Z2 i7-8700 hexacore workstation, running VirtualBox 6.0.12 on Ubuntu 18.04, and the guest OS is Windows 7 Pro 64-bit. The machine compiles code and runs unit tests, and performs very well at both tasks, so I am confident there's nothing fundamentally wrong with the setup.
The machine is also used for event-based simulation tests, which consist of multiple processes communicating over TCP sockets. Due to the event-based nature of the tests, each run is almost perfectly serialized, so to make better use of the hardware, we run as many tests in parallel as there are cores on the computer. On the old, physical Win7 boxes this works very well, giving nearly linear speedups.
The problem with the new setup is that if more than two cores are allocated to the VM, the performance of these tests absolutely tanks: execution takes more than twice as long (I don't know the exact slowdown ratio, because most tests start failing with execution timeouts). We don't even have to run more than two tests in parallel to see the effect; it's enough to just allocate the extra cores to the VM.
Disabling the Spectre and Meltdown protections in the host OS and in VirtualBox gave a marginal performance improvement, and disabling hyperthreading did nothing.
The big difference between the two workloads (compiling code/running unit tests, and the simulation tests) is that the former is disk- and CPU-intensive, while the latter is network-intensive. My only hypothesis is that the socket traffic generates a lot of work for the hypervisor, and that in the worst case it has to synchronize all the allocated cores.
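To check that hypothesis separately from the real test suite, something like the following loopback ping-pong could be run in the guest with different numbers of allocated cores. This is only a minimal sketch, not our actual test harness: the function names, message count and payload size are made up for illustration, and the echo peer is a thread rather than a separate process to keep the script short. It sends small messages over a TCP socket one at a time, waiting for each echo, which roughly mimics the serialized, event-driven traffic described above.

```python
import socket
import statistics
import threading
import time


def _recv_exact(conn, size):
    """Read exactly `size` bytes, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < size:
        chunk = conn.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def _echo_once(listener, size, count):
    """Accept one connection and echo `count` fixed-size messages back."""
    conn, _ = listener.accept()
    with conn:
        # Disable Nagle so small messages are sent immediately.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(count):
            conn.sendall(_recv_exact(conn, size))


def measure_round_trips(count=2000, size=64):
    """Return per-message round-trip times in seconds over loopback TCP."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=_echo_once, args=(listener, size, count))
        server.start()

        payload = b"x" * size
        samples = []
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            for _ in range(count):
                start = time.perf_counter()
                client.sendall(payload)      # send one message...
                _recv_exact(client, size)    # ...and block until it is echoed
                samples.append(time.perf_counter() - start)
        server.join()
    return samples


if __name__ == "__main__":
    rtts = measure_round_trips()
    print(f"median round trip: {statistics.median(rtts) * 1e6:.1f} us")
    print(f"worst round trip:  {max(rtts) * 1e6:.1f} us")
```

If the median round trip jumps when going from two to three or more allocated cores, that would at least suggest the slowdown is tied to socket wake-ups rather than to anything specific to our simulation code.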
Any ideas about what's going on, and how we could improve the situation?