Multicore performance issue

Posted: 8. Oct 2019, 21:52
by andersm
As background, we are in the process of replacing some aging CI build machines with new ones running in VMs, but have run into some unexpected performance issues that I would like to solve, or at least understand.

The VM host is a dedicated HP Z2 i7-8700 hexacore workstation, running VirtualBox 6.0.12 on Ubuntu 18.04, and the guest OS is Windows 7 Pro 64-bit. The machine compiles code and runs unit tests, and performs very well at both tasks, so I am confident there's nothing fundamentally wrong with the setup. The machine is also used for event-based simulation tests, which consist of multiple processes communicating over TCP sockets. Due to the event-based nature of the tests, each run is almost perfectly serialized, so to make better use of the hardware, we run as many tests in parallel as there are cores on the computer. On the old, physical Win7 boxes this works very well, giving nearly linear speedups.

The problem with the new setup is that if more than two cores are allocated to the VM, the performance of these tests absolutely tanks. With more than two cores allocated, execution times are more than 100% longer (I don't know the exact slowdown ratio - most tests start failing with execution timeouts). We don't even have to run more than two tests in parallel to see the effect; it's enough to just allocate the cores to the VM.

Disabling Spectre and Meltdown protections from the host OS and VirtualBox gave a marginal performance improvement, and disabling hyperthreading did nothing.

The big difference between the two workloads (compiling code/running unit tests, and the simulation tests) is that the former are disk- and CPU-intensive, while the latter are network-intensive. My only hypothesis is that this generates a lot of work for the hypervisor, and in the worst case it has to synchronize all the allocated cores.

Any ideas about what's going on, and how we could improve the situation?

Re: Multicore performance issue

Posted: 8. Oct 2019, 22:35
by Martin
Your system only has six physical CPU cores. You have allocated six virtual CPUs to your guest.
So while your guest is running (has an active time slice) the host has no core left to handle the i/o from your guest.
Please try lowering the number of CPUs of your guest to 4 or 5.

Re: Multicore performance issue

Posted: 8. Oct 2019, 23:08
by andersm
Martin wrote: Your system only has six physical CPU cores. You have allocated six virtual CPUs to your guest.
As I wrote in the original post, the issue happens when more than two cores are allocated. Also, if it was something that simple it would also show in the other workloads, which it doesn't.

Re: Multicore performance issue

Posted: 16. Oct 2019, 08:24
by BillG
I guess your tests are confirming a well-known fact - some tasks are well-suited to virtualization and some are not. That is why you run tests like this.

Giving a VM more CPUs will help if the bottleneck is CPU. If the bottleneck is somewhere else, it won't help and could even make things worse.

Re: Multicore performance issue

Posted: 19. Oct 2019, 10:35
by socratis
If you want to have even the slimmest of chances for the developers to develop an interest in this behavior, you shouldn't be talking abstractly about "tests"; you've got to be really specific, detailed and thorough. Provide the exact tests and scripts used. Detailed step-by-step instructions...

Re: Multicore performance issue

Posted: 20. Oct 2019, 14:51
by andersm
If it wasn't clear from the original post, it is production code, not benchmarks. The best description I can give is "a swarm of processes performing a lot of IPC over TCP." We did experiment with setting the CPU affinity of all processes belonging to one test run to one vCPU inside the guest, which gave a significant increase in performance, but not nearly enough to make the setup usable.
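
For anyone wanting to repeat the affinity experiment, here is a rough sketch of one way to pin a whole test run to a single vCPU inside a Windows guest. It only builds the command line for the `start /affinity` built-in of cmd.exe, which takes a hexadecimal bitmask. The name `run_test.exe` is a made-up placeholder, not the actual test runner, and the sketch assumes that child processes inherit the affinity of the process that launched them, which is the usual Windows behavior.

```python
def affinity_mask(vcpu: int) -> str:
    """Return the hex bitmask (no 0x prefix) that selects exactly one vCPU."""
    if vcpu < 0:
        raise ValueError("vCPU index must be non-negative")
    # Bit N of the mask corresponds to logical processor N.
    return format(1 << vcpu, "X")


def pinned_command(vcpu: int, test_cmd: list) -> list:
    """Build a cmd.exe invocation that starts test_cmd pinned to one vCPU.

    /wait blocks until the program exits, /b avoids opening a new window.
    """
    return ["cmd", "/c", "start", "/wait", "/b",
            "/affinity", affinity_mask(vcpu), *test_cmd]


if __name__ == "__main__":
    # Hypothetical example: one test run per vCPU on a 4-vCPU guest.
    for vcpu in range(4):
        print(" ".join(pinned_command(vcpu, ["run_test.exe"])))
```

Each resulting list can be passed to `subprocess.Popen` inside the guest to launch the runs in parallel, one per vCPU.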

Anyway, since it seems VirtualBox can't handle this workload, we decided to scrap the virtualization approach and go with native Windows. We'll just have to put in a bit more effort to wall the machines off from the rest of the world.

Re: Multicore performance issue

Posted: 20. Oct 2019, 23:00
by socratis
andersm wrote: If it wasn't clear from the original post, it is production code, not benchmarks.
Sure, but I don't see how that would make a difference in the problem. And I actually talked about "tests", as in "unit tests" from your original post...
andersm wrote: since it seems VirtualBox can't handle this workload, we decided to scrap the virtualization approach and go with native Windows
That's always an alternative. But I wouldn't expect VirtualBox to address the issues any time soon if they don't have a detailed, reproducible scenario.
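
As an illustration of what a small, reproducible scenario could look like, here is a minimal sketch of a TCP ping-pong over loopback that reports the mean round-trip time. It is not the production workload from this thread: it assumes the slowdown is dominated by many small, strictly serialized TCP exchanges, and it uses two threads in one process rather than a swarm of processes to keep the script short. The function names are illustrative only.

```python
import socket
import threading
import time


def echo_server(listener: socket.socket) -> None:
    """Accept one connection and echo every message back until it closes."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(64)
            if not data:
                break
            conn.sendall(data)


def measure_round_trips(n: int = 1000) -> float:
    """Return the mean round-trip time in seconds over n one-byte exchanges."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=echo_server, args=(listener,), daemon=True)
    server.start()

    with socket.create_connection(listener.getsockname()) as client:
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(n):
            # Strictly serialized: send one byte, wait for the echo.
            client.sendall(b"x")
            if client.recv(64) != b"x":
                raise RuntimeError("unexpected echo reply")
        elapsed = time.perf_counter() - start

    server.join()
    listener.close()
    return elapsed / n


if __name__ == "__main__":
    print(f"mean round trip: {measure_round_trips() * 1e6:.1f} microseconds")
```

Running the same script in the guest with 1, 2, 4 and 6 vCPUs allocated, and once on a physical machine, would give numbers that can be compared directly and attached to a bug report.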