Can plenty of CPU cores and RAM be detrimental to the performance?

This is for discussing general topics about how to use VirtualBox.
hewking
Posts: 12
Joined: 2. Oct 2023, 08:19

Can plenty of CPU cores and RAM be detrimental to the performance?

Post by hewking »

hi All
I heard that giving too many cores and too much RAM to the guest can be detrimental to performance - it was labelled as a schoolboy error.
Unfortunately this comment was not explained properly, so I wonder if you could share your opinion and understanding?
Regards
scottgus1
Site Moderator
Posts: 20945
Joined: 30. Dec 2009, 20:14
Primary OS: MS Windows 10
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Windows, Linux

Re: Can plenty of CPU cores and RAM be detrimental to the performance?

Post by scottgus1 »

Each extra processor in the VM is really a thread or set of threads on the host OS. The host has to schedule those threads, and this extra scheduling time slows down the other threads that could otherwise run. If a processor in the VM isn't being used heavily, the extra scheduling oversight takes time away from the other processors in the VM that are doing real work, and the VM runs slower.

A modern OS runs best for its own tasks in a VM with two processors. When other 3rd-party apps are installed in the VM that can use multiple processors, like video transcoding or other parallel processing apps, then more processors may be beneficial.

The usual advice of not using more than half of the host processor count in the VM is a general way of helping folks not over-burden their host. Using just 2 processors in the VM, regardless of host processor count, gets most new VirtualBox users off the ground.
arQon
Posts: 231
Joined: 1. Jan 2017, 09:16
Primary OS: MS Windows 7
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Ubuntu 16.04 x64, W7

Re: Can plenty of CPU cores and RAM be detrimental to the performance?

Post by arQon »

Sorry Scott, but unless I'm misreading you that's pretty much backwards.

The cost of bookkeeping a handful of extra threads is basically zero: modern systems are already running hundreds of them for background tasks by the time a user can even log in, and even several thousand more wouldn't matter at all. The only real impact is when the CPU in the guest *is* getting hammered, not when it isn't. Since a core can't execute both host and guest code without a very expensive context switch, if the guest is maxing out all the cores it's possible to starve the host. That's why tasks "in the VM" that actually require the host to do a lot of the work (usually anything IO-heavy, and especially paging) will experience measurable slowdowns at times; but tasks that are CPU-bound will happily run at 95%+ rates with every core (i.e. "core" count / 2 on desktop CPUs, or full core count on Atoms etc) assigned to the guest.

Unsurprisingly, this tends to lead to a small number of people in both camps getting very shouty about how the other group is wrong - usually without any proof, despite the impact being trivially measurable :P. But even with proof they still wouldn't agree, because it comes down to the *type* of workload as much as it does resource allocation.

The more conservative position is exactly that: "least likely to cause problems" (and specifically, problems that are counter-intuitive to newbies and tedious to explain, especially for the 50th time). Since, as Scott says, most software spends 99% of its time idle anyway, even just two cores is massive overkill, but it's as low as some guests can go without experiencing a different set of problems, so it's the best "default" advice.

RAM is a different story, and a little harder to explain simply, but... for the most part, memory that a VM has *ever* used is "gone" from the host, even if the VM isn't using it any more. That can have an enormous impact on the entire machine's performance **, and no amount of CPU will help with it. Again, it's highly situational: you need to add up all the RAM used by the host and all concurrent guests, and compare that to the amount of memory in the host as a simple pass/fail question. Realistically, what that means is "Give each guest as little as possible without making it uncomfortable". (Disabling swap in the guest is the easiest way to find where its line is).
Again, your specific workload is what matters, and any soundbite from someone saying "X MB" is worthless unless you're doing exactly the same things in your VM that they are in theirs.
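That pass/fail sum is simple enough to sketch. All the numbers below are made-up example values, not recommendations:

```python
# Hypothetical RAM budget check: the host OS plus all concurrently
# running guests must fit in physical RAM, ideally with headroom left
# for the host's own file cache.
host_ram_mb = 16384            # physical RAM in the host (example value)
host_os_use_mb = 4096          # what the host OS + its apps need (example)
guest_alloc_mb = [4096, 2048]  # RAM assigned to each concurrent VM (example)

total_mb = host_os_use_mb + sum(guest_alloc_mb)
headroom_mb = host_ram_mb - total_mb
print(f"allocated {total_mb} MB of {host_ram_mb} MB, headroom {headroom_mb} MB")
print("PASS" if headroom_mb > 0 else "FAIL: expect heavy host paging")
```

With these example numbers the check passes with 6144 MB to spare; cross the line and the host starts paging, which no amount of CPU will fix.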

** As a rough guide: if you were around for the transition from HDD to SSD, you probably thought SSDs were amazing. SSDs were less than 3x faster than HDDs though (less than 2x faster if you use the terms correctly, but nobody except mathematicians and pedants ever does) and RAM is/was ~20x faster than SSD. You may not be able to tell the difference, but the machine certainly can. :)
Last edited by arQon on 28. Oct 2023, 11:09, edited 1 time in total.
mpack
Site Moderator
Posts: 39134
Joined: 4. Sep 2008, 17:09
Primary OS: MS Windows 10
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Mostly XP

Re: Can plenty of CPU cores and RAM be detrimental to the performance?

Post by mpack »

See this discussion by a dev on the issue of wasted core allocations: FAQ: Cores vs Threads. See the quoted text at the bottom. Basically, if the cores aren't being kept busy then they're a significant overhead.
fth0
Volunteer
Posts: 5690
Joined: 14. Feb 2019, 03:06
Primary OS: Mac OS X other
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Linux, Windows 10, ...
Location: Germany

Re: Can plenty of CPU cores and RAM be detrimental to the performance?

Post by fth0 »

In the quoted text, Ramshankar described the overhead of vCPUs (especially the world switches) and how that can cause problems when providing as many vCPUs to the VM as there are CPU cores on the host. If you provide fewer vCPUs to the VM than there are CPU cores on the host, but more vCPUs than the guest really uses, you're wasting CPU resources. But what do you mean by "significant"?
mpack
Site Moderator
Posts: 39134
Joined: 4. Sep 2008, 17:09
Primary OS: MS Windows 10
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Mostly XP

Re: Can plenty of CPU cores and RAM be detrimental to the performance?

Post by mpack »

Ramshankar wrote: Getting in and out of executing guest code via VT-x is still quite an expensive operation.
If you disagree with Ramshankar then please debate it with him, not me.
fth0
Volunteer
Posts: 5690
Joined: 14. Feb 2019, 03:06
Primary OS: Mac OS X other
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Linux, Windows 10, ...
Location: Germany

Re: Can plenty of CPU cores and RAM be detrimental to the performance?

Post by fth0 »

mpack wrote: 29. Oct 2023, 11:08 If you disagree with Ramshankar [...]
I do not disagree with Ramshankar at all.

Thanks for the clarification! It wasn't clear to me whether you perhaps read more into it, because you used "significant" in the context of "cores aren't being kept busy". Let me explain this with an example:

Consider a CPU with 8 cores (without hyperthreading, which I'd ignore here for simplicity anyway). Let the host fully use 2 cores, provide 2 vCPUs to a VM, let the guest fully use the 2 vCPUs, and you'll see a total CPU load of slightly over 50 %. If you now provide a third vCPU to the VM without explicitly using it, you'll additionally have that "quite an expensive operation" on a third CPU core. Even if this resulted in a CPU load of 8 % on that core, the total CPU load would only increase by 1 %. That's why I wouldn't call this "significant".

If you get relatively close to the full use of all 8 cores, the "quite an expensive operation" will get very "significant", of course.
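The arithmetic above can be checked in a few lines of Python (the 8 % overhead figure is the illustrative assumption from the post, not a measured value):

```python
# Recreate the example: 8 host cores, host fully using 2, guest fully
# using 2 vCPUs, expressed as fractions of total CPU capacity.
cores = 8
host_busy = 2.0          # cores fully used by the host
guest_busy = 2.0         # vCPUs fully used by the guest

base_load = (host_busy + guest_busy) / cores
print(f"base total CPU load: {base_load:.0%}")    # 50%

# Add a third, otherwise idle vCPU whose world-switch overhead is
# assumed (for illustration) to cost 8% of one core.
idle_vcpu_overhead = 0.08
new_load = (host_busy + guest_busy + idle_vcpu_overhead) / cores
print(f"with an idle 3rd vCPU: {new_load:.0%}")   # 51%
```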
arQon
Posts: 231
Joined: 1. Jan 2017, 09:16
Primary OS: MS Windows 7
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Ubuntu 16.04 x64, W7

Re: Can plenty of CPU cores and RAM be detrimental to the performance?

Post by arQon »

fth0 wrote: 29. Oct 2023, 14:20 That's why I wouldn't call this "significant".
I think there are two things causing most of the confusion here. One comes from reuse of terminology without qualifiers, which I was guilty of too: my "Since a core can't execute both host and guest code without a very expensive context switch" doesn't mean "context switch" in the normal sense of the term, i.e. saving registers and jumping to a new process: it means VMLAUNCH / VMRESUME, which do a lot more work and have other issues (transactional aborts, etc).

There's also a lot of aging going on though. Many things that were true a decade ago no longer are. VT-x used to only be available on Xeons, for example, so a lot of hypervisor use was running without HW assist. Operations that used to cost e.g. 800 cycles now cost 80; and *anything* that was only true for unassisted (i.e. non-VT-x) hypervisors, where the VM roundtrips were 100x more expensive, is - as far as this forum is concerned - no longer relevant at all.

I'll try to expand on this later in the week, but the Intel SDM has all the information on this you could ever want, if you're bored. :)
fth0
Volunteer
Posts: 5690
Joined: 14. Feb 2019, 03:06
Primary OS: Mac OS X other
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Linux, Windows 10, ...
Location: Germany

Re: Can plenty of CPU cores and RAM be detrimental to the performance?

Post by fth0 »

arQon wrote: 30. Oct 2023, 21:44 [...] the Intel SDM has all the information on this you could ever want, if you're bored. :)
I know, I've read several chapters over the years and use it from time to time. ;)
arQon wrote: 30. Oct 2023, 21:44 There's also a log of aging going on though. Many things that were true a decade ago no longer are.
I noticed that in forum discussions, too. I started using VirtualBox ~5 years ago, so I don't really know its earlier history.

To get us back on topic, one example is the so-called "world switch" (*):

I know from both the Intel SDM and the VirtualBox source code that VM-exits and VM-entries are done on each logical processor independently, yet some people think that the world switches happen simultaneously on both hyperthreads of a core, or simultaneously on all cores. I've been wondering if it ever was different in VirtualBox's past ...

(*) The term "world switch" can be misleading, because it means to switch between the worlds of the VMX root and non-root modes, not to switch all CPU cores at the same time.
arQon
Posts: 231
Joined: 1. Jan 2017, 09:16
Primary OS: MS Windows 7
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Ubuntu 16.04 x64, W7

Re: Can plenty of CPU cores and RAM be detrimental to the performance?

Post by arQon »

fth0 wrote: 31. Oct 2023, 01:30 I noticed that in forum discussions, too. I started using VirtualBox ~5 years ago, so I don't really know its earlier history.
It's more about the changes in hardware than software. Virtualization was incredibly niche 15 years ago, and we were weird even among the geeks for caring about it back then. Nobody else cared much about it either, including Intel, and early versions of hardware support weren't really much more than a couple of helper instructions - and they were about as slow as the software implementations they were supposed to be helping with.
Jump ahead to "only" 10 years ago though and AWS was taking off, generating far higher percentage returns than the retail arm, and they couldn't add capacity fast enough to keep up with demand. Virtualization became critical to selling server and HEDT CPUs, and that made Intel pay attention. (And remember, at the time a 4C/8T Xeon was the best x86 *server* chip you could buy). Performance improved, and new virtualization features like nested paging and paravirtualization went from nonexistent to mainstream in just a few years.

For the most part, statements that were true ~5 years ago generally still are, or at least mostly so. By the time you're looking at 10 years ago, it's more likely than not that those statements are now false; and at 15 years ago essentially all of them are.

> To get us back on topic, one example is the so-called "world switch" (*):

> I know from both the Intel SDM and from the VirtualBox source code, that VM-exits and VM-entries are done on each logical processor independently, yet some people think that the world switches happen simultaneously on both hyperthreads of a core or simultaneously on all cores. I've been wondering if it ever was different in VirtualBox's past ...

Not as far as I know, though I've "only" been using VB since late 2009 or so. That particular detail though is probably the most unintuitive one there is, judging by the number of newbies who give their first VM 14 of their "16" cores despite the GUI complaining, and then come here and argue with everyone about how it should have been fine Because Magic. :)

Every once in a while though, those arguments are worth having, because the "tribal knowledge" passed down over the years is stale. My personal bugbear is this very topic - specifically, where the cutoff of "plenty of cores" is. I'm very tired of people parroting the "< nC" line based on fifth-hand repetition of something that was already situational even at the time the comment was made. It's fine to do so as a best practice for newbies, since nobody knows what sort of workload they're going to be running; but when those same newbies are willing to die on that hill 6 months later in the face of actual benchmarks, experience, or understanding, that's not knowledge, it's indoctrination.

The "spare core" myth applies to IO and basically nothing else, and always has. The "stuff that might time out under certain circumstances" is IO - *fifteen years ago*, on rotational media that took 10-20 seconds to spin up from a parked state and 20ms to reposition the heads for each entry in the queue; not SSDs with 0.2ms latencies and literally hundreds of gigabytes of DRAM cache.

That said, it's probably still about the least-bad simple guideline we can give here, after "If all you're trying to do is run a generic desktop / toy server / etc, go with 2 cores", though it's still not great. We don't have infinite spare time though, and newbies generally still don't read the posting guidelines or provide even the most basic information, let alone enough to actually be guided to good resource allocation, so compromises have to be made.
It's certainly not beyond us to provide something better for the ones who are actually trying to understand what they're doing rather than just trying to get an assignment out of the way, *and* have guest workloads that will genuinely benefit from the extra cores, but I have no idea how large that group is.
scottgus1
Site Moderator
Posts: 20945
Joined: 30. Dec 2009, 20:14
Primary OS: MS Windows 10
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Windows, Linux

Re: Can plenty of CPU cores and RAM be detrimental to the performance?

Post by scottgus1 »

If it is time to change the official promulgation of doctrine, then let's keep in mind this principle:

The most-well-founded kind of belief about something one cannot see (like what is actually going on inside a CPU, or radio waves, or the speed of light, etc) is based on evidence of that invisible thing acting in ways we can see.

So, evidence time. I don't have one of those 24-core monsters to set up a VM with few or plenty of processors to assign and then see what happens. But those who do (and who are aware of the newer performance/efficiency-core processors and their possible interference, if they have such a processor) can post some time & throughput records of low- vs high-processor-count VMs doing something real-world (not benchmark) like video transcoding, compiling, etc., along with boot times and performance indicators like YouTube viewing.

Such information can tell us what actually happens, and if advances in technology have made it necessary to change the forum go-to instructions.
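As a sketch of what collecting such timing records could look like: the loop below is only a stand-in workload (a real test would run transcoding or compiling inside VMs configured with different vCPU counts, and compare against the bare host), but the shape of the harness is the same:

```python
# Toy timing harness: run the same CPU-bound workload with different
# numbers of worker processes and report wall-clock time for each.
import time
from multiprocessing import Pool

def burn(n):
    """Stand-in CPU-bound task (replace with transcoding, compiling, ...)."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed_run(workers, jobs=8, size=200_000):
    """Wall-clock time to finish all jobs with the given worker count."""
    start = time.perf_counter()
    with Pool(workers) as pool:
        pool.map(burn, [size] * jobs)
    return time.perf_counter() - start

if __name__ == "__main__":
    for workers in (1, 2, 4):
        print(f"{workers} worker(s): {timed_run(workers):.3f}s")
```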
arQon
Posts: 231
Joined: 1. Jan 2017, 09:16
Primary OS: MS Windows 7
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Ubuntu 16.04 x64, W7

Re: Can plenty of CPU cores and RAM be detrimental to the performance?

Post by arQon »

scottgus1 wrote: 6. Nov 2023, 14:13 So, evidence time.
Yeah, it's the only way to go. Ideally, easily-reproducible evidence; but pretty much anything would be a step forwards.

> time & throughput records of low- vs high-processor count VMs doing something real-world (not benchmark) like video transcoding

Worth pointing out that this needs to be software transcoding - which is currently the only option in a guest, but not so on the host, and even the guest limitation may change in the future.

> compiling, etc, along with boot times and performance indicators like Youtube viewing.

Remarkably close to what I was going to suggest (which is encouraging), though I'm not sure about boot times: my VMs generally boot in under 5s if warm (this one started the desktop after 2.665s) and I think that sort of scale is probably a little *too* sensitive to interference.

For video, although YouTube has the advantage of providing a common source its diagnostics are fairly terrible. I'm planning on looking into whether mpv can be persuaded to provide its Dropped Frames count in a terminal at the end of a video, which would be much less demanding of tester time.

> I don't have one of those 24-core monsters to set up a VM with few or plenty of processors to assign and then see what happens.

I don't either, but I would expect older HW to emphasize any problems, which is what we want for the "spare core" claim.

Anecdotally, I know I can run make -j4 in a 4T VM on a 4C/8T CPU - i.e. the exact case that is supposedly problematic - on a codebase that takes up to 40 minutes to build, while watching streamed 1080p video in another (2T) VM with zero dropped frames. Since the part that really "matters" is the total system load though, the specifics of the workloads implicitly do as well. I'd expect a more math- / memory- intensive guest task (like compression or transcoding) to be more likely to have a negative host impact, so I think that's a good choice for a test, but it's important that the load is actually problem-free if all run on the host instead: otherwise you're just measuring the HW's in/ability to cope with it, not the impact of some of those tasks being run in a VM.

For the case that you're more interested in, taskset (or start /affinity on Windows) should be used on the host to establish the "ideal" performance of N threads on a given machine. Once that's done, comparisons with e.g. a 2-CPU VM, and a 30-CPU VM that's also using taskset to restrict the transcoding/whatever to 2 CPUs within that VM, should be easy.
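For what it's worth, the same pinning that taskset does can be sketched from Python on a Linux host (this is a Linux-only API; the core numbers are arbitrary examples and assume the host has at least 2 cores):

```python
# Pin the current process (and anything it spawns) to two host cores,
# mimicking `taskset -c 0,1 <command>` on a Linux host.
import os

before = os.sched_getaffinity(0)          # set of cores we may run on now
print("allowed cores before:", sorted(before))

os.sched_setaffinity(0, {0, 1})           # restrict to cores 0 and 1
print("allowed cores after: ", sorted(os.sched_getaffinity(0)))

# Anything forked or exec'd from here inherits the restricted set,
# which is how you'd establish the "ideal" N-thread baseline.
os.sched_setaffinity(0, before)           # restore the original set
```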

Hopefully I'll have some time to do this myself in the next few days: it won't answer your question, since I don't have the HW for that, but it'll provide an example to get us started, and for people to either agree with or point out flaws in.

add> This never really belonged in Windows Guests: could you move it to Using instead please?
scottgus1
Site Moderator
Posts: 20945
Joined: 30. Dec 2009, 20:14
Primary OS: MS Windows 10
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Windows, Linux

Re: Can plenty of CPU cores and RAM be detrimental to the performance?

Post by scottgus1 »

arQon wrote: 12. Nov 2023, 13:29 This never really belonged in Windows Guests: could you move it to Using instead please?
Good idea, 'tis done. :D
fth0
Volunteer
Posts: 5690
Joined: 14. Feb 2019, 03:06
Primary OS: Mac OS X other
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Linux, Windows 10, ...
Location: Germany

Re: Can plenty of CPU cores and RAM be detrimental to the performance?

Post by fth0 »

A few general remarks (that are not meant to keep you from measuring ;)):

Benchmarking: There are a lot of variables in benchmarking, making it difficult to get generally valid results. In consequence, the effort can easily get arbitrarily large. For an interesting read, see the pages of Brendan Gregg, especially the page about Active Benchmarking.

Host OS: Linux and Windows behave quite differently in the area of scheduling, so results from one host OS are not necessarily valid on the other. For example, consider a VM with 2 vCPUs running at 100% CPU load on a 4C/8T CPU. On a Linux host, you'll see 2T at 100% "jumping around" between the 8T, avoiding hitting the 2T of the same 1C at any time. On a Windows host, you'll see 4T at 50%, perhaps jumping between the 2T of the same 1C. Note that different Windows variants can schedule differently by default, too.

Hyperthreading: Many current CPUs use hyperthreading, so it cannot be ignored (for simplification) when benchmarking. In consequence, it becomes important which CPU components are shared between the 2T of 1C and between the nC of 1S (S = socket; ignoring multi-socket CPUs for simplicity). CPU components are, for example, CPU registers, CPU caches (L1, L2, L3) and APICs.

YouTube can adapt its video data streams depending on the performance of the player (e.g. data rate, video resolution, video codec, video container). It might be easy to shoot oneself in the foot if one doesn't look after such details. ;)
FranceBB
Posts: 121
Joined: 20. May 2017, 05:07
Primary OS: Fedora other
VBox Version: OSE Fedora
Guest OSses: Windows XP x86
Contact:

Re: Can plenty of CPU cores and RAM be detrimental to the performance?

Post by FranceBB »

fth0 wrote: 12. Nov 2023, 16:17 ignoring multi-socket CPUs for simplicity
I'm actually very curious, though.
I actually have an old 8C/16T dual-socket Intel Xeon configuration (4C/8T x2); however, when I assign the cores, VirtualBox just shows them as if they were all part of one big CPU.
Over the years I always lived by the assumption that the guest will see it as one simple CPU with many cores, while the host OS takes care of the rest, thus making non-NUMA-aware programs in the guest effectively run as if they were NUMA-aware.
Was my assumption totally wrong? What should a user do in these cases?
I've always been very curious 'cause it's a situation I face a lot and it has always puzzled me.