Re: 4.3.x Performance regression vs 4.2.18

Posted: 8. Nov 2013, 13:07
by michaln
billsc26 wrote:And when the guest VM is booted, it will show ~0% CPU load, but still act slow - so this is NOT because the WU service is stealing cycles.
What about the host CPU utilization? Does it go up? Or is the guest slower without actually consuming extra CPU cycles at all?

Re: 4.3.x Performance regression vs 4.2.18

Posted: 8. Nov 2013, 17:39
by michaln
So I ran SuperPI in an XP guest with no Guest Additions and was consistently getting about 24.8 sec per 2M run. Then I cloned the VM, installed GAs, and started getting consistently worse values, about 26.0 sec per 2M run. Sounds like your problem, right?

Well, no. I went back to the original 'clean' XP VM, and was now getting the same 26.0 sec per 2M run. A bit of digging showed that Windows Defender was running on the host, occupying about 8-10% of CPU time (on an 8-thread host). Once Defender stopped doing whatever it had been doing and the host CPU was 98%+ idle, SuperPI went back to completing a 2M run in 24.8 sec on the VM with Guest Additions installed.

Thanks to Intel, this is the new normal. When the CPU is otherwise idle, a fully loaded core will get a lot of "turbo boost". As soon as some other core starts doing something, the turbo boost no longer applies. Suddenly the performance drops 5-10% even though the host is by no means overloaded. On a system where there's a significant difference between the baseline and turbo boosted frequency, this does make a noticeable difference.

Anyway, I can't reproduce your problem, sorry.

Re: 4.3.x Performance regression vs 4.2.18

Posted: 8. Nov 2013, 22:26
by billsc26
michaln wrote:So I ran SuperPI in an XP guest with no Guest Additions and was consistently getting about 24.8 sec per 2M run. Then I cloned the VM, installed GAs, and started getting consistently worse values, about 26.0 sec per 2M run. Sounds like your problem, right?

Well, no. I went back to the original 'clean' XP VM, and was now getting the same 26.0 sec per 2M run. A bit of digging showed that Windows Defender was running on the host, occupying about 8-10% of CPU time (on an 8-thread host). Once Defender stopped doing whatever it had been doing and the host CPU was 98%+ idle, SuperPI went back to completing a 2M run in 24.8 sec on the VM with Guest Additions installed.

Thanks to Intel, this is the new normal. When the CPU is otherwise idle, a fully loaded core will get a lot of "turbo boost". As soon as some other core starts doing something, the turbo boost no longer applies. Suddenly the performance drops 5-10% even though the host is by no means overloaded. On a system where there's a significant difference between the baseline and turbo boosted frequency, this does make a noticeable difference.

Anyway, I can't reproduce your problem, sorry.
I was seeing a larger difference, but your points are well taken. I was watching on both the host and guest to make sure nothing snuck in and stole cycles, but I could have missed something.

Did you try resetting the guest machine to see if that had an impact? I tried it a couple times, and resetting the guest - at least for XP - seems to incur a slowdown. Because of the procedure I was using to switch between 4.2.18 and 4.3.2, I am worried it could have been the resets that caused the slowdown and it was just getting blamed on the GAs.

Re: 4.3.x Performance regression vs 4.2.18

Posted: 8. Nov 2013, 23:24
by michaln
Yes, I did try resetting the guests a few times, but couldn't see it making any difference. Of course your VM might be set up differently and running different software, so it's conceivable that it does behave differently. Windows XP is somewhat worse than the later releases in this regard due to the several different HALs which have a fairly significant impact on how the OS works.

Re: 4.3.x Performance regression vs 4.2.18

Posted: 8. Nov 2013, 23:32
by billsc26
michaln wrote:Yes, I did try resetting the guests a few times, but couldn't see it making any difference. Of course your VM might be set up differently and running different software, so it's conceivable that it does behave differently. Windows XP is somewhat worse than the later releases in this regard due to the several different HALs which have a fairly significant impact on how the OS works.
That is interesting. Now that I know what I'm looking for, it is repeatable on my system. It seems to be a guest-wide slowdown, with the CPU-bound stuff like SuperPI less impacted than other tasks. A reset or guest reboot triggers it. Like you say, it could be specific to my configuration - this one is XP with all four CPU cores assigned to it (I run CPU-intensive stuff in the guest). Is there any information I could provide that would be useful?

Re: 4.3.x Performance regression vs 4.2.18

Posted: 9. Nov 2013, 12:25
by michaln
billsc26 wrote:Is there any information I could provide that would be useful?
A not-too-difficult-to-follow reproduction scenario would be useful.

Re: 4.3.x Performance regression vs 4.2.18

Posted: 10. Nov 2013, 02:12
by billsc26
michaln wrote:
billsc26 wrote:Is there any information I could provide that would be useful?
A not-too-difficult-to-follow reproduction scenario would be useful.
On my test machine it's as simple as start XP VM. Get to guest desktop. Reset VM. When XP guest restarts, see slowdown. That slowdown stays through subsequent guest resets/reboots. Closing and opening the VM seems to clear it up until the next guest reset/reboot.

Re: 4.3.x Performance regression vs 4.2.18

Posted: 10. Nov 2013, 17:01
by michaln
billsc26 wrote:When XP guest restarts, see slowdown.
Sorry, that doesn't work. What exactly needs to be measured to determine that the slowdown occurred? If it's "obvious" on your system, chances are it won't be on anyone else's.

Re: 4.3.x Performance regression vs 4.2.18

Posted: 11. Nov 2013, 00:53
by michaln
Please also check viewtopic.php?f=1&t=58445 in case it's relevant.

Re: 4.3.x Performance regression vs 4.2.18

Posted: 11. Nov 2013, 04:49
by billsc26
michaln wrote:
billsc26 wrote:When XP guest restarts, see slowdown.
Sorry, that doesn't work. What exactly needs to be measured to determine that the slowdown occurred? If it's "obvious" on your system, chances are it won't be on anyone else's.
I measured the drop with SuperPI. That's the only measurement I have. The OS feels slower. I KNOW the apps in the guest are slower, but I don't have a good way to measure it. Most of the apps I have in the guest aren't the kind that would be easy to benchmark.

Re: 4.3.x Performance regression vs 4.2.18

Posted: 11. Nov 2013, 04:57
by billsc26
michaln wrote:Please also check viewtopic.php?f=1&t=58445 in case it's relevant.
My XP guests are configured the same way:

- Multiple cores
- VT-x
- IO APIC

Similar to the original poster, I am using the same XP guest across 4.2.18 and 4.3.2. Difference is that I have not tried switching between the IO APIC and the PIC HAL. Like the other poster, I don't know what the problem is - but I know I had not seen this problem in the 4.2.xx series. I tried 4.3.0, but I was badly bitten by the IO APIC so I only have data from 4.3.2. I have not completely excluded some problem with the 4.3.2 GAs, but I now know that rebooting my VMs seems to trigger some performance issue.

If you like I can run some more benchmarks before and after rebooting the guest. If nothing else it would show that the slowdown is real, at least in my guest VMs.

Re: 4.3.x Performance regression vs 4.2.18

Posted: 11. Nov 2013, 12:52
by michaln
billsc26 wrote:My XP guests are configured the same way:

- Multiple cores
- VT-x
- IO APIC
Actually, no... VT-x is what's different. In the other post it's AMD-V, which in this particular case probably makes a big difference.
billsc26 wrote:Similar to the original poster, I am using the same XP guest across 4.2.18 and 4.3.2. Difference is that I have not tried switching between the IO APIC and the PIC HAL. Like the other poster, I don't know what the problem is - but I know I had not seen this problem in the 4.2.xx series. I tried 4.3.0, but I was badly bitten by the IO APIC so I only have data from 4.3.2. I have not completely excluded some problem with the 4.3.2 GAs, but I now know that rebooting my VMs seems to trigger some performance issue.
Actually if you could try a single-VCPU VM and find out if the Windows HAL makes a difference, that would be great.
billsc26 wrote:If you like I can run some more benchmarks before and after rebooting the guest. If nothing else it would show that the slowdown is real, at least in my guest VMs.
While looking at the logs in the other post, it occurred to me that if you boot up a VM (not restore a saved state), the timestamps in the log entries give a pretty good indication of boot performance. The entry showing "Guest Additions capability report: (0x5) seamless: yes, hostWindowMapping: no, graphics: yes" (the capability report where graphics: no changes to graphics: yes) may be a good "benchmark".
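
The log-timestamp idea above could be sketched roughly as follows. This is only an illustration, not a VirtualBox tool: it assumes each log line starts with an elapsed-time stamp of the form `HH:MM:SS.ffffff` counted from VM start, and the function name `seconds_to_graphics_ready` is made up for this example. The marker text is the capability report quoted above.

```python
import re

# Assumption: every log line begins with an elapsed-time stamp such as
# "00:00:15.250000 " measured from the moment the VM was started.
TIMESTAMP = re.compile(r"^(\d+):(\d{2}):(\d{2})\.(\d+)\s")
MARKER = "Guest Additions capability report"


def seconds_to_graphics_ready(log_lines):
    """Return the elapsed seconds at the first capability report that says
    'graphics: yes', or None if no such line is found."""
    for line in log_lines:
        if MARKER in line and "graphics: yes" in line:
            match = TIMESTAMP.match(line)
            if match:
                hours, minutes, seconds, fraction = match.groups()
                return (int(hours) * 3600 + int(minutes) * 60
                        + int(seconds) + float("0." + fraction))
    return None
```

Passing the open log file of a freshly booted (not restored) VM to this function yields one number per boot, so a run before a guest reset and a run after can be compared directly.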

Re: 4.3.x Performance regression vs 4.2.18

Posted: 12. Nov 2013, 05:11
by billsc26
michaln wrote:
billsc26 wrote:My XP guests are configured the same way:

- Multiple cores
- VT-x
- IO APIC
Actually, no... VT-x is what's different. In the other post it's AMD-V, which in this particular case probably makes a big difference.
Similar to the original poster, I am using the same XP guest across 4.2.18 and 4.3.2. Difference is that I have not tried switching between the IO APIC and the PIC HAL. Like the other poster, I don't know what the problem is - but I know I had not seen this problem in the 4.2.xx series. I tried 4.3.0, but I was badly bitten by the IO APIC so I only have data from 4.3.2. I have not completely excluded some problem with the 4.3.2 GAs, but I now know that rebooting my VMs seems to trigger some performance issue.
Actually if you could try a single-VCPU VM and find out if the Windows HAL makes a difference, that would be great.
If you like I can run some more benchmarks before and after rebooting the guest. If nothing else it would show that the slowdown is real, at least in my guest VMs.
While looking at the logs in the other post, it occurred to me that if you boot up a VM (not restore a saved state), the timestamps in the log entries give a pretty good indication of boot performance. The entry showing "Guest Additions capability report: (0x5) seamless: yes, hostWindowMapping: no, graphics: yes" (the capability report where graphics: no changes to graphics: yes) may be a good "benchmark".
Ok. I tried the following scenarios - and captured the LOG files from each (except the very last). All this was with 4.3.2 and GAs 4.2.18, XP SP3 and SuperPI:

- ACPI SMP HAL, 2 VCPU, 2M - 30 s
- ACPI SMP HAL, 2 VCPU, 2M - 35 s (after guest reboot)

- ACPI SMP HAL, 1 VCPU, 2M - 31 s
- ACPI SMP HAL, 1 VCPU, 2M - 37 s (after guest reboot). SuperPI doesn't really show it, but this was a total slideshow. Guest was so slow as to be unusable for anything but benchmarking

- ACPI SMP HAL, 1 VCPU, 2M - 36 s (VT-x disabled)
- ACPI SMP HAL, 1 VCPU, 2M - 36 s (after guest reboot, VT-x disabled)

- ACPI UP HAL, 1 VCPU, 2M - 29 s
- ACPI UP HAL, 1 VCPU, 2M - 29 s (after guest reboot)

I don't know if the numbers adequately show it, but the ACPI UP HAL... wow. It was like booting into a new computer. The VM was super responsive and quick. The difference in SuperPI of 29 s on this HAL vs. 30 s on the "normal" HAL with 2 CPU was like the difference between night and day. The benchmarks really don't do it justice.
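
For anyone who wants the relative numbers, the SuperPI timings listed above can be turned into a post-reboot slowdown percentage with a few lines of Python. This is just arithmetic on the figures from the list; the helper name `reboot_slowdown` is made up for this example.

```python
# (HAL, VCPUs, note from the list above, seconds before reboot, seconds after reboot)
RUNS = [
    ("ACPI SMP HAL", 2, "", 30, 35),
    ("ACPI SMP HAL", 1, "", 31, 37),
    ("ACPI SMP HAL", 1, "VT-x disabled", 36, 36),
    ("ACPI UP HAL", 1, "", 29, 29),
]


def reboot_slowdown(before_s, after_s):
    """Percentage increase in SuperPI 2M run time after a guest reboot."""
    return (after_s - before_s) / before_s * 100.0


for hal, vcpus, note, before, after in RUNS:
    label = f"{hal}, {vcpus} VCPU" + (f", {note}" if note else "")
    print(f"{label}: {before} s -> {after} s "
          f"({reboot_slowdown(before, after):+.1f}%)")
```

The two SMP HAL runs with VT-x come out at roughly 17% and 19% slower after the reboot, while the VT-x disabled run and the UP HAL run show no change.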

Re: 4.3.x Performance regression vs 4.2.18

Posted: 12. Nov 2013, 16:29
by michaln
billsc26 wrote:Ok. I tried the following scenarios - and captured the LOG files from each (except the very last). All this was with 4.3.2 and GAs 4.2.18, XP SP3 and SuperPI:
Thanks! Now a terminology question: By "UP HAL" do you mean the "Advanced Configuration and Power Interface (ACPI) PC" or the "ACPI Uniprocessor PC" HAL, as displayed under Computer in the Device Manager? I assume it's actually the former, but would like to confirm.

As a side note, the "ACPI PC" HAL ignores the I/O APIC, if configured for the VM. The "ACPI Uniprocessor PC" HAL requires the I/O APIC and the VM won't work without it.
billsc26 wrote:I don't know if the numbers adequately show it, but the ACPI UP HAL... wow. It was like booting into a new computer. The VM was super responsive and quick. The difference in SuperPI of 29 s on this HAL vs. 30 s on the "normal" HAL with 2 CPU was like the difference between night and day. The benchmarks really don't do it justice.
That's actually interesting. So the VM feels unresponsive but a CPU-intensive benchmark executed inside the VM only shows a minor difference. Comparing your runs with 1 VCPU, VT-x and different HALs, it's apparent that in one case the VM booted up in 15 seconds and in the other case in 25 seconds (for a semi-arbitrary definition of "booted up").

It is expected that XP will perform best with the default settings (ACPI PC HAL, no I/O APIC), but I don't think we expect the difference to be that big. But now at least we know what to look at, and I know why I initially didn't see any difference at all.

Re: 4.3.x Performance regression vs 4.2.18

Posted: 13. Nov 2013, 06:17
by billsc26
michaln wrote:
billsc26 wrote:Ok. I tried the following scenarios - and captured the LOG files from each (except the very last). All this was with 4.3.2 and GAs 4.2.18, XP SP3 and SuperPI:
Thanks! Now a terminology question: By "UP HAL" do you mean the "Advanced Configuration and Power Interface (ACPI) PC" or the "ACPI Uniprocessor PC" HAL, as displayed under Computer in the Device Manager? I assume it's actually the former, but would like to confirm.

As a side note, the "ACPI PC" HAL ignores the I/O APIC, if configured for the VM. The "ACPI Uniprocessor PC" HAL requires the I/O APIC and the VM won't work without it.
I don't know if the numbers adequately show it, but the ACPI UP HAL... wow. It was like booting into a new computer. The VM was super responsive and quick. The difference in SuperPI of 29 s on this HAL vs. 30 s on the "normal" HAL with 2 CPU was like the difference between night and day. The benchmarks really don't do it justice.
That's actually interesting. So the VM feels unresponsive but a CPU-intensive benchmark executed inside the VM only shows a minor difference. Comparing your runs with 1 VCPU, VT-x and different HALs, it's apparent that in one case the VM booted up in 15 seconds and in the other case in 25 seconds (for a semi-arbitrary definition of "booted up").

It is expected that XP will perform best with the default settings (ACPI PC HAL, no I/O APIC), but I don't think we expect the difference to be that big. But now at least we know what to look at, and I know why I initially didn't see any difference at all.
Yes, by "UP HAL" I meant "Advanced Configuration and Power Interface (ACPI) PC".

I know it's hard to quantify, because a pure-CPU task, like SuperPI, only shows a moderate slowdown. As you noticed on the boot time, there is a huge difference in perceived speed in the guest, however. I wasn't kidding when I said the worst case I tested was a slideshow. It would be completely unusable for anything but a benchmark, even though SuperPI didn't look like a disaster.

I hope you found this information useful and can track down the issue. I'm guessing I'm not the only person running CPU intensive apps on an XP guest. Thanks for your help. Don't hesitate to ask if there is anything else I can do to help.