Extreme System Interrupts (from disk?) in Win10 guest
I'm experiencing a strange but reproducible problem with a Windows 10 guest on a powerful desktop computer with Ubuntu 18.04 x86_64 host in VirtualBox 6.0.8 (current latest). The guest system suddenly becomes sluggish due to an extremely high "System Interrupts" load, after working without any issues for up to half a day. The Task Manager on Windows shows a very high system load from the "System interrupts" process, and also shows a very high disk load / slow disk response time. However, the host system does not suffer from any of this: iotop shows almost zero disk load on the host (in the range of 0-2%), and the host remains reasonably responsive and usable.
The problem appears quite reproducible in our setup, in the following scenario: The guest system is used as a build server. Builds execute in a Visual Studio command prompt with ~8-10 parallel build jobs. The build is started from Cygwin. Cygwin is also used to download sources, unpack archives, and clean up temporary files. We're building several libraries, among them Qt 5.13.0. Qt is quite resource hungry, unpacks thousands of files and takes 4-8 hours to build (despite the powerful hardware). For the first 4-8 hours, the guest performs all builds smoothly without any problems. Then about 6-10 hours into the build, typically while compiling Qt, the problem appears and makes the guest almost unresponsive. From then on, the machine remains accessible and active, but achieves almost no build throughput and would remain in this sluggish state for a long time if not shut down.
We have used the same hardware with the same VirtualBox version successfully to build the same software stack in the past, with almost all parameters identical except for using Windows 7 instead of Windows 10. This has worked without problems for more than a year, and still works today without issues. However, the same software stack (Cygwin version, Build System version, Visual Studio version, etc.) always eventually shows this problem as soon as Windows 10 is used.
I have tried to change all parameters I could think of with no success: I've switched between UEFI and BIOS boot, changed between the available Motherboard Chipsets, changed from USB 2 to USB 3 and back, assigned more or fewer CPUs, more or less RAM, deleted all snapshots, and enabled or disabled the Disk Host Cache. I could also not find anything suspicious in the Windows event log (apart from the fact that at some point the Windows performance daemon starts to log a slow disk response). I could also not find anything suspicious in the VirtualBox logs (however I'm clearly no expert to judge).
Any help would be greatly appreciated! I can also provide moderated access via TeamViewer or SSH to the machine once it has reached the "problem state".
The hardware is an AMD Ryzen 7 1700X 8-Core Processor (16 threads) with typically 8 cores assigned to VirtualBox. The host features a PCIe-SATA with 3GB/sec speed and 64GB of RAM.
The VM typically is assigned 6-8 cores and 32-48GB of RAM. The two vdi disk files (separate system C:\ and data D:\ disk) are both located on the PCIe-SATA and have the "Show disk as SSD" feature enabled. The VirtualBox guest additions are installed and match the version of VirtualBox. 2D and 3D acceleration are disabled. I can provide the VirtualBox log of the problem case if that helps?
I do not suspect faulty hardware because we can switch back and forth between our Windows 10 and Windows 7 build server VMs, and while the former always eventually fails, the latter always works.
-
socratis
- Site Moderator
- Posts: 27329
- Joined: 22. Oct 2010, 11:03
- Primary OS: Mac OS X other
- VBox Version: VirtualBox+Oracle ExtPack
- Guest OSses: Win(*>98), Linux*, OSX>10.5
- Location: Greece
Re: Extreme System Interrupts (from disk?) in Win10 guest
emmenlau wrote: I could also not find anything suspicious in the VirtualBox logs (however I'm clearly no expert to judge).
How about if we take a look at a VBox.log as well?
We need to see a complete VBox.log, from a complete VM run, where the problem occurs:
- Start the VM from cold-boot (not from a paused or saved state) / Observe problem / Shutdown the VM (force close it if you have to).
- With the VM completely shut down (not paused or saved), right-click on the VM in the VirtualBox Manager and select "Show Log".
- Save only the first "VBox.log", ZIP it and attach it to your response. See the "Upload attachment" tab below the reply form.
Do NOT send me Personal Messages (PMs) for troubleshooting, they are simply deleted.
Do NOT reply with the "QUOTE" button, please use the "POST REPLY", at the bottom of the form.
If you obfuscate any information requested, I will obfuscate my response. These are virtual UUIDs, not real ones.
Re: Extreme System Interrupts (from disk?) in Win10 guest
I have captured the log and saved it using the "export" feature from VirtualBox the last time the problem appeared. Is that ok? And is gzip compression ok? The log goes from cold boot to shutdown, and includes the problem.
I do not recall the exact time the problem appeared in this execution, but looking at the log it may be when the messages about unresponsive guest start:
VMMDev: vmmDevHeartbeatFlatlinedTimer: Guest seems to be unresponsive. Last heartbeat received 4 seconds ago
This execution was using EFI boot, but the same problem appears with BIOS boot.
Attachment: CI-Win10-x64-VS2017.9-2019-07-05-23-39-31.log.gz (55.12 KiB)
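For readers who want to locate the moment the guest stopped responding in such a log, the following is a minimal sketch rather than an official tool. It searches a plain or gzip-compressed VBox.log for the heartbeat message quoted above. It assumes that each log line begins with an uptime stamp of the form HH:MM:SS.ffffff; the function names are made up for this illustration.

```python
import gzip
import re
import sys

# Assumption: every VBox.log line starts with an uptime stamp "HH:MM:SS.ffffff".
STAMP = re.compile(r"^(\d+):(\d{2}):(\d{2})\.(\d+)\s+(.*)$")
# Marker taken from the log message quoted in the post above.
MARKER = "vmmDevHeartbeatFlatlinedTimer"


def read_log_lines(path):
    """Yield text lines from a plain or gzip-compressed log file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")


def find_flatlines(lines):
    """Return (uptime_seconds, message) for every heartbeat-flatline line.

    uptime_seconds is None when a line has no recognizable stamp.
    """
    hits = []
    for line in lines:
        if MARKER not in line:
            continue
        match = STAMP.match(line)
        if match:
            hours, minutes, seconds, _fraction, message = match.groups()
            uptime = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
            hits.append((uptime, message))
        else:
            hits.append((None, line))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_flatlines.py VBox.log.gz
    for uptime, message in find_flatlines(read_log_lines(sys.argv[1])):
        print(uptime, message)
```

The first reported uptime gives a rough idea of how many seconds into the run the guest first missed a heartbeat, which can then be compared with the build timeline.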
Re: Extreme System Interrupts (from disk?) in Win10 guest
Is there anything else I can do to help the debugging? I can provide VM access via TeamViewer, or I can share the machine so you can reproduce the issue locally (though the current disk size is in the range of 200GB).
Re: Extreme System Interrupts (from disk?) in Win10 guest
Dear @socratis,
can you advise me how to further proceed with this issue? Could you learn something from the log file? Should I create an issue report, or did you already pass the information to support?
Thanks for your help and all the best!
-
scottgus1
- Site Moderator
- Posts: 20945
- Joined: 30. Dec 2009, 20:14
- Primary OS: MS Windows 10
- VBox Version: VirtualBox+Oracle ExtPack
- Guest OSses: Windows, Linux
Re: Extreme System Interrupts (from disk?) in Win10 guest
emmenlau wrote: AMD Ryzen 7 1700X 8-Core Processor (16 threads) with typically 8 cores assigned to Virtualbox
VirtualBox doesn't use the threads, only the cores, so you have all the cores given to the guest and none for the host. This is not good. However, you report the same trouble when using fewer processors...
You also report no difference booting from BIOS or EFI. FWIW Windows 7 couldn't boot from VirtualBox's EFI (or at least wasn't supposed to). So although this may not be the issue, it could help troubleshooting to eliminate one more difference by using just the BIOS on the Windows 10 guest too. (A fresh install with BIOS boot might be the surest way to remove this variable?)
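The core arithmetic behind this remark can be sketched as follows. This is only an illustration of the rule of thumb stated here (count physical cores, not threads); the helper names and the default reserve of one core for the host are assumptions, not VirtualBox settings.

```python
def host_cores_left(physical_cores, guest_vcpus):
    """Physical cores remaining for the host after the guest takes its share."""
    if physical_cores < 1 or guest_vcpus < 1:
        raise ValueError("core counts must be positive")
    return physical_cores - guest_vcpus


def max_guest_vcpus(physical_cores, reserve_for_host=1):
    """Largest guest CPU count that still leaves reserve_for_host cores free.

    The default reserve of one core is an assumption for this sketch.
    """
    return max(1, physical_cores - reserve_for_host)


# Ryzen 7 1700X from the original post: 8 physical cores, 16 threads.
print(host_cores_left(8, 8))  # all 8 cores assigned to the guest
print(host_cores_left(8, 6))  # the lower end of the reported 6-8 range
print(max_guest_vcpus(8))
```

With 8 cores assigned, the host is left with zero physical cores by this count; with 6 assigned, it keeps two.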
You've tried toggling the "Use Host I/O Cache" box, that was what I was thinking about.
Sounds like the only difference between the good and bad guests is the guest OS. There is one thing that arises with this thought:
emmenlau wrote: same software stack (Cygwin version, Build System version, Visual Studio version, etcetc...)
Are you sure these same software versions are completely compatible with Windows 10, a six-year-and-counting newer OS than W7?
Also are the two guest drives on different physical host disks? Are they SSDs?
-
mpack
- Site Moderator
- Posts: 39134
- Joined: 4. Sep 2008, 17:09
- Primary OS: MS Windows 10
- VBox Version: VirtualBox+Oracle ExtPack
- Guest OSses: Mostly XP
Re: Extreme System Interrupts (from disk?) in Win10 guest
scottgus1 wrote: Are you sure these same software versions are completely compatible with Windows 10, a six-year-and-counting newer OS than W7?
I can testify from experience that versions of Visual Studio including VC6, VS2008, VS2010 and VS2015 all work just fine in Windows 10 (64bit). They might conflict with each other (just as two versions of VirtualBox would), but not with anything else. The Win32 API hasn't had any functions deleted.
The OP's issue sounds like a driver problem to me, presumably a driver not native to Windows.
Re: Extreme System Interrupts (from disk?) in Win10 guest
Dear @mpack and @scottgus1, thanks for your replies.
I cannot say whether the software stack is compatible with Windows 10. Since it includes Cygwin, most likely nobody is giving any guarantees. But I'm not aware of any issues with either Visual Studio 2017, Visual Studio 2019 or the latest Cygwin that show related symptoms. And since I'm using the latest versions of these tools, I would have rather suspected problems with Windows 7 (a six-year-and-counting older OS) than with Windows 10. However, the problem appears only on the latter, never on the former.
It's certainly possible that the issue is related to drivers. However, with respect to guest drivers, I did not manually install any drivers in Windows 10 (except for the latest VirtualBox pack). Other than that, it's a vanilla Windows 10 installation with just whatever Microsoft Windows 10, Visual Studio and Cygwin install by default in a VirtualBox setup.
With respect to host drivers, the same setup works well with Windows 7, so I assume this is not what you are referring to?
-
fth0
- Volunteer
- Posts: 5690
- Joined: 14. Feb 2019, 03:06
- Primary OS: Mac OS X other
- VBox Version: VirtualBox+Oracle ExtPack
- Guest OSses: Linux, Windows 10, ...
- Location: Germany
Re: Extreme System Interrupts (from disk?) in Win10 guest
You've already tried several ideas on the host side without success, so I suggest concentrating a little on the guest side. Imagine the Windows guest being just a normal physical PC and treat it as such while searching for the cause of the problem. If something goes wrong, it's only virtual and you can start again.
- Before analyzing the problem, let us try to eliminate the time you have to wait for the problem to occur:
  - Take a first snapshot at the beginning, to return to at the very end of all tests, while the VM is not running.
  - Let the VM run until it reaches the problem state, and take a second snapshot while the VM is running.
- When the System Interrupt process causes high CPU load (> 20%), some virtual or physical hardware either generates way too many interrupts, or some interrupt driver is stuck in a loop. To find the cause, monitor the System Interrupt process in the task manager during the following steps:
  - If you stop the build process now, does the CPU load remain high? If not, what happens when continuing the build process? If the CPU load doesn't go up again, restore the VM to the second snapshot. If the CPU load is high again, continue with the next step, otherwise restore the VM to the first snapshot and start from the beginning.
  - I know you already suspected the disk device(s). But let's first exclude other possible causes by disabling other devices in the Windows device manager (e.g. audio devices, USB devices, USB hubs, network devices, ...) one by one and checking the CPU load. Leave those devices disabled (they stay enabled inside the snapshots). You can even try to disable the keyboard and the mouse, only one at a time, but before doing that, think about how to re-enable the mouse using only the keyboard. If you didn't think hard enough, you can return to one of the snapshots.
  - If the CPU load is high even without the build process running, stop it. Try to disable the second disk device in the Windows device manager and check the CPU load.
- Let us know the results of these experiments. Of course, you can also develop your own ideas from my suggestions above.
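The one-by-one elimination described above can be sketched as a small loop. This is only an illustration of the procedure, not something VirtualBox or Windows provides: `disable` and `measure_load` are hypothetical callbacks that stand in for the manual steps in the device manager and the task manager, and the 20% threshold is the figure mentioned in the steps.

```python
def find_culprit(devices, disable, measure_load, threshold=20.0):
    """Disable devices one by one; return the first whose removal drops the load.

    devices      -- device names in the order they should be tried
    disable      -- hypothetical callback that disables one device in the guest
    measure_load -- hypothetical callback returning the System Interrupts
                    CPU load in percent
    threshold    -- load (percent) above which the guest counts as affected
    """
    if measure_load() <= threshold:
        return None  # problem state not present, nothing to isolate
    for device in devices:
        disable(device)  # the device stays disabled, as in the steps above
        if measure_load() <= threshold:
            return device
    return None  # load stayed high with every listed device disabled


# Simulated run: pretend the audio device is the one causing the interrupts.
disabled = []
load = lambda: 5.0 if "audio" in disabled else 85.0
print(find_culprit(["usb", "audio", "network"], disabled.append, load))
```

If the function returns None while the load is still high, none of the listed devices is responsible and the disk devices remain as suspects.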
Re: Extreme System Interrupts (from disk?) in Win10 guest
Dear @fth0,
thanks a lot for your extensive help! But I'm under the impression that you are looking for a flaw in my specific setup, is that possible? According to my understanding, with the cumulative information I have gathered, the problem must be a general issue with Windows 10 in VirtualBox. Admittedly the circumstances that trigger the problem may be quite specific and rare. But as I outlined, my setup is a vanilla Windows 10 officially from Microsoft with latest updates, my Visual Studio 2017 v15.9 is officially from Microsoft with latest updates, and Cygwin is also the official release from cygwin.org with latest updates. Even if I could pinpoint the problem to a specific Windows component (which would be a huge effort for a layman like me), it would only support the conclusion that this specific Windows 10 component has issues when run inside VirtualBox. I don't think I (personally) would get support from Microsoft for this issue.
Monitoring processes is quite difficult because the load can be so high that I cannot easily use the Task Manager or this new detailed Windows 10 Process Inspector (don't know the correct name). It's, for example, quite hard to click on a task in Task Manager because by the time my mouse click is resolved, the process list has already changed. There are some options like sorting the list appropriately, but I've also run into trouble there when the process name was not at the scrolled position I was expecting, etc. So this kind of debugging is quite cumbersome.
So in summary, I can clearly see that my setup may be rare and not many people may be affected by this problem. But I still think that all in all, the evidence speaks for a problem using Windows 10 in VirtualBox. It would be great if support could take over from here.
-
mpack
Re: Extreme System Interrupts (from disk?) in Win10 guest
It should be easy to prove whether VS2017 is the problem, which seems likely to me: it wouldn't surprise me if it installs a lot of low level stuff that could cause problems in a VM, including system level debug features that try to grab VT-x.
Re: Extreme System Interrupts (from disk?) in Win10 guest
I can absolutely see Visual Studio as the cause of the problem. How would I check for this?
-
fth0
Re: Extreme System Interrupts (from disk?) in Win10 guest
emmenlau wrote: But I'm under the impression that you are looking for a flaw in my specific setup, is that possible?
I'm trying to look at your problem from a general unbiased perspective, using as few assumptions as possible: From your initial description
emmenlau wrote: extremely high "System Interrupts" load
I understood that the Windows task manager (inside the guest) displays a high CPU load for the System Interrupt process permanently. This could mean that one of the virtual or physical devices, which VirtualBox provides to the Windows guest, either generates way too many interrupts, or some interrupt driver is stuck in a loop. Unfortunately, the Windows task manager does not show which interrupt driver is the culprit, so I would just disable the devices one by one to find out which device driver is responsible. For example, if the host's audio device caused the guest's audio interrupt driver to go nuts, disabling the guest's audio device should reduce the high CPU load immediately.
This was my (so I thought) simple strategy. Maybe I complicated it too much by talking about safety (and time saving) measures like snapshots and so on.
Regarding difficulties when monitoring processes: You could start the Windows task manager before the CPU load is high, sort the processes by name, mark the System Interrupt process, then sort the processes by CPU load in descending order. Keep the window at a visible position on the guest's desktop, and you're done with it.
All I've written so far is just a suggestion. You may take it or leave it as you wish.
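As an alternative to watching the task manager while the guest is sluggish, the interrupt load could be logged to a file and inspected afterwards. The sketch below is an assumption-laden illustration: it presumes a two-column CSV (timestamp, value) with one header row, as performance-counter tools such as Windows' typeperf typically write for a counter like `\Processor(_Total)\% Interrupt Time`. The function names are made up for this example.

```python
import csv
import io


def interrupt_samples(csv_text):
    """Parse (timestamp, percent) pairs from a two-column counter CSV.

    Assumes the first row is a header naming the counter and that every
    following row holds a timestamp string and a numeric value.
    """
    rows = csv.reader(io.StringIO(csv_text))
    next(rows, None)  # skip the header row with the counter path
    samples = []
    for row in rows:
        if len(row) < 2:
            continue
        try:
            samples.append((row[0], float(row[1])))
        except ValueError:
            continue  # blank or non-numeric sample, ignore it
    return samples


def first_above(samples, threshold=20.0):
    """Return the timestamp of the first sample above threshold, else None."""
    for timestamp, value in samples:
        if value > threshold:
            return timestamp
    return None
```

Applied to a log that spans the whole build, `first_above` points to the time at which the interrupt load first crossed the 20% mark, without any clicking inside the overloaded guest.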
Re: Extreme System Interrupts (from disk?) in Win10 guest
Dear fth0,
I did not mean to be ungrateful! Your instructions are clear, well formulated and make sense to me.
The only problem is that I would prefer that the official VirtualBox support now takes over. I've tried my best to isolate the issue myself, and spent quite some time turning all knobs I could think of, always documenting the results.
Now I'm under the impression that in this specific scenario, VirtualBox does not play well with Windows 10 with any of the recent Visual Studio versions installed. Since my setup almost exclusively contains official Microsoft OS and tools, and these tools are far from exotic, I think this issue should be resolved in VirtualBox. Even if a Windows component should be to blame, it still seems relevant that VirtualBox supports such a common Microsoft development setup, or not?
So I'd like to kindly ask if the issue can be forwarded to the official VirtualBox support?
-
scottgus1
Re: Extreme System Interrupts (from disk?) in Win10 guest
emmenlau wrote: I would prefer that now the official VirtualBox support takes over. ... I'd like to kindly ask if the issue can be forward to the official VirtualBox support?
There is no official VirtualBox support, unless you're rich. Licenses for Oracle VirtualBox support start at $6100, see their store. If you pay them then you can get support for your issue.
If you can't afford to pay them, then all you can have is fellow users like fth0 & mpack & socratis & myself, who may add in what we think might help. You can put a ticket on the Bugtracker, but there's no guarantee when or if the developers will address it, unless a paying customer has the same issue.
So you might really want to try what fth0 & mpack are suggesting about a default driver or one installed by Visual Studio. Or consider mpack's thought on Visual Studio trying to use VT-x. I have heard that too, maybe if you turn that part off?
Also, what happens to your build times if you try in the two-processor guest?
emmenlau wrote: typically while compiling Qt,
If it is reproducible when you're compiling Qt, then what exactly happens when Qt is compiled? Look there for a possible suspect.