How to crash Solaris host with Windows 7 guest

Discussions related to using VirtualBox on Solaris hosts.
martyscholes
Posts: 202
Joined: 11. Sep 2011, 00:24
Primary OS: Solaris
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Win 7, Ubuntu, Win XP, Vista, Win 8, Mint, Pear, Several Linux Virtual Appliances

Re: How to crash Solaris host with Windows 7 guest

Post by martyscholes »

Ramshankar wrote:
martyscholes wrote:Ramshankar,

I owe you an apology. More testing reveals that the errors now show up in /var/adm/messages, but the machine does not crash. The only change I am aware of between then and now is that on 6/11 I added 28GB of RAM to the machine.
Hm, sorry I don't quite follow, so what is the problem currently? Has the problem somehow corrected itself with only those verbose errors in the log?
That is correct. Except for the errors in /var/adm/messages, nothing is currently wrong. Just to make things even weirder, I have two Windows 7 guests, configured almost identically. One guest puts the messages in the host's log, one does not.
Ramshankar
Oracle Corporation
Posts: 793
Joined: 7. Jan 2008, 16:17

Re: How to crash Solaris host with Windows 7 guest

Post by Ramshankar »

Good that you no longer experience host crashes, and perhaps "almost" is the key. Could you post VBox.log for both guests? I must admit I haven't yet had time to check the code path for this error.
Oracle Corp.
martyscholes

Re: How to crash Solaris host with Windows 7 guest

Post by martyscholes »

Thanks for the follow up. I have attached the logs. The differences I found with the image that generates the errors (Quicken W7):
* Has a different machine name
* Uses a different VNIC
* Has a different MAC address
* Has 32MB of video RAM vs. 64MB of video RAM on the machine which does not generate the error
* Has different shared folders
* Has 4.1.16 guest additions vs. 4.1.12 guest additions on the machine which does not generate the error
** Quicken W7 did have 4.1.12 guest additions but I upgraded those trying to rule out older GA causing the error
* Has different software installed
** The machine that makes the errors has Quicken, Turbotax and HP Printer software installed
** The machine that does NOT make the error has HP Printer software and various other media-centric software pieces installed

Any insight is appreciated.

Thanks again,
Marty
Attachments
VBox.log (Quicken W7, 51.55 KiB): this does generate errors in /var/adm/messages
VBox.log (Windows 7 Media, 81.65 KiB): this does not generate errors in /var/adm/messages
Ramshankar

Re: How to crash Solaris host with Windows 7 guest

Post by Ramshankar »

Would it be possible to upload a .OVA file (Export VM) of the VM which causes those pgmR0BthPAEPAETrap0eHandler messages in the system log? If you do anything special in the guest besides just launching it and letting it idle, let us know those steps too.

It is pretty difficult to fix this without having some sort of reproduction scenario.
Oracle Corp.
martyscholes

Re: How to crash Solaris host with Windows 7 guest

Post by martyscholes »

Ramshankar, thanks for getting back to me. I apologize that it took me so long to respond. While I understand the request, I am hesitant to upload the whole image -- lots of personal data is buried on that disk. Since it no longer destabilizes this server, I am less concerned. Am I the only person who has reported this?

Thanks again,
Marty
Ramshankar

Re: How to crash Solaris host with Windows 7 guest

Post by Ramshankar »

It's understandable that you are reluctant to upload the VDI, and you're the only person reporting this issue so far. We can drop this issue for now as nobody is really affected. Thanks for the report so far.
Oracle Corp.
martyscholes

Re: How to crash Solaris host with Windows 7 guest

Post by martyscholes »

Ramshankar,

This bit me again last night. I am not asking for any intervention; I am just letting you know. I do not know which VMs were running at the time or which VMs caused this, but I woke up to an unresponsive server. From 02:18 on I had hundreds of messages like the following in /var/adm/messages.

Aug 1 02:18:31 dl585 last message repeated 1258 times
Aug 1 02:18:31 dl585 vboxdrv: [ID 914993 kern.notice] int pgmR0BthAMD64AMD64Trap0eHandler(VMCPU*, RTGCUINT, CPUMCTXCORE*, RTGCPTR, bool*): returns rc=0 pvFault=0000000003f0ad55 uErr=15 cs:rip=0023:03f0ad55
Ramshankar

Re: How to crash Solaris host with Windows 7 guest

Post by Ramshankar »

Hm, sorry... it's hard to say exactly what is going wrong at the moment without examining what the guest is actually doing. The uErr=15 (implying the lower 4 bits of the page fault error code are set) looks suspicious...
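
For anyone decoding these log lines: on x86 the low bits of the page-fault error code are defined by the architecture, so a value such as uErr=15 (binary 1111, reading the logged value as decimal, as above) can be unpacked as in the sketch below. The function name and the flag descriptions are illustrative and are not taken from the VirtualBox sources.

```python
# Low five bits of the x86 page-fault error code, as defined by the
# architecture manuals. The text labels are informal descriptions.
PF_ERROR_BITS = [
    (0x01, "P: protection violation on a present page"),
    (0x02, "W/R: the access was a write"),
    (0x04, "U/S: the access came from user mode"),
    (0x08, "RSVD: a reserved bit was set in a paging-structure entry"),
    (0x10, "I/D: the access was an instruction fetch"),
]


def decode_page_fault_error(u_err):
    """Return descriptions of every flag set in a page-fault error code."""
    return [label for mask, label in PF_ERROR_BITS if u_err & mask]


if __name__ == "__main__":
    # uErr=15 is the value seen in the vboxdrv notices in this thread.
    for line in decode_page_fault_error(15):
        print(line)
```

With all four low bits set, the reserved-bit flag is included, which may be part of why the value looks unusual; that reading is an interpretation, not something confirmed in this thread.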
Oracle Corp.
martyscholes

Re: How to crash Solaris host with Windows 7 guest

Post by martyscholes »

Ramshankar,

Recently, this bug has been plaguing me more and more, crashing the host several times yesterday and once this morning. It seems somehow related to network activity in the guests, but I cannot be sure.


Aug 22 09:22:43 dl585 last message repeated 97 times
Aug 22 09:22:43 dl585 vboxdrv: [ID 908015 kern.notice] int pgmR0BthAMD64AMD64Trap0eHandler(VMCPU*, RTGCUINT, CPUMCTXCORE*, RTGCPTR, bool*): returns rc=0 pvFault=0000000004831366 uErr=15 cs:rip=0023:04831366
Aug 22 09:22:43 dl585 last message repeated 147 times
Aug 22 09:22:43 dl585 vboxdrv: [ID 359555 kern.notice] int pgmR0BthAMD64AMD64Trap0eHandler(VMCPU*, RTGCUINT, CPUMCTXCORE*, RTGCPTR, bool*): returns rc=0 pvFault=0000000004834f3d uErr=15 cs:rip=0023:04834f3d
Aug 22 09:22:43 dl585 last message repeated 15 times
Aug 22 09:22:43 dl585 vboxdrv: [ID 192783 kern.notice] int pgmR0BthAMD64AMD64Trap0eHandler(VMCPU*, RTGCUINT, CPUMCTXCORE*, RTGCPTR, bool*): returns rc=0 pvFault=000000000482fadb uErr=15 cs:rip=0023:0482fadb
Aug 22 09:22:43 dl585 last message repeated 46 times
Aug 22 09:22:43 dl585 vboxdrv: [ID 891002 kern.notice] int pgmR0BthAMD64AMD64Trap0eHandler(VMCPU*, RTGCUINT, CPUMCTXCORE*, RTGCPTR, bool*): returns rc=0 pvFault=000000000676e088 uErr=15 cs:rip=0023:0676e088
Aug 22 09:22:43 dl585 last message repeated 21 times
Aug 22 09:22:43 dl585 vboxdrv: [ID 646807 kern.notice] int pgmR0BthAMD64AMD64Trap0eHandler(VMCPU*, RTGCUINT, CPUMCTXCORE*, RTGCPTR, bool*): returns rc=0 pvFault=000000000676e0ca uErr=15 cs:rip=0023:0676e0ca
Aug 22 09:22:43 dl585 last message repeated 126 times
Aug 22 09:22:43 dl585 vboxdrv: [ID 379118 kern.notice] int pgmR0BthAMD64AMD64Trap0eHandler(VMCPU*, RTGCUINT, CPUMCTXCORE*, RTGCPTR, bool*): returns rc=0 pvFault=000000000676eca2 uErr=15 cs:rip=0023:0676eca2
Aug 22 09:22:43 dl585 last message repeated 3 times
Aug 22 09:22:43 dl585 vboxdrv: [ID 476981 kern.notice] int pgmR0BthAMD64AMD64Trap0eHandler(VMCPU*, RTGCUINT, CPUMCTXCORE*, RTGCPTR, bool*): returns rc=0 pvFault=0000000005cb3a17 uErr=15 cs:rip=0023:05cb3a17
Aug 22 09:22:43 dl585 vboxdrv: [ID 788682 kern.notice] int pgmR0BthAMD64AMD64Trap0eHandler(VMCPU*, RTGCUINT, CPUMCTXCORE*, RTGCPTR, bool*): returns rc=0 pvFault=000000000676f125 uErr=15 cs:rip=0023:0676f125
Aug 22 09:22:43 dl585 last message repeated 81 times
Aug 22 09:29:00 dl585 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version 11.1 64-bit
Aug 22 09:29:00 dl585 genunix: [ID 459285 kern.notice] Copyright (c) 1983, 2012, Oracle and/or its affiliates. All rights reserved.
This morning it crashed and rebooted, but a soft reboot makes the machine unusable owing to a 3D card that must be powered up cold (or else gdm and SRSS come up flaky). At any rate, these errors are coming from the vboxdrv kernel module. Is there any way to troubleshoot or at least avoid this? I cannot operate this way. The above errors are from 4.2.14 but after the reboot I upgraded to 4.2.16 even though I see nothing in the Changelog which suggests this might have been fixed. I cannot even find pgmR0BthAMD64AMD64Trap0eHandler in the sources.

I see this issue has been discussed for other platforms as well. Is there anything I can do here? We are not using the 3D card, so we can remove that if it will help troubleshoot the issue. Is there any DTrace voodoo I can do?
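
One low-risk way to at least quantify the flood, and to see whether it changes between VirtualBox releases, is to tally the notices offline. The following is a minimal sketch that assumes the line format shown in the excerpt above; the function name and the regular expressions are hypothetical, and the script only reads text, it does not interact with vboxdrv.

```python
import re
from collections import Counter

# Matches the vboxdrv notices shown above and captures the handler name,
# the faulting address and the error code (kept as text, not interpreted).
TRAP_RE = re.compile(
    r"vboxdrv: \[ID \d+ kern\.notice\] int (?P<func>\w+)\(.*?"
    r"pvFault=(?P<fault>[0-9a-fA-F]+) uErr=(?P<err>\w+)"
)
# syslog collapses identical consecutive messages into this summary line.
REPEAT_RE = re.compile(r"last message repeated (?P<n>\d+) times?")


def tally_trap_messages(lines):
    """Count trap-handler notices per (function, uErr) pair.

    A 'last message repeated N times' line is credited to the notice that
    immediately precedes it; a summary line with no preceding notice in
    the input is ignored.
    """
    counts = Counter()
    last_key = None
    for line in lines:
        match = TRAP_RE.search(line)
        if match:
            last_key = (match.group("func"), match.group("err"))
            counts[last_key] += 1
            continue
        repeat = REPEAT_RE.search(line)
        if repeat:
            if last_key is not None:
                counts[last_key] += int(repeat.group("n"))
        else:
            last_key = None  # an unrelated message intervened
    return counts


if __name__ == "__main__":
    sample = [
        "Aug 22 09:22:43 dl585 vboxdrv: [ID 908015 kern.notice] int "
        "pgmR0BthAMD64AMD64Trap0eHandler(VMCPU*, RTGCUINT, CPUMCTXCORE*, "
        "RTGCPTR, bool*): returns rc=0 pvFault=0000000004831366 uErr=15 "
        "cs:rip=0023:04831366",
        "Aug 22 09:22:43 dl585 last message repeated 147 times",
    ]
    for key, total in tally_trap_messages(sample).items():
        print(key, total)
```

To run it against the real log, pass an open file object for /var/adm/messages to tally_trap_messages instead of the sample list.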

Many thanks,
Marty
Ramshankar

Re: How to crash Solaris host with Windows 7 guest

Post by Ramshankar »

martyscholes wrote:I cannot even find pgmR0BthAMD64AMD64Trap0eHandler in the sources.
That's unfortunately mangled heavily through the use of macros. It's in PGMAllBth.h; search for

PGM_BTH_DECL(int, Trap0eHandler)

in the sources and you should be able to locate it.
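
As a rough aid for turning a mangled symbol from the log back into a search string, the sketch below splits the name into its parts. It is based only on the two symbols seen in this thread (pgmR0BthPAEPAETrap0eHandler and pgmR0BthAMD64AMD64Trap0eHandler) and on the PGM_BTH_DECL hint above; the list of mode tokens, the context prefixes other than R0, and both function names are assumptions, not the actual macro definitions.

```python
import re

# Assumption: mode tokens that may appear twice in a row after "Bth".
# Only PAE and AMD64 are confirmed by the log lines in this thread.
MODES = ("32Bit", "PAE", "AMD64", "Real", "Prot", "Nested", "EPT")

# Assumption: a context prefix (R0 is the one seen in the logs), the
# literal "Bth", two mode tokens, then the plain function name.
NAME_RE = re.compile(
    r"^pgm(?P<ctx>R0|R3|RC)Bth"
    r"(?P<first_mode>{m})(?P<second_mode>{m})(?P<func>\w+)$".format(
        m="|".join(MODES)
    )
)


def split_bth_name(symbol):
    """Split a mangled handler symbol into context, modes and function."""
    match = NAME_RE.match(symbol)
    if match is None:
        raise ValueError("unrecognised symbol: {}".format(symbol))
    return match.groupdict()


def grep_target(symbol, return_type="int"):
    """Build the PGM_BTH_DECL(...) string to search the sources for."""
    parts = split_bth_name(symbol)
    return "PGM_BTH_DECL({}, {})".format(return_type, parts["func"])


if __name__ == "__main__":
    print(grep_target("pgmR0BthAMD64AMD64Trap0eHandler"))
```

The return type defaults to int because that is what the log lines and the search string above show for this handler; other functions may differ.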

The source of the crash, however, is that logging from ring-0 causes unexpected preemption that kills the host. cmn_err() on Solaris preempts. It's very likely that we don't get preempted most of the time, but if we're unlucky enough we will be, and then that's going to cause problems.

I have a fix in trunk (4.3.x) to avoid logging this. I'll see if I've backported it to 4.2.x, and if so it should be available in the next maintenance release.

Update: I've backported a potential fix, which should be available with 4.2.18, i.e. the next maintenance release.
Oracle Corp.
martyscholes

Re: How to crash Solaris host with Windows 7 guest

Post by martyscholes »

Many thanks. So the issue is that logging from ring-0 can result in confusion, right? Interesting. I would never have guessed. After an upgrade to 4.2.16 this morning, I took another crash at 10:39. I am anxiously awaiting 4.2.18.

Thanks again,
Marty
Ramshankar

Re: How to crash Solaris host with Windows 7 guest

Post by Ramshankar »

That is correct, we're calling into the OS API which doesn't respect the 'do not preempt thread' request. Our code following this will still assume we're running on the expected CPU when we could be on any. Once this happens lots of things can go wrong and most likely will.
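
The failure mode described here can be illustrated with a toy model. This is only a sketch in Python, not VirtualBox or Solaris code: ToyScheduler, os_log_that_may_preempt and run_critical_section are hypothetical names, and the migration is forced deterministically, whereas on a real host it would happen only occasionally.

```python
class ToyScheduler:
    """Toy model that tracks which CPU one simulated thread runs on."""

    def __init__(self, cpu_count):
        self.cpu_count = cpu_count
        self.current_cpu = 0

    def migrate(self):
        """Move the simulated thread to the next CPU."""
        self.current_cpu = (self.current_cpu + 1) % self.cpu_count


def os_log_that_may_preempt(sched, preempt):
    """Stand-in for a host logging call that ignores 'do not preempt'."""
    if preempt:
        sched.migrate()


def run_critical_section(sched, log_in_between, preempt=False):
    """Return True if the CPU cached at entry is still the current CPU."""
    assumed_cpu = sched.current_cpu  # the code caches "my CPU" on entry
    if log_in_between:
        os_log_that_may_preempt(sched, preempt)
    # Everything after this point would use per-CPU state for assumed_cpu.
    return assumed_cpu == sched.current_cpu


if __name__ == "__main__":
    # 8 CPUs, matching a 4x dual-core host such as the one in this thread.
    print(run_critical_section(ToyScheduler(8), log_in_between=False))
    print(run_critical_section(ToyScheduler(8), log_in_between=True, preempt=True))
```

In this model the second call reports that the cached CPU is stale, which corresponds to the case where later code touches another CPU's state. Not logging inside the section, as the fix mentioned earlier in the thread does, removes the opportunity for that to happen.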
Oracle Corp.
martyscholes

Re: How to crash Solaris host with Windows 7 guest

Post by martyscholes »

Ramshankar,

Would you be interested in sharing an early binary of the fixed vboxdrv module?
martyscholes

Re: How to crash Solaris host with Windows 7 guest

Post by martyscholes »

Ramshankar wrote:That is correct, we're calling into the OS API which doesn't respect the 'do not preempt thread' request. Our code following this will still assume we're running on the expected CPU when we could be on any. Once this happens lots of things can go wrong and most likely will.
I wonder if this is related to another issue. The server we use is 4x dual-core Opteron. We learned early on that we could crash / freeze the host if we started a guest with more than 2 CPUs.
Ramshankar

Re: How to crash Solaris host with Windows 7 guest

Post by Ramshankar »

martyscholes wrote:Ramshankar,

Would you be interested in sharing an early binary of the fixed vboxdrv module?
I'm not sure the change ended up only in vboxdrv; it also touched the Runtime. I'll try to put out a test build if the 4.2.18 release looks like it will take a while.
Oracle Corp.