Jul 8 09:32:26 vserver kernel: [148595.307485] INFO: rcu_sched detected stalls on CPUs/tasks: { 3} (detected by 2, t=5252 jiffies, g=405431, c=405430, q=11708)
Jul 8 09:32:26 vserver kernel: [148595.307504] sending NMI to all CPUs:
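When comparing lockups across reboots, it can help to pull the key fields out of the RCU stall line automatically: which CPUs stalled (the numbers in braces), which CPU noticed, and the stall time in jiffies. Below is a minimal sketch that assumes the message format shown in this log; the exact wording differs between kernel versions, so the regex is an assumption, and `parse_rcu_stall` and `RcuStall` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed format, taken from the log line above:
# "rcu_sched detected stalls on CPUs/tasks: { 3} (detected by 2, t=5252 jiffies, ..."
STALL_RE = re.compile(
    r"rcu_sched detected stalls on CPUs/tasks: \{([^}]*)\} "
    r"\(detected by (\d+), t=(\d+) jiffies"
)


class RcuStall(NamedTuple):
    stalled_cpus: list[int]  # CPU numbers listed inside the braces
    detected_by: int         # CPU that reported the stall
    jiffies: int             # the "t=" value; seconds depend on the kernel's HZ setting


def parse_rcu_stall(line: str) -> Optional[RcuStall]:
    """Return stall details from one kernel log line, or None if it is not a stall report."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    cpus = [int(tok) for tok in m.group(1).split()]
    return RcuStall(cpus, int(m.group(2)), int(m.group(3)))
```

Running this over each line of `/var/log/kern.log` (or `dmesg` output) shows whether the same CPU stalls every time, which is one way to tell a single flaky core from a system-wide hang.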
You say it happens shortly after the joomla guest starts, so does it happen if you start the other guest instead? This would isolate the joomla guest as the cause, but don't jump to the conclusion that it is the real reason yet.
The lockup typically doesn't happen "shortly" after starting the joomla guest; it normally takes about 6 hours. (I think I put that in my original post; if not, mea culpa.)
Haven't tried the other guest yet since it does very little, but I can try that too.
More later, getting called away. Tech support never sleeps........
This is expected as VirtualBox is seeing the same thing as the host. The fix is to figure out why and fix that.
Do you have an IT department and have they been involved with this?
Well, among other things, I am the IT Department.
And the technical writer.
And the trainer.
Dispatcher.
Customer service manager.
Cable TV repairman.
Basically I get to work on all the stuff nobody else can/will work on. That's why I'm in & out right now - trying to figure out why our A/V system has no audio.
Pop wrote: trying to figure out why our A/V system has no audio.
That would make it the <null>/V system, wouldn't it?
Sorry, couldn't resist...
Do NOT send me Personal Messages (PMs) for troubleshooting, they are simply deleted.
Do NOT reply with the "QUOTE" button, please use the "POST REPLY", at the bottom of the form.
If you obfuscate any information requested, I will obfuscate my response. These are virtual UUIDs, not real ones.
I had another look at the log file and something jumped out at me. I searched for proliant site:virtualbox.org, and this is the first thing that popped up: viewtopic.php?f=7&t=67391
It seems there may be an issue with the ProLiant itself. Try the workaround mentioned there and see if it helps.
So, should I blacklist hpdwt AND add the nmi_watchdog kernel parameter?
At the moment everything is still running, but I can make the changes in anticipation of the next freeze-up.
Also, as for why I disabled hyperthreading: old threads that discussed similar lockups suggested that hyperthreading might be the culprit. I can re-enable it during the next reboot. Thanks!
Pop wrote: So, should I blacklist hpdwt AND add the nmi_watchdog kernel parameter?
That is what I would do, since it appears to be the real issue. I would also re-enable hyperthreading, since it will help the host. That wouldn't matter if the host were used for nothing else, but it looks like you are using it for other things as well.
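After the reboot, it is worth confirming that both changes actually took effect. Note that the HP ProLiant watchdog driver in the Linux kernel is named `hpwdt`; the thread spells it `hpdwt`, and a `blacklist` line in a file under `/etc/modprobe.d/` only works if the name matches the module exactly. The sketch below is a hypothetical helper, not part of any workaround from the linked topic: it reads `/proc/cmdline` to see whether `nmi_watchdog` was passed and `/proc/modules` to see whether `hpwdt` is still loaded. It does not assume which `nmi_watchdog` value the workaround calls for; check the linked topic for that.

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into {name: value}; bare flags map to ''.

    Simplification: quoted values containing spaces are not handled.
    """
    params: dict[str, str] = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def module_loaded(modules_text: str, name: str) -> bool:
    """True if `name` is the first field of any line in /proc/modules-style text."""
    return any(
        line.split()[0] == name for line in modules_text.splitlines() if line.strip()
    )


def check_host(module: str = "hpwdt") -> None:
    """Print the nmi_watchdog setting and whether `module` is loaded on this host."""
    # Both files exist on a running Linux system and are readable without root.
    params = parse_cmdline(Path("/proc/cmdline").read_text())
    modules = Path("/proc/modules").read_text()
    print("nmi_watchdog:", params.get("nmi_watchdog", "<not set>"))
    print(f"{module} loaded:", module_loaded(modules, module))
```

Calling `check_host()` on the host after the reboot should report the `nmi_watchdog` value you added and `hpwdt loaded: False` if the blacklist took effect.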
I would like to help you with your situation. I don't know if you have managed to solve it yet, but here goes. You say that after six hours the guest freezes first and then the host, requiring a hard reset. Could your host box actually be overheating, causing the system to become unstable? This is a wild shot in the dark, but maybe worth looking into.
Nijee: I thought that might be the case too, so the fan controls are set for maximum cooling, and I increased the airflow into the server room. Although the room is still warmer than I'd prefer, I'm confident that heat isn't the issue.
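To back the "heat isn't the issue" conclusion with numbers rather than fan settings, the temperatures the kernel itself sees can be logged over the six hours leading up to a lockup. A minimal sketch, assuming the host exposes ACPI/CPU thermal zones under `/sys/class/thermal`, whose `temp` files report millidegrees Celsius. On ProLiant servers the sensors may be reachable only through iLO or IPMI instead, so an empty result here does not mean the box is cool. The function names are hypothetical.

```python
from pathlib import Path


def millideg_to_celsius(raw: str) -> float:
    """Thermal-zone `temp` files hold millidegrees Celsius, e.g. '45000' -> 45.0."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(root: str = "/sys/class/thermal") -> dict[str, float]:
    """Return {'thermal_zoneN:type': degrees C} for every readable zone under `root`."""
    temps: dict[str, float] = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            temps[f"{zone.name}:{kind}"] = millideg_to_celsius((zone / "temp").read_text())
        except (OSError, ValueError):
            continue  # some zones are unreadable or return errors; skip them
    return temps
```

Printing `read_thermal_zones()` once a minute (for example from cron) gives a temperature trace to compare against the time of the next freeze-up.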
Perryg: It locked up over the weekend, and I just restarted it with a power cycle. I didn't turn on hyperthreading, but hpdwt is now blacklisted and the watchdog is on. Now we wait.........
Okay, the Joomla guest is still running, but it has lost its network access. It won't respond to HTTP requests or pings. I'm taking it offline to prevent further issues.
Could the missing D-Bus package be the issue? These are the last few lines of the joomla guest log file:
*************************************************************************************************************************************************************
00:00:30.457421 VMMDev: Guest Log: 00:00:00.042574 vminfo Error: Unable to connect to system D-Bus (1/3): D-Bus not installed
00:00:30.462897 GUI: UISession::sltAdditionsChange: GA state really changed, notifying listeners.
00:00:30.463064 GUI: UIMachineViewNormal::adjustGuestScreenSize: Adjust guest-screen size if necessary.
00:00:30.463077 GUI: UISession::sltAdditionsChange: GA state change event came, notifying listeners.
00:00:30.463083 GUI: UIMachineLogicNormal::sltCheckForRequestedVisualStateType: Requested-state=0, Machine-state=5
00:00:35.465747 VMMDev: Guest Log: 00:00:05.053047 vminfo Error: Unable to connect to system D-Bus (2/3): D-Bus not installed
00:00:40.464470 VMMDev: Guest Log: 00:00:10.054170 vminfo Error: Unable to connect to system D-Bus (3/3): D-Bus not installed
*************************************************************************************************************************************************************
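The errors in that excerpt are retries: the "Guest Log" lines appear to come from the Guest Additions' `vminfo` component inside the guest, which tries to reach the system D-Bus three times and then gives up. To check each guest's `VBox.log` for the same pattern before and after installing dbus, a small scan like the one below may help. It is a sketch that assumes the message wording shown above; `dbus_failures` and `gave_up_on_dbus` are hypothetical names, and finding the messages does not by itself prove they caused the network loss.

```python
import re

# Assumed wording, copied from the guest log excerpt above:
# "vminfo Error: Unable to connect to system D-Bus (1/3): D-Bus not installed"
DBUS_RE = re.compile(r"Unable to connect to system D-Bus \((\d+)/(\d+)\): (.+)$")


def dbus_failures(log_text: str) -> list[tuple[int, int, str]]:
    """Return (attempt, max_attempts, reason) for each D-Bus failure line in the log."""
    hits: list[tuple[int, int, str]] = []
    for line in log_text.splitlines():
        m = DBUS_RE.search(line)
        if m:
            hits.append((int(m.group(1)), int(m.group(2)), m.group(3).strip()))
    return hits


def gave_up_on_dbus(log_text: str) -> bool:
    """True if the final retry (e.g. 3/3) is among the failures."""
    return any(attempt == total for attempt, total, _ in dbus_failures(log_text))
```

After dbus is installed in a guest, a freshly started VM's log should make `dbus_failures` return an empty list.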
Installing it now on all three guests and the host.
I believe this is a host issue, with a little hardware issue thrown in as well. You should probably raise a ticket on the bugtracker and get the devs involved. It has already been established that the HP ProLiant does cause issues. Since I don't have access to one, I cannot confirm or deny that myself, but others have; you might search for HP proliant site:virtualbox.org and see if you can find a resolution.
It's starting to look like installing the dbus package on my joomla and suitecrm guests solved my problem: I'm approaching 48 hours without a network communication problem, guest freeze-up, or host lockup.
I'm declaring the installation of the dbus package the solution to this issue. The server has never run for this long (>55 hours) without locking up before.