VirtualBox v3.0.X system lock up issue

Discussions related to using VirtualBox on Solaris hosts.
thomberg
Posts: 7
Joined: 30. Aug 2009, 22:57
Primary OS: Solaris
VBox Version: PUEL
Guest OSses: Ubuntu 9,04 64-bit, Windows XP SP3

Re: VirtualBox v3.0.X system lock up issue

Post by thomberg »

ci4ic4 wrote:Go get your next 3.06BETA1 fix from http://download.virtualbox.org/virtualbox/3.0.6_BETA1/ . So far seems to be working for me. More tests needed, though. I am just about to replace the [fine working] 2.2.4 with this on one of my OpenSolaris snv_121 based servers (64-bit with AMD-V).
I just tried the beta and I'm still getting full host freezes on Solaris 10 U7 :(
dri
Posts: 12
Joined: 31. Jul 2009, 13:58
Primary OS: Ubuntu other
VBox Version: OSE Debian
Guest OSses: Various

Re: VirtualBox v3.0.X system lock up issue

Post by dri »

Latest Beta seems to be working. :) I get kernel panics in a Jaunty guest though. e1000 is the culprit apparently, different problem I guess.

EDIT: I had it running overnight with a couple of Linux VMs doing various repetitive tasks, forcing high load on the host. Besides the e1000 kernel panic, everything seems fine.

EDIT #2: System froze after a day. I'm using bridged networking on an AMD-V host. I haven't tried NAT, but I'm not really interested in using it, so I reverted to 2.2.4 as I need a stable environment. Keep up the good work, it almost felt like you had it! :)
Last edited by dri on 6. Sep 2009, 03:18, edited 2 times in total.
alzmich
Posts: 4
Joined: 30. Aug 2009, 22:41
Primary OS: OpenSolaris 11
VBox Version: PUEL
Guest OSses: Win XP, Linux

Re: VirtualBox v3.0.X system lock up issue

Post by alzmich »

Hi,
ci4ic4 wrote:Go get your next 3.06BETA1 fix from http://download.virtualbox.org/virtualbox/3.0.6_BETA1/ . So far seems to be working for me. More tests needed, though. I am just about to replace the [fine working] 2.2.4 with this on one of my OpenSolaris snv_121 based servers (64-bit with AMD-V).
Tried it last night. After about 5 hours of the xp64 guest working (raytracing, so CPU intensive), the whole host locked up again, just like with the earlier 3.0.x versions, unfortunately.

Michael.
Ramshankar
Oracle Corporation
Posts: 793
Joined: 7. Jan 2008, 16:17

Re: VirtualBox v3.0.X system lock up issue

Post by Ramshankar »

alzmich wrote:Hi,
ci4ic4 wrote:Go get your next 3.06BETA1 fix from http://download.virtualbox.org/virtualbox/3.0.6_BETA1/ . So far seems to be working for me. More tests needed, though. I am just about to replace the [fine working] 2.2.4 with this on one of my OpenSolaris snv_121 based servers (64-bit with AMD-V).
Tried it last night. After about 5 hours of the xp64 guest working (raytracing, so CPU intensive), the whole host locked up again, just like with the earlier 3.0.x versions, unfortunately.

Michael.
Was this with bridged networking again? Could you try with NAT? We just fixed a deadlock today that could be triggered with bridged networking.
Oracle Corp.
alzmich
Posts: 4
Joined: 30. Aug 2009, 22:41
Primary OS: OpenSolaris 11
VBox Version: PUEL
Guest OSses: Win XP, Linux

Re: VirtualBox v3.0.X system lock up issue

Post by alzmich »

Ramshankar wrote:
alzmich wrote:Hi,

Tried it last night. After about 5 hours of the xp64 guest working (raytracing, so CPU intensive), the whole host locked up again, just like with the earlier 3.0.x versions, unfortunately.

Michael.
Was this with bridged networking again? Could you try with NAT? We just fixed a deadlock today that could be triggered with bridged networking.
Hi,

Thanks for the hint. I tested it again, this time with only a single NAT network adapter configured.
Sadly, there was no detectable difference. The xp64 guest locks up the complete host the same way as before.

Michael.
thomberg
Posts: 7
Joined: 30. Aug 2009, 22:57
Primary OS: Solaris
VBox Version: PUEL
Guest OSses: Ubuntu 9,04 64-bit, Windows XP SP3

Re: VirtualBox v3.0.X system lock up issue

Post by thomberg »

After a lot of testing with the beta it definitely looks like the bridged networking is the main culprit on my Solaris 10 U7 host setup.
Running a single Ubuntu 64-bit guest continuously copying files from a NFS server, the host froze in less than 5 minutes using bridged networking but ran for 2.5 hours without problems using NAT before I stopped it.

While running the NATed Ubuntu guest copying files via NFS, I booted a WinXP 32-bit guest with bridged networking and started copying files via CIFS/Samba from the NFS server used by the Linux guest. The host froze almost instantly.
Running both guests with NAT seemed to work stably. I did have some issues with the WinXP guest, but I don't think they were VirtualBox related.

I also tried maxing out the CPU on the Windows guest using Prime95 while copying via NFS on the Ubuntu guest. That also ran stably while using NAT. I only ran it for about 2 hours though.
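
For anyone who wants to repeat the NAT vs. bridged comparison, here is a minimal sketch of how the NIC attachment can be flipped between runs. It only builds the `VBoxManage modifyvm` command line and does not execute anything. The VM names and the host adapter name are made up for illustration, and the `--nic<N>` / `--bridgeadapter<N>` option spellings are as I remember them for 3.0.x, so check the `VBoxManage modifyvm` help output on your version first. The VM has to be powered off when the command is run.

```python
def nic_mode_command(vm_name, mode, nic=1, host_adapter=None):
    """Build a VBoxManage argv list that switches one virtual NIC's attachment.

    vm_name and host_adapter are placeholders supplied by the caller;
    nothing is executed here.
    """
    if mode not in ("nat", "bridged"):
        raise ValueError("mode must be 'nat' or 'bridged'")
    cmd = ["VBoxManage", "modifyvm", vm_name, "--nic{}".format(nic), mode]
    if mode == "bridged":
        if host_adapter is None:
            raise ValueError("bridged mode needs a host adapter name")
        # Assumed option name; verify against your VBoxManage help output.
        cmd += ["--bridgeadapter{}".format(nic), host_adapter]
    return cmd


if __name__ == "__main__":
    # Hypothetical VM and adapter names, for illustration only.
    print(" ".join(nic_mode_command("ubuntu64", "nat")))
    print(" ".join(nic_mode_command("winxp", "bridged", host_adapter="e1000g0")))
```

Printing the command instead of running it makes it easy to paste into a shell and to note in a test log which mode each run used.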

Keep up the good detective work :-)
localhost66
Posts: 5
Joined: 18. Jun 2008, 15:31
Primary OS: OpenSolaris 11
VBox Version: PUEL
Guest OSses: XP SP3, Ubuntu 8.04, 2003x64

Re: VirtualBox v3.0.X system lock up issue

Post by localhost66 »

localhost66 wrote:I tried playing with the ZFS ARC settings, and after drastically reducing the ARC to 256M it stopped locking up.

set zfs:zfs_arc_max = 0x10000000
This limit is also needed for running Xen dom0 with ZFS.
http://opensolaris.org/os/community/xen/docs/relnotes/

I hope there will be a better solution than almost destroying the ZFS cache. I'd rather move back to 2.2.4...

Jiri Madle
It has now been working for a week without a host lock-up: SNV111, VB 3.0.4, guest XP SP3, bridged networking.
When I increase the ZFS ARC, it locks up in 2-3 hours. If I generate a lot of IO on the guest disk, it locks up in 10 minutes. The IO can be defragmentation, or file copying to a shared folder via VBox or SMB, or anything else you like. I'm not sure, but it seems to me it could have something to do with ZFS caching. Unfortunately I don't have the time or a free box to test it with the guest disk stored on UFS. But reducing the ZFS ARC helped...
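
In case it helps anyone trying other ARC sizes: the value in /etc/system is just the size in bytes written in hex. A small sketch that prints the line for a given size in MiB follows (the helper name is mine, and the sizes other than 256 are only examples). A change in /etc/system only takes effect after a reboot.

```python
MIB = 1024 * 1024


def arc_max_line(mebibytes):
    """Return the /etc/system line capping the ZFS ARC at the given size in MiB."""
    if mebibytes <= 0:
        raise ValueError("ARC size must be positive")
    # zfs_arc_max is given in bytes; hex keeps the line short and readable.
    return "set zfs:zfs_arc_max = 0x{:x}".format(mebibytes * MIB)


if __name__ == "__main__":
    for size in (256, 512, 768):
        print(arc_max_line(size))
```

For 256 this gives the same line as the one quoted above.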
Jiri
thomberg
Posts: 7
Joined: 30. Aug 2009, 22:57
Primary OS: Solaris
VBox Version: PUEL
Guest OSses: Ubuntu 9,04 64-bit, Windows XP SP3

Re: VirtualBox v3.0.X system lock up issue

Post by thomberg »

localhost66 wrote:
localhost66 wrote:I tried playing with the ZFS ARC settings, and after drastically reducing the ARC to 256M it stopped locking up.

set zfs:zfs_arc_max = 0x10000000
This limit is also needed for running Xen dom0 with ZFS.
http://opensolaris.org/os/community/xen/docs/relnotes/

I hope there will be a better solution than almost destroying the ZFS cache. I'd rather move back to 2.2.4...

Jiri Madle
It has now been working for a week without a host lock-up: SNV111, VB 3.0.4, guest XP SP3, bridged networking.
When I increase the ZFS ARC, it locks up in 2-3 hours. If I generate a lot of IO on the guest disk, it locks up in 10 minutes. The IO can be defragmentation, or file copying to a shared folder via VBox or SMB, or anything else you like. I'm not sure, but it seems to me it could have something to do with ZFS caching. Unfortunately I don't have the time or a free box to test it with the guest disk stored on UFS. But reducing the ZFS ARC helped...
Jiri
I tried setting zfs_arc_max to both 256MB and 512MB on my Solaris 10 U7 box, and I'm still getting full host lock-ups when using bridged networking with my Ubuntu64 and WinXP SP3 guests. The host hangs within a few minutes when generating network and disk I/O. This was using the 3.0.6 beta. Using NAT is so far the only thing that has worked for me.

Thomas
frank
Oracle Corporation
Posts: 3362
Joined: 7. Jun 2007, 09:11
Primary OS: Debian Sid
VBox Version: PUEL
Guest OSses: Linux, Windows
Location: Dresden, Germany

Re: VirtualBox v3.0.X system lock up issue

Post by frank »

We fixed a potential deadlock on Solaris hosts when using bridged networking after the 3.0.6 Beta 1 was released so it is likely that your problem is fixed now.
mattjk
Posts: 2
Joined: 9. Sep 2009, 14:50
Primary OS: OpenSolaris 11
VBox Version: PUEL
Guest OSses: W2K8, W7

Re: VirtualBox v3.0.X system lock up issue

Post by mattjk »

Frank Mehnert wrote:We fixed a potential deadlock on Solaris hosts when using bridged networking after the 3.0.6 Beta 1 was released so it is likely that your problem is fixed now.
Any ETA on 3.0.6 final/beta 2/some new release with this fix in it?
half12
Posts: 110
Joined: 26. May 2008, 19:46
Primary OS: OpenSolaris other
VBox Version: PUEL
Guest OSses: RH 4 & 5, CentOS 4, Ubuntu 9.10, MSDOS, Win 95, 98se, 2K, XP, OpenSolaris, Solaris

Re: VirtualBox v3.0.6 lock up with high network traffic

Post by half12 »

Hi,

Installed VirtualBox 3.0.6 on my dual Opteron system running Solaris 10 u5 and ran a variety of VMs (WinXP, SXCE b121), and have not had any crashes or lock-ups after nearly two hours.
The updated AdditionsCD works correctly with SXCE b121 so that mouse interaction is seamless.

Update
Solaris Host : Dual Opteron 2218, 16GB RAM, 850GB sata disk, Supermicro H8DAE-2 motherboard
Initially my SXCE b121 VM was configured with 2 CPUs, but I noticed that this was significantly slower than running under v2.2.4 with a single CPU. I reduced the number of CPUs from 2 to 1, restarted SXCE b121, and found that performance improved significantly. I have been running WinXP and SXCE b121 for around 5 hours and have now also started RedHat 4; the system is stable. Each VM is configured to use bridged networking. Please note that I have USB disabled in all VMs.
I have been playing tv and radio content from the BBC as well as this video http://www.youtube.com/watch?v=uGjiKrMO-7I
So far, everything is working well.

Update2
Reproduced system lock up!!
When trying to transfer around 10GB of files from the host to a VBox guest (both CentOS 5 and SXCE b121), the transfer rate using scp became slower and slower. I killed the CentOS 5 transfer and tried with SXCE b121. This reached around 90%, and then all the VMs locked up while the host was still responsive (I could switch between workspaces). Eventually the host froze too and everything stopped.
Last edited by half12 on 10. Sep 2009, 22:19, edited 3 times in total.
arantala
Posts: 4
Joined: 8. Sep 2009, 19:26
Primary OS: OpenSolaris 11
VBox Version: PUEL
Guest OSses: FreeBSD (pfSense), CentOS 5

Re: VirtualBox v3.0.X system lock up issue

Post by arantala »

I am currently running 3.0.4 on top of snv_122 running on AMD 5050e. I temporarily upgraded to 3.0.6, but that only exacerbated the hanging issue instead of fixing it.

I have two virtual machines:
1) FreeBSD based pfSense firewall using bridged networking with three physical intel network cards. (kern.hz="100", no guest additions)
2) CentOS 5 based generic Linux server bridged to a fourth physical network card. (divider=10 kernel option + VBox Guest additions)

In 3.0.4 if I run only one of these virtual machines, everything is stable. If I run both at the same time, the following happens:
If the CentOS box is fairly idle, any heavy network traffic through the firewall can hang the Solaris host entirely after a few minutes or hours.
If the firewall box is fairly idle, copying 8 gigs of data from the CentOS box over samba to my desktop computer hangs the host system after a few minutes. The copying does not go through the firewall.
As I said, if one of the VMs is shut down, no hang occurs, ever.

Disabling or enabling HW virtualization, IO APIC, or any other VM features has no effect on the stability.

I was happy to hear that a hanging bug had been fixed after 3.0.6 Beta 1, so I installed 3.0.6 yesterday.

After doing so, I upgraded the Guest Additions on the CentOS box and started the same 8 GB copy operation with both VMs running.

The host hung after only a few seconds of transfer.

After rebooting, disappointed, I shut down the CentOS box and ran only the firewall, as I had done before. In a few hours the host hung again, even though it had been stable in 3.0.4 with only one machine running.

I downgraded to 3.0.4 until I have time to do more testing and for now, when only running the firewall, the host has stayed up.

To summarize, 3.0.6 did NOT fix the hanging bug that I am experiencing. It made it worse.

I cannot find any obviously relevant information in the logs. How can I help to diagnose this hanging issue further?

UPDATE: It hung today on 3.0.4 as well, with only the firewall running. At least I'm fairly sure that only the firewall was running. :/
Last edited by arantala on 12. Sep 2009, 00:52, edited 1 time in total.
thomberg
Posts: 7
Joined: 30. Aug 2009, 22:57
Primary OS: Solaris
VBox Version: PUEL
Guest OSses: Ubuntu 9,04 64-bit, Windows XP SP3

Re: VirtualBox v3.0.X system lock up issue

Post by thomberg »

I just tried 3.0.6 final on my Solaris 10 U7 box and ran my normal tests with bridged networking. I'm still getting a full host freeze after a few minutes.

I'll also gladly do some extra testing if you have something in mind.

Thomas
localhost66
Posts: 5
Joined: 18. Jun 2008, 15:31
Primary OS: OpenSolaris 11
VBox Version: PUEL
Guest OSses: XP SP3, Ubuntu 8.04, 2003x64

Re: VirtualBox v3.0.6 lock up with high network traffic

Post by localhost66 »

half12 wrote:
Update2
Reproduced system lock up!!
When trying to transfer around 10GB of files from the host to a VBox guest (both CentOS 5 and SXCE b121), the transfer rate using scp became slower and slower. I killed the CentOS 5 transfer and tried with SXCE b121. This reached around 90%, and then all the VMs locked up while the host was still responsive (I could switch between workspaces). Eventually the host froze too and everything stopped.
I had the same problem, even with 2.2.4. It slows down step by step until it stops. At first Gnome, ssh and RDP still work, then Samba stops, then RDP. ssh works for only a few seconds more, and then everything has stopped except ping to the host.

I think we are mixing two or more problems together.
My history of lock-up problems:
1st BOX - Intel VT, OSOLsnv86, 8GB RAM, guest XP SP3, bridged network, 1 NIC
1. I upgraded from 2.1.4 to 3.0.2. 2.1.4 was almost stable for me; the guest only rarely went down (1-2 times a month).
2. 3.0.2 - It locked up the host. It first happened when I copied 15G from guest to host.
3. 3.0.4 - It locked up the host, same problem. I assumed it was some kind of memory/cache/swap problem (see my earlier posts), so I set the ZFS ARC to 256MB.
4. 3.0.4 - It worked fine for a week, but after a week it locked up when I merely connected to the guest via RDP (to the guest IP).
---- All of these lock-ups were absolute: within a second everything stopped, no ping, no NumLock :-)
5. Downgraded to 2.2.4 and re-enabled automatic ZFS ARC sizing. After massive disk IO (guest disk via bridged NIC + Samba directly to the host via the host's NIC) it slowed down and stopped as described above... but ping still worked.
6. Stayed at 2.2.4 with the ZFS ARC at 256M. I ran the same IO tests as before (RDP, guest->host via network, SMB) and it survived 3 hours. It has now been working for 2 days.

2nd BOX - OSOLsnv111b, AMD Opteron, AMD-V + nested paging, 8GB RAM, guests XP SP3 and Ubuntu 8.04, bridged network, 1 NIC per guest
1. I started with 3.0.2 and then upgraded to 3.0.4. It locked up within a few seconds of some network IO, even RDP.
2. I downgraded to 2.2.4. It worked for 3 days, but when I tried to set up backups from the guest to the host's disk via the network (about 20G), it started slowing down and stopping, as in step 5 above.
3. I decreased the ZFS ARC step by step. After I set it to 768M, it started to look stable. It has now been working for 2 days, including 2 hours of intensive testing.

I think the bridged networking problem is what caused the absolute lock-ups in my case: no ping, dead keyboard (NumLock...).
And the gradual slowdown could be a separate problem. Lowering the ZFS ARC helped me, but maybe the problem isn't ZFS caching + VBox at all. Maybe it's something else entirely, and it only helped because the smaller ARC freed up memory.

Jiri Madle
vineethrp
Posts: 1
Joined: 8. Sep 2009, 15:01
Primary OS: OpenSolaris 11
VBox Version: PUEL
Guest OSses: XP, Debian, helenOS, FreeBSD, Win7

Re: VirtualBox v3.0.X system lock up issue

Post by vineethrp »

Hello all,

I have been hitting this problem consistently from 3.0 through the 3.0.6 final version. I don't even need to do any I/O as such. I just leave the virtual machine running overnight and the next day my host is frozen. I am using OpenSolaris b122, and I have been hitting this issue since OpenSolaris b111b. I hit it within 12 hours if the guest is Linux (I am running Debian). With a Windows guest it might take up to 48 hours to reproduce.

This is a very serious problem. After every new release I update, and then end up downgrading to 2.2.4 again.

The sad thing is that I am unable to collect any debug data. I tried the deadman timer, and even the deadman timer is not triggering. Some thread with the highest priority is hung or looping :-(

It would be great if we could nail the suspect ASAP!
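
Since no debug data survives the freeze, one low-tech thing that might at least pin down when the host stops is a heartbeat file: a small script that appends a timestamp every few seconds and forces it to disk, so the last line after the reboot gives the approximate freeze time to compare against the VBox and system logs. This is only a rough sketch. The file name and the interval are arbitrary choices of mine, and it will not say anything about the cause.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(path):
    """Append one UTC timestamp line to path and force it to disk."""
    stamp = datetime.now(timezone.utc).isoformat()
    with open(path, "a") as f:
        f.write(stamp + "\n")
        f.flush()
        # fsync so the line is not left sitting in a buffer when the host freezes.
        os.fsync(f.fileno())
    return stamp


def run(path, interval_seconds=10, beats=None):
    """Write a heartbeat every interval_seconds; beats=None means run forever."""
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path)
        count += 1
        if beats is None or count < beats:
            time.sleep(interval_seconds)
    return count


if __name__ == "__main__":
    # Hypothetical log location; pick any path on the host.
    run("vbox_host_heartbeat.log", interval_seconds=10)
```

Leave it running on the host before starting the guests; after a freeze, the last timestamp in the file is accurate to within one interval.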