
Linux guest VM becomes "stuck" after a period of time.

Posted: 31. May 2016, 07:31
by Presence
Technical Details for hardware and OS at the bottom of the message:

I have VirtualBox 5.0.20r106931 with the extensions running headless on Ubuntu x86_64 version 14.0.4 with kernel 3.16.0-70-generic.

One of the guests I'm running (alienvault all in one) becomes stuck regularly. I have allocated it 16GB of RAM and 4 CPUs (I tried 2 as well, same issue). The other guests (2 other active guests: one with 1 CPU and 4GB of RAM, and the other the alienvault sensor, which runs the same kernel as the AIO) have zero issues with getting stuck.

Any ideas at all? I have changed every processor option, disabled all non-essential components aside from USB, but it doesn't stop the problem.

================================= %< Details >%==================================================================================

Code: Select all

OS: Ubuntu 64 bit 14.0.4 - Linux vmhost 3.16.0-70-generic #90~14.04.1-Ubuntu SMP Wed Apr 6 22:56:34 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Hardware:
H/W path         Device      Class      Description
===================================================
                             system     ProLiant DL380 G5
/0                           bus        Motherboard
/0/0                         memory     64KiB BIOS
/0/400                       processor  Intel(R) Xeon(R) CPU           E5345  @ 2.33GHz
/0/400/710                   memory     128KiB L1 cache
/0/400/720                   memory     8MiB L2 cache
/0/406                       processor  Intel(R) Xeon(R) CPU           E5345  @ 2.33GHz
/0/406/716                   memory     128KiB L1 cache
/0/406/726                   memory     8MiB L2 cache
/0/1000                      memory     64GiB System Memory
/0/1000/0                    memory     8GiB FB-DIMM DDR2 FB-DIMM Synchronous 667 MHz (1.5 ns)
/0/1000/1                    memory     8GiB FB-DIMM DDR2 FB-DIMM Synchronous 667 MHz (1.5 ns)
/0/1000/2                    memory     8GiB FB-DIMM DDR2 FB-DIMM Synchronous 667 MHz (1.5 ns)
/0/1000/3                    memory     8GiB FB-DIMM DDR2 FB-DIMM Synchronous 667 MHz (1.5 ns)
/0/1000/4                    memory     8GiB FB-DIMM DDR2 FB-DIMM Synchronous 667 MHz (1.5 ns)
/0/1000/5                    memory     8GiB FB-DIMM DDR2 FB-DIMM Synchronous 667 MHz (1.5 ns)
/0/1000/6                    memory     8GiB FB-DIMM DDR2 FB-DIMM Synchronous 667 MHz (1.5 ns)
/0/1000/7                    memory     8GiB FB-DIMM DDR2 FB-DIMM Synchronous 667 MHz (1.5 ns)
/0/100                       bridge     5000P Chipset Memory Controller Hub
/0/100/2                     bridge     5000 Series Chipset PCI Express x4 Port 2
/0/100/2/0                   bridge     6311ESB/6321ESB PCI Express Upstream Port
/0/100/2/0/0                 bridge     6311ESB/6321ESB PCI Express Downstream Port E1
/0/100/2/0/1                 bridge     6311ESB/6321ESB PCI Express Downstream Port E2
/0/100/2/0/2                 bridge     6311ESB/6321ESB PCI Express Downstream Port E3
/0/100/2/0.3                 bridge     6311ESB/6321ESB PCI Express to PCI-X Bridge
/0/100/3                     bridge     5000 Series Chipset PCI Express x4 Port 3
/0/100/3/0       scsi2       storage    Smart Array Controller
/0/100/4                     bridge     5000 Series Chipset PCI Express x8 Port 4-5
/0/100/5                     bridge     5000 Series Chipset PCI Express x4 Port 5
/0/100/6                     bridge     5000 Series Chipset PCI Express x8 Port 6-7
/0/100/7                     bridge     5000 Series Chipset PCI Express x4 Port 7
/0/100/1c                    bridge     631xESB/632xESB/3100 Chipset PCI Express Root Port 1
/0/100/1c/0                  bridge     EPB PCI-Express to PCI-X Bridge
/0/100/1c/0/0    eth1        network    NetXtreme II BCM5708 Gigabit Ethernet
/0/100/1c.1                  bridge     631xESB/632xESB/3100 Chipset PCI Express Root Port 2
/0/100/1c.1/0                bridge     EPB PCI-Express to PCI-X Bridge
/0/100/1c.1/0/0  eth0        network    NetXtreme II BCM5708 Gigabit Ethernet
/0/100/1d                    bus        631xESB/632xESB/3100 Chipset UHCI USB Controller #1
/0/100/1d.1                  bus        631xESB/632xESB/3100 Chipset UHCI USB Controller #2
/0/100/1d.2                  bus        631xESB/632xESB/3100 Chipset UHCI USB Controller #3
/0/100/1d.3                  bus        631xESB/632xESB/3100 Chipset UHCI USB Controller #4
/0/100/1d.7                  bus        631xESB/632xESB/3100 Chipset EHCI USB2 Controller
/0/100/1e                    bridge     82801 PCI Bridge
/0/100/1e/3                  display    ES1000
/0/100/1e/4                  generic    Integrated Lights Out Controller
/0/100/1e/4.2                generic    Integrated Lights Out  Processor
/0/100/1e/4.4                bus        Integrated Lights-Out Standard Virtual USB Controller
/0/100/1e/4.6                bus        Integrated Lights-Out Standard KCS Interface
/0/100/1f                    bridge     631xESB/632xESB/3100 Chipset LPC Interface Controller
/0/100/1f.1                  storage    631xESB/632xESB IDE Controller
/0/101                       bridge     5000 Series Chipset FSB Registers
/0/102                       bridge     5000 Series Chipset FSB Registers
/0/103                       bridge     5000 Series Chipset FSB Registers
/0/104                       bridge     5000 Series Chipset Reserved Registers
/0/105                       bridge     5000 Series Chipset Reserved Registers
/0/106                       bridge     5000 Series Chipset FBD Registers
/0/107                       bridge     5000 Series Chipset FBD Registers
/0/1             scsi0       storage    
/0/1/0.0.0       /dev/cdrom  disk       DW-224E-V
/1               eth2        network    Ethernet interface
/2               eth3        network    Ethernet interface

Re: Linux guest VM becomes "stuck" after a period of time.

Posted: 31. May 2016, 13:49
by Perryg
Post the guest's log file ( as an attachment ). Right-click on the guest in the Main Manager, then click Show Log. Save it and post it as an attachment. Compress it if it is too large to post.

Also, is there a pattern to the time it takes? How long does it work?

Re: Linux guest VM becomes "stuck" after a period of time.

Posted: 31. May 2016, 16:47
by Presence
The VM gets "stuck" soon after the system becomes accessible via the web. It happens every time, but beyond that, this is the only pattern I can point to.

Re: Linux guest VM becomes "stuck" after a period of time.

Posted: 31. May 2016, 17:08
by Perryg

Code: Select all

00:01:36.489576 VRDP: Logoff: HELLFIRE (192.168.1.11) build 10586. User: [] Domain: [] Reason 0x0000.
00:01:36.489692 VRDP: Connection closed: 1
00:01:36.489772 VBVA: VRDP acceleration has been disabled.
00:20:39.443341 
00:20:39.443345 !!R0-Assertion Failed!!
00:20:39.443346 Expression: RT_SUCCESS_NP(rc)
00:20:39.443347 Location  : /home/vbox/vbox-5.0.20/src/VBox/VMM/VMMAll/PGMAllPool.cpp(2574) int pgmPoolMonitorInsert(PPGMPOOL, PPGMPOOLPAGE)
00:20:39.443406 PGMHandlerPhysicalRegisterEx 000000009ed1c000 failed with -1701
00:20:39.443471 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
00:20:39.443472 !!
00:20:39.443473 !!                 Guru Meditation -2701 (VERR_VMM_RING0_ASSERTION)
00:20:39.443492 !!
00:20:39.443543 !!R0-Assertion Failed!!
You can try the following and see if it allows the guest to boot and run:

Code: Select all

VBoxManage setextradata VM_NAME "VBoxInternal/MM/CanUseLargerHeap" 1
If not, I would raise a ticket on the bug tracker.
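
For anyone checking their own VBox.log for the same failure, here is a minimal sketch in Python that pulls out the Guru Meditation banner lines. The function name `find_guru_meditations` and the regular expression are illustrative assumptions based only on the log excerpt quoted above, not on any documented log format.

```python
import re

# Matches the banner line seen in the excerpt above, e.g.
# "Guru Meditation -2701 (VERR_VMM_RING0_ASSERTION)".
# The pattern is an assumption derived from that one sample.
GURU_RE = re.compile(r"Guru Meditation (-?\d+) \((\w+)\)")


def find_guru_meditations(log_text):
    """Return (timestamp, code, name) for each Guru Meditation banner found."""
    hits = []
    for line in log_text.splitlines():
        match = GURU_RE.search(line)
        if match:
            timestamp = line.split()[0]  # leading HH:MM:SS.micros field
            hits.append((timestamp, int(match.group(1)), match.group(2)))
    return hits


sample = """\
00:20:39.443406 PGMHandlerPhysicalRegisterEx 000000009ed1c000 failed with -1701
00:20:39.443472 !!
00:20:39.443473 !!                 Guru Meditation -2701 (VERR_VMM_RING0_ASSERTION)
00:20:39.443492 !!
"""

for timestamp, code, name in find_guru_meditations(sample):
    print(timestamp, code, name)
```

An empty result after applying a workaround suggests the guest is no longer aborting this way, although it says nothing about other problems in the log.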

Re: Linux guest VM becomes "stuck" after a period of time.

Posted: 31. May 2016, 19:11
by Presence
The system doesn't hang now, but in the guest logs I'm getting quite a few "stuck" messages.

Message from syslogd@alienvault at May 31 11:08:05 ...
kernel:[ 5364.060682] BUG: soft lockup - CPU#0 stuck for 22s! [redis-server:3590]

Message from syslogd@alienvault at May 31 11:08:33 ...
kernel:[ 5392.060004] BUG: soft lockup - CPU#0 stuck for 22s! [redis-server:3590]

And some faults related to it:

Code: Select all

[ 5236.060114] BUG: soft lockup - CPU#0 stuck for 22s! [redis-server:3590]
[ 5236.060114] Modules linked in: ip6table_filter ip6_tables ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ppdev i2c_piix4 i2c_core parport_pc serio_raw pcspkr battery parport ac evdev processor ext4 crc16 mbcache jbd2 uvesafb hid_generic usbhid hid sr_mod cdrom sg ohci_pci sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic ohci_hcd ehci_pci ehci_hcd psmouse usbcore ahci e1000 usb_common libahci ata_piix video libata thermal_sys scsi_mod button
[ 5236.060114] CPU: 0 PID: 3590 Comm: redis-server Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt25-1
[ 5236.060114] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 5236.060114] task: ffff88040be8ca60 ti: ffff88040ba50000 task.ti: ffff88040ba50000
[ 5236.060114] RIP: 0010:[<ffffffff8151458f>]  [<ffffffff8151458f>] _raw_spin_lock_bh+0x2f/0x40
[ 5236.060114] RSP: 0018:ffff88040ba53e60  EFLAGS: 00000202
[ 5236.060114] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000002
[ 5236.060114] RDX: 0000000000000002 RSI: 00000000fffffe01 RDI: ffff880318418870
[ 5236.060114] RBP: ffff88031e083d00 R08: ffff88040dc9b850 R09: ffff88040f0ee7e0
[ 5236.060114] R10: ffff88030883d498 R11: 0000000000000293 R12: 0000000000000246
[ 5236.060114] R13: ffffffff8118f2cf R14: ffff88040d10b7c0 R15: ffff88040ba53ee0
[ 5236.060114] FS:  00007fa38b818740(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
[ 5236.060114] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5236.060114] CR2: 000000001b955f90 CR3: 000000040b8b8000 CR4: 00000000000006f0
[ 5236.060114] Stack:
[ 5236.060114]  ffffffff8140af6a ffff880318418800 ffff88031e083d00 ffff880318418998
[ 5236.060114]  00007fff2a335480 ffffffff8148d6b4 000000000f0ee7e0 ffff88040d10b7c0
[ 5236.060114]  ffff88031e083d00 ffff8800db399700 ffffffff81405cc7 000000008a0fa000
[ 5236.060114] Call Trace:
[ 5236.060114]  [<ffffffff8140af6a>] ? release_sock+0x1a/0x170
[ 5236.060114]  [<ffffffff8148d6b4>] ? inet_accept+0xd4/0x100
[ 5236.060114]  [<ffffffff81405cc7>] ? SYSC_accept4+0xf7/0x200
[ 5236.060114]  [<ffffffff810970a0>] ? wake_up_state+0x10/0x10
[ 5236.060114]  [<ffffffff81514a0d>] ? system_call_fast_compare_end+0x10/0x15
[ 5236.060114] Code: 90 65 81 04 25 60 b8 00 00 00 02 00 00 b8 00 00 01 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 89 d1
From VBox.log:

Code: Select all

00:00:05.472761 TM: Switching TSC mode from 'VirtTscEmulated' to 'RealTscOffset'
00:00:06.368774 GIM: KVM: Enabled wall-clock struct. at 0x0000000001a60ae8 - u32Sec=1464709121 u32Nano=131451956 uVersion=2
00:00:06.369211 PIT: mode=2 count=0x12a5 (4773) - 249.98 Hz (ch=0)
00:00:06.573758 PIT: mode=0 count=0x10000 (65536) - 18.20 Hz (ch=0)
00:00:06.588516 GIM: KVM: VCPU  1: Enabled system-time struct. at 0x000000041ffe5040 - u32TscScale=0xdb6dc07d i8TscShift=-1 uVersion=2 fFlags=0x1 uTsc=0x2bda1a945 uVirtNanoTS=0x12cbb2c13
00:00:06.602560 GIM: KVM: VCPU  2: Enabled system-time struct. at 0x000000041ffe5080 - u32TscScale=0xdb6dc07d i8TscShift=-1 uVersion=2 fFlags=0x1 uTsc=0x2bda1a945 uVirtNanoTS=0x12cbb2c13
00:00:06.616936 GIM: KVM: VCPU  3: Enabled system-time struct. at 0x000000041ffe50c0 - u32TscScale=0xdb6dc07d i8TscShift=-1 uVersion=2 fFlags=0x1 uTsc=0x2bda1a945 uVirtNanoTS=0x12cbb2c13
00:00:08.577030 OHCI: Software reset
00:00:10.039437 PIIX3 ATA: Ctl#0: RESET, DevSel=0 AIOIf=0 CmdIf0=0xa0 (-1 usec ago) CmdIf1=0x00 (-1 usec ago)
00:00:10.039641 PIIX3 ATA: Ctl#0: finished processing RESET
00:00:10.066543 AHCI#0: Reset the HBA
00:00:10.068755 PIIX3 ATA: Ctl#1: RESET, DevSel=0 AIOIf=0 CmdIf0=0x00 (-1 usec ago) CmdIf1=0x00 (-1 usec ago)
00:00:10.068891 PIIX3 ATA: Ctl#1: finished processing RESET
00:00:10.151815 AHCI#0: Port 0 reset
00:00:11.051631 EHCI: Hardware reset
00:00:11.051963 EHCI: USB Operational
00:00:11.083681 OHCI: USB Reset
00:00:11.136435 OHCI: Software reset
00:00:11.137064 OHCI: USB Operational
00:00:11.165987 EHCI: USB Suspended
00:00:15.123894 AIOMgr: Flush failed with VERR_INVALID_PARAMETER, disabling async flushes
00:00:27.262184 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
00:00:30.068223 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
00:01:07.690183 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
00:01:10.873144 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
00:01:11.306301 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
00:01:15.957260 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
00:01:18.258406 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
00:01:19.695577 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
00:01:23.881308 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
00:01:31.264995 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
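
The soft lockup messages quoted in this post all name the same CPU and process. A rough Python sketch for tallying such lines from a larger syslog or dmesg capture follows. The helper name `tally_soft_lockups` is hypothetical, and the pattern is assumed from the three lines shown here rather than from kernel documentation.

```python
import re
from collections import Counter

# Pattern assumed from the lines quoted in this post, e.g.
# "BUG: soft lockup - CPU#0 stuck for 22s! [redis-server:3590]".
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]"
)


def tally_soft_lockups(text):
    """Count soft lockup reports per (cpu, process name, pid)."""
    counts = Counter()
    for match in LOCKUP_RE.finditer(text):
        cpu, _seconds, comm, pid = match.groups()
        counts[(int(cpu), comm, int(pid))] += 1
    return counts


sample = """\
kernel:[ 5364.060682] BUG: soft lockup - CPU#0 stuck for 22s! [redis-server:3590]
kernel:[ 5392.060004] BUG: soft lockup - CPU#0 stuck for 22s! [redis-server:3590]
[ 5236.060114] BUG: soft lockup - CPU#0 stuck for 22s! [redis-server:3590]
"""

for (cpu, comm, pid), count in tally_soft_lockups(sample).items():
    print(f"CPU#{cpu} {comm} (pid {pid}): {count} reports")
```

If every report lands on one vCPU, as it does in this sample, that is worth mentioning when describing the problem.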

Re: Linux guest VM becomes "stuck" after a period of time.

Posted: 31. May 2016, 19:36
by Perryg
Please use the code tags!

Does it function properly if you set the guest to one processor?

Re: Linux guest VM becomes "stuck" after a period of time.

Posted: 31. May 2016, 20:09
by Presence
Do you suspect that I am oversubscribing the host's resources, or that there might be some other issue? If you feel that running the system without additional processors is preferable and causes no significant issues, I'm more than willing to set it up that way.

The strange thing is that when I look at htop, one processor is pegged and the others are mostly idle. This didn't happen before.

Re: Linux guest VM becomes "stuck" after a period of time.

Posted: 31. May 2016, 21:59
by Perryg
I am trying to diagnose the issue so it is a test.

Re: Linux guest VM becomes "stuck" after a period of time.

Posted: 31. May 2016, 22:21
by Presence
43 minutes of uptime since changing it to a single processor, and no stuck events so far. System load is regularly at 3.5+, but there are no CPU stuck errors.

These events happen continuously:

Code: Select all

00:25:54.721657 TM: Giving up catch-up attempt at a 60 000 300 628 ns lag; new total: 1 020 077 311 488 ns
00:27:14.408958 TM: Giving up catch-up attempt at a 60 007 890 651 ns lag; new total: 1 080 085 202 139 ns
00:28:29.863035 TM: Giving up catch-up attempt at a 60 001 243 246 ns lag; new total: 1 140 086 445 385 ns
00:29:43.148696 TM: Giving up catch-up attempt at a 60 004 343 891 ns lag; new total: 1 200 090 789 276 ns
00:30:58.918931 TM: Giving up catch-up attempt at a 60 007 527 645 ns lag; new total: 1 260 098 316 921 ns
00:39:23.736888 TM: Giving up catch-up attempt at a 60 002 135 293 ns lag; new total: 1 320 100 452 214 ns
00:40:39.178653 TM: Giving up catch-up attempt at a 60 004 496 999 ns lag; new total: 1 380 104 949 213 ns
00:41:54.677860 TM: Giving up catch-up attempt at a 60 001 824 597 ns lag; new total: 1 440 106 773 810 ns
00:43:21.053805 TM: Giving up catch-up attempt at a 60 001 050 565 ns lag; new total: 1 500 107 824 375 ns
00:44:35.558910 TM: Giving up catch-up attempt at a 60 000 411 552 ns lag; new total: 1 560 108 235 927 ns

Re: Linux guest VM becomes "stuck" after a period of time.

Posted: 31. May 2016, 22:27
by Perryg
OK, so try setting the CPU Paravirtualization Interface to none and see what happens. You should be able to add a processor or two, but I never suggest that you over-commit the processor, and that includes all running guests as a total.

The "00:25:54.721657 TM: Giving up catch-up attempt at a 60 000 300 628 ns lag; new total: 1 020 077 311 488 ns" usually indicates an over-burdened host/guest.
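
To put that line in more familiar units, here is a small Python sketch that converts the nanosecond figures. It assumes that "new total" is the cumulative amount of guest time VirtualBox has stopped trying to make up, which is a reading of the log text and not something confirmed in this thread. The name `parse_tm_line` is made up for the example.

```python
import re

# Pattern assumed from the TM lines quoted in this thread; digit groups in
# the nanosecond values are separated by spaces.
TM_RE = re.compile(
    r"^(\d+):(\d+):(\d+)\.\d+ TM: Giving up catch-up attempt at a "
    r"([\d ]+) ns lag; new total: ([\d ]+) ns"
)


def parse_tm_line(line):
    """Return (uptime_s, lag_s, total_s) for one TM line, or None if no match."""
    match = TM_RE.match(line)
    if match is None:
        return None
    hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
    uptime_s = hours * 3600 + minutes * 60 + seconds
    lag_s = int(match.group(4).replace(" ", "")) / 1e9
    total_s = int(match.group(5).replace(" ", "")) / 1e9
    return uptime_s, lag_s, total_s


line = (
    "00:25:54.721657 TM: Giving up catch-up attempt at a "
    "60 000 300 628 ns lag; new total: 1 020 077 311 488 ns"
)
uptime_s, lag_s, total_s = parse_tm_line(line)
print(f"lag given up: {lag_s:.1f} s")
print(f"total given up: {total_s / 60:.1f} min after {uptime_s / 60:.1f} min of uptime")
```

Under that reading, the quoted line corresponds to about 60 seconds given up in one step and roughly 17 minutes in total at just under 26 minutes of uptime.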

Re: Linux guest VM becomes "stuck" after a period of time.

Posted: 31. May 2016, 22:52
by Presence
I will try this.

Right now I have 3 of the 8 processor cores committed aside from this host. 2 for alienvault-sensor, and 1 for a low load ubuntu system.

Is there some calculation for deciding how many processors to allocate? Do you hold back a processor or two for a Linux host? Typically Linux needs at least 1, whereas Windows requires 2.

Re: Linux guest VM becomes "stuck" after a period of time.

Posted: 31. May 2016, 22:55
by Perryg
Exactly. Always reserve at least one full core for the host. Also remember processors ( threads ) are not cores so divide them up ( cores ) accordingly.
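
As a rough illustration of that rule of thumb, the Python sketch below subtracts a host reserve and every running guest's vCPUs from the physical core count. The function `vcpu_headroom` and the guest labels are hypothetical. The numbers are the ones mentioned in this thread: 8 physical cores, 4 vCPUs for the all-in-one guest, 2 for the sensor and 1 for the Ubuntu guest.

```python
def vcpu_headroom(physical_cores, guest_vcpus, reserved_for_host=1):
    """Return the cores left after the host reserve and all running guests.

    A negative result means the running guests over-commit the host.
    """
    return physical_cores - reserved_for_host - sum(guest_vcpus.values())


# Guest labels are placeholders; the vCPU counts come from this thread.
guests = {"alienvault-aio": 4, "alienvault-sensor": 2, "ubuntu": 1}

print(vcpu_headroom(8, guests))
```

With these figures the result is 0: the configuration sits exactly at the limit, with no spare core beyond the single one reserved for the host.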

Re: Linux guest VM becomes "stuck" after a period of time.

Posted: 31. May 2016, 23:01
by Presence
Same issues after turning off virtualization with 2 CPUs.

I have a 2 processor quad core xeon in the box. You are suggesting reserving at least one full core from the eight available, or one full CPU (4 cores)?

It's strange that I have never had issues with VirtualBox in the past, but then maybe I just haven't had VMs that are this continuously active.

Code: Select all

00:00:14.333205 AIOMgr: Flush failed with VERR_INVALID_PARAMETER, disabling async flushes
00:00:21.750626 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
00:01:00.095545 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
00:01:03.532757 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
00:01:08.382886 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
00:01:09.410677 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
00:01:14.471802 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
00:01:26.086973 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
00:01:29.829254 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices
00:01:42.314945 OHCI#0: Lagging too far behind, not trying to catch up anymore. Expect glitches with USB devices

Re: Linux guest VM becomes "stuck" after a period of time.

Posted: 31. May 2016, 23:07
by Presence
Also, with paravirtualization off, the system only sees "1" CPU. Does this basically mean that all the vCPUs I'm allocating to the box are being catenated into a "single" CPU from the VM's viewpoint, or that it simply isn't capable of using more than one vCPU because of the no-paravirtualization setting?

Re: Linux guest VM becomes "stuck" after a period of time.

Posted: 31. May 2016, 23:34
by Perryg
Everything I am asking is for diagnostic purposes. Don't read anything into it until we determine what the real issue is, but cores are what VirtualBox uses, not threads. The best you can have with the processors today is 8 full cores in a single die. Another thing is AMD likes to call threads cores, but they are not. Hyperthreading allows more pipes per core, so 8 becomes 16, but not to VirtualBox. If you have more than one actual CPU then it is multi-processor, and that too has an issue of sorts with VirtualBox. It is hard for VirtualBox to span multiple processors ( unless this has been fixed and I have not been told ), so remember that when assigning cores.

BIG NOTE: I did not say to turn Paravirtualization off, I said to turn the Paravirtualization Interface off. This is under Settings -> System -> Acceleration. Turning this off should have no effect on the number of vCPUs. If it does, then you have a real issue. Also, what is the base for alienvault?