Windows 2012 R2 Guest Hangs and/or change state to stuck

Posted: 4. Dec 2015, 15:10
by Yvor77
I have a Windows Server 2012 R2 guest running on a Debian 8.0 host. The guest was converted from a physical machine. The host is a Dell Dimension T5400 with 2x Intel Xeon 5420 QC CPUs, 16 GB RAM and 2x 1.0 TB hard drives configured as software RAID 1. I understand that the storage configuration is not ideal, but I do not believe it is the cause of the problem. The Linux server has no GUI, and the guest runs in headless mode. On the host, I have disabled all power-saving options and set the CPU governor to performance, so the CPU does not use Intel SpeedStep.
I removed all of the hardware-specific drivers before converting the physical machine to virtual. After the conversion, I installed the VirtualBox Guest Additions. The guest has the bare minimum of hardware: 4 CPU cores, 8 GB RAM, PAE/NX enabled, nested paging disabled, a SAS storage controller with 2 disks, a SATA controller with an optical drive (currently empty), a USB 2.0 host controller (no attached device), 128 MB video RAM, and VRDP enabled.

The problem is that the guest freezes inside VirtualBox. When I connect over VRDP, I can still see the last screen, so the guest OS itself has not crashed. Sometimes, but not always, the phpVirtualBox dashboard reports the VM as STUCK. Most of the time, though, it shows the guest as running, although the log file clearly states that the guest seems to be unresponsive because no heartbeat signal has been received for 4 seconds. This happens after anywhere between 10 and 36 hours of running. After resetting or powering off the guest and restarting it, it runs like new for a few hours.

I have attached a couple of copies of the log files as well as a screen shot of the guest configuration page.

I have tried changing the virtualization and power-saving settings in the BIOS as well as on the host OS. I have disabled all power-saving options on the guest. There are no related entries in the guest OS's event log at the time of the freeze. It seems it simply stopped.

I have been using VirtualBox for years on different hosts with different guest systems, and have never really come across a problem that I could not fix myself after a little searching. I have done a lot of searching on this issue and tried the suggested solutions, but none of them has worked.

I am really desperate to find a solution, or some guidance on how to troubleshoot the issue, as the log files are not very meaningful (at least not to me).

Thank you in advance for looking into this problem. I will try to answer any questions ASAP.

Re: Windows 2012 R2 Guest Hangs and/or change state to stuck

Posted: 4. Dec 2015, 16:26
by mpack
What's going on in the host when this lockup happens?

I see a lot of these near the end of the log, just before you tried resetting the connection :-
VBox.log wrote:
33:52:00.812731 TM: Giving up catch-up attempt at a 60 001 752 294 ns lag; new total: 60 122 687 179 ns
33:59:10.062777 TM: Giving up catch-up attempt at a 60 000 040 440 ns lag; new total: 120 122 727 619 ns
34:07:55.422776 TM: Giving up catch-up attempt at a 60 000 624 222 ns lag; new total: 180 123 351 841 ns
34:17:28.252807 TM: Giving up catch-up attempt at a 60 001 906 423 ns lag; new total: 240 125 258 264 ns
34:26:40.912798 TM: Giving up catch-up attempt at a 60 004 739 644 ns lag; new total: 300 129 997 908 ns
34:35:18.992733 TM: Giving up catch-up attempt at a 60 000 815 173 ns lag; new total: 360 130 813 081 ns
34:44:57.572747 TM: Giving up catch-up attempt at a 60 001 884 301 ns lag; new total: 420 132 697 382 ns
34:54:31.152684 TM: Giving up catch-up attempt at a 60 001 812 492 ns lag; new total: 480 134 509 874 ns
35:03:09.062656 TM: Giving up catch-up attempt at a 60 003 720 921 ns lag; new total: 540 138 230 795 ns
35:11:32.672645 TM: Giving up catch-up attempt at a 60 000 618 754 ns lag; new total: 600 138 849 549 ns
35:19:54.582763 TM: Giving up catch-up attempt at a 60 003 866 572 ns lag; new total: 660 142 716 121 ns
35:28:23.382783 TM: Giving up catch-up attempt at a 60 003 134 485 ns lag; new total: 720 145 850 606 ns
35:36:21.552668 TM: Giving up catch-up attempt at a 60 007 781 261 ns lag; new total: 780 153 631 867 ns
35:45:03.552665 TM: Giving up catch-up attempt at a 60 000 245 027 ns lag; new total: 840 153 876 894 ns
35:53:43.002772 TM: Giving up catch-up attempt at a 60 002 599 662 ns lag; new total: 900 156 476 556 ns
36:01:16.022653 TM: Giving up catch-up attempt at a 60 004 253 023 ns lag; new total: 960 160 729 579 ns
36:12:31.242763 TM: Giving up catch-up attempt at a 60 001 364 164 ns lag; new total: 1 020 162 093 743 ns
36:22:11.462639 TM: Giving up catch-up attempt at a 60 001 125 541 ns lag; new total: 1 080 163 219 284 ns
36:30:41.002706 TM: Giving up catch-up attempt at a 60 000 782 524 ns lag; new total: 1 140 164 001 808 ns
36:39:48.792673 TM: Giving up catch-up attempt at a 60 001 127 189 ns lag; new total: 1 200 165 128 997 ns
36:47:17.402635 TM: Giving up catch-up attempt at a 60 000 586 607 ns lag; new total: 1 260 165 715 604 ns
36:56:22.232746 TM: Giving up catch-up attempt at a 60 001 980 723 ns lag; new total: 1 320 167 696 327 ns
37:06:10.812736 TM: Giving up catch-up attempt at a 60 001 865 242 ns lag; new total: 1 380 169 561 569 ns
37:14:58.568644 TM: Giving up catch-up attempt at a 60 000 049 693 ns lag; new total: 1 440 169 611 262 ns
37:22:26.022763 TM: Giving up catch-up attempt at a 60 006 857 594 ns lag; new total: 1 500 176 468 856 ns
37:31:50.132727 TM: Giving up catch-up attempt at a 60 005 448 673 ns lag; new total: 1 560 181 917 529 ns
37:39:34.812686 TM: Giving up catch-up attempt at a 60 003 218 563 ns lag; new total: 1 620 185 136 092 ns
I don't have an easy way to find this in the source, but I'm going to assume that "TM" refers to Time Management or similar. I.e. it is struggling to keep the VM ticker aligned with real time, which I'm guessing means that something was going on in the host.
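
To make the pattern in those lines easier to see, here is a minimal Python sketch that pulls the uptime stamp, the per-attempt lag and the running total out of each "Giving up catch-up attempt" line. The regular expression is written against the excerpt quoted above only; the function names are made up for illustration, and other VirtualBox versions may word the message differently.

```python
import re

# Pattern for the "Giving up catch-up attempt" lines exactly as they appear
# in the excerpt above (assumption: other log versions may differ).
TM_GIVEUP = re.compile(
    r"(\d+):(\d{2}):(\d{2}\.\d+) TM: Giving up catch-up attempt at a "
    r"([\d ]+) ns lag; new total: ([\d ]+) ns"
)


def parse_tm_giveups(lines):
    """Return one dict per matching log line; non-matching lines are skipped."""
    events = []
    for line in lines:
        match = TM_GIVEUP.search(line)
        if match is None:
            continue
        hours, minutes, seconds, lag, total = match.groups()
        events.append({
            # VM uptime in seconds, taken from the hh:mm:ss.ffffff stamp.
            "uptime_s": int(hours) * 3600 + int(minutes) * 60 + float(seconds),
            # The log groups digits with spaces, so strip them before int().
            "lag_ns": int(lag.replace(" ", "")),
            "total_ns": int(total.replace(" ", "")),
        })
    return events


def gaps_between(events):
    """Seconds of VM uptime between consecutive give-up events."""
    return [b["uptime_s"] - a["uptime_s"] for a, b in zip(events, events[1:])]
```

Feeding it the lines quoted above (for example `parse_tm_giveups(open("VBox.log"))`, with the file name as a placeholder) shows that each entry gives up on about 60 seconds of lag, that the give-ups are roughly 7 to 11 minutes apart, and that the running total grows by the reported lag each time.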

Re: Windows 2012 R2 Guest Hangs and/or change state to stuck

Posted: 7. Dec 2015, 11:39
by Yvor77
Dear mpack,

Thank you for looking into this problem.

As for your question, I traced the events back on the guest operating system. The only apparent thing running at the time of the hang was Microsoft Azure AD Connect synchronisation. But the thing is, it runs every three hours and no changes to AD have really been made, so I can hardly imagine that this process could make the guest system sweat.

The server also runs Spiceworks with a very small setup. The network scan can be quite intensive, but just like the other service, it runs more than once a day. The only other function running on the box is a Windows Deployment Server service, which does not cause much load and was never in use when the hangs happened, around midnight.

As a starting point for troubleshooting, I will disable Spiceworks and post back whether the guest stayed up or not. If it hangs again, I will temporarily disable AD synchronisation.

I am still a little bit clueless about what to do if one or the other service causes the hangs.

I will post back when I have new information.

Thank you for your time and effort.

Re: Windows 2012 R2 Guest Hangs and/or change state to stuck

Posted: 7. Dec 2015, 15:42
by mpack
Yvor77 wrote:
The only apparent thing running at the time of the hang was Microsoft Azure AD Connect synchronisation. But the thing is, it runs every three hours and no changes to AD have really been made, so I can hardly imagine that this process could make the guest system sweat.
I referred to something consuming CPU or I/O time on the host, not the guest.
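
One way to check that on a Linux host is to sample the aggregate counters in /proc/stat around the time of a hang. The following is only a sketch: the function names are hypothetical, and it assumes the standard field order of the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal, then the guest fields).

```python
import time


def parse_cpu_times(stat_text):
    """Return the counters of the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def busy_and_iowait_share(before, after):
    """Fractions of elapsed CPU time spent busy and in iowait between two samples.

    Only the first eight fields (user..steal) are summed, because the guest
    fields are already counted inside user and nice.
    """
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0, 0.0
    idle, iowait = deltas[3], deltas[4]
    busy = (total - idle - iowait) / total
    return busy, iowait / total


def sample_host(interval_s=5.0, path="/proc/stat"):
    """Take two samples interval_s seconds apart on the host (Linux only)."""
    with open(path) as handle:
        before = parse_cpu_times(handle.read())
    time.sleep(interval_s)
    with open(path) as handle:
        after = parse_cpu_times(handle.read())
    return busy_and_iowait_share(before, after)
```

Calling `sample_host()` in a loop and logging the result with a timestamp would show whether the host was CPU-bound or waiting on disk I/O when the guest stopped responding. A high iowait share would point at storage rather than at anything running inside the guest.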