[SOLVED] Windows 7 guest hangs Linux host I/O

Seann
Posts: 5
Joined: 28. Jan 2015, 19:41

[SOLVED] Windows 7 guest hangs Linux host I/O

Post by Seann »

I have been having a problem with a 64-bit Windows 7 guest that hangs I/O on the Linux host OS. Everything was running fine until January 1, 2015, when I ran a 'yum update' (at least I think this was the triggering event ;).

Here's a synopsis of the problem. The Windows 7 guest boots up. Sometimes it works fine for a few minutes, sometimes it runs for an hour or two; then the load average on the Linux box starts increasing. I've let it reach 250 before power cycling. Once the load average starts climbing, the sync command hangs [in the sync() system call, according to strace].
  • Host specs:
    Fedora 21
    Linux proxy 3.18.3-201.fc21.x86_64 #1 SMP Mon Jan 19 15:59:31 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
    Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz
    32GB RAM
    VirtualBox 4.3.20 r96996
    Seagate SSHD 2TB storage (ST2000DX001) set up in RAID-1 configuration (/dev/md)
    XFS file system
  • Guests:
    32-bit Windows XP guest runs fine with 1 CPU and 2 GB memory
    64-bit Windows 7 guest hangs guest I/O after a random period of time with 2 CPU and 4 GB memory

I've tried various combinations of VirtualBox (chipset) settings for Windows 7, based on numerous reports of Windows 7 guest instability on this forum, but can't find a stable combination. I used VBoxManage clonehd to move from a dynamically allocated .vdi file to a fixed-size one. I've also defragmented the XFS file system using xfs_fsr.

Yesterday, I tried an experiment and got some very interesting results. I moved my .vdi file from local storage to another server running ext4 and NFS-mounted it from the remote server on the VirtualBox host. I am able to boot into the 64-bit Windows 7 guest and it works just fine. However, after some period of time, I/O on the host Linux box hangs and the Windows 7 guest continues to function! If I have a root shell open to the host OS from the guest, I can still run some commands for troubleshooting. Networking and NFS continue to function and I can cleanly shut down Windows 7 before power cycling the host box.

This latest finding, that the host hangs while the guest continues to work, led to this forum post and will probably lead to a bug tracker report. Does anyone have any ideas as to what may be going on? I suspect an interaction between the kernel and VirtualBox 4.3.20, and have tried kernels 3.17.4, 3.17.6, 3.17.7, 3.17.8 and 3.18.3. I was running stable on 3.17.7-300.fc21 for two weeks before I ran another yum update and broke everything again.

I'm looking for troubleshooting tips that would help me track down the root cause of this problem. How do I find out where I/O is getting hung in the kernel?
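One common first step for this kind of question is to list the tasks stuck in uninterruptible sleep (state D), since those are what drive the load average up during an I/O hang. The following is a minimal sketch, not a definitive tool: the function names are made up for illustration, it only looks at each process's main thread (not /proc/<pid>/task/*), and reading /proc/<pid>/stack normally requires root and a kernel built with stack tracing.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = stat_text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_str), comm, state


def read_text(path):
    """Read a small proc file, returning '' if it is missing or unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm, wchan, stack) for processes in state D."""
    tasks = []
    if not os.path.isdir(proc_root):
        return tasks
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        stat_text = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat_text:
            continue  # process exited between listdir() and open()
        pid, comm, state = parse_stat(stat_text)
        if state == "D":
            wchan = read_text(os.path.join(proc_root, entry, "wchan"))
            stack = read_text(os.path.join(proc_root, entry, "stack"))
            tasks.append((pid, comm, wchan, stack))
    return tasks


if __name__ == "__main__":
    for pid, comm, wchan, stack in blocked_tasks():
        print(f"{pid} {comm} waiting in {wchan or '?'}")
        if stack:
            print(stack)
```

Run as root on the host while the hang is in progress. If SysRq is enabled, writing 'w' to /proc/sysrq-trigger asks the kernel to dump blocked tasks to the kernel log, which gives similar information without any script.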

Oh, I've also tried running 'smartctl --test=long' on the hard disks and they came back clean. When I copy my 128GB .vdi file from one location to another, it takes about an hour and I/O is very slow on the host during this operation.
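As a rough sanity check on that copy time, the average transfer rate can be worked out directly. This sketch assumes that "128GB" means 128 GiB and that the copy took exactly one hour; both are assumptions, since the post only says "about an hour".

```python
def throughput_mib_per_s(size_gib, seconds):
    """Average transfer rate in MiB/s for a copy of size_gib GiB."""
    return size_gib * 1024 / seconds


if __name__ == "__main__":
    # 128 GiB in 3600 s (assumed values, see above)
    print(f"{throughput_mib_per_s(128, 3600):.1f} MiB/s")  # prints "36.4 MiB/s"
```

Since the copy reads from and writes to the same mirrored pair, the disks are handling roughly twice that amount of traffic.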

I think that covers everything, if you have any questions feel free to ask.

Regards,
-Seann
Last edited by Seann on 2. Feb 2015, 19:18, edited 1 time in total.

Re: Windows 7 guest hangs Linux host I/O

Post by Seann »

Here's an update on the problem. The replacement hard drive finished syncing and I tested Windows 7 again this morning, still hosting the .vdi file on the NFS share. Windows 7 ran for about 30 minutes before hanging I/O on the host Linux box. I was, however, able to save evidence of the system state when it hung yesterday (the hung I/Os are the non-zero values in the ninth statistics column after the device name, the count of I/Os currently in progress).

sda is the drive that had hung I/Os and was replaced.
md125 is mirrored device sda1+sdb1 (encrypted home directories)
md126 is mirrored device sda2+sdb2 (root file system)
md127 is mirrored device sda3+sdb3 (encrypted swap)
dm-0 is encrypted md127 device for swap
dm-1 is encrypted md125 device for home directories

During this morning's test, I/O was hung on sda again. sda is the good drive that was previously the sdb drive in the output below. So the hung I/Os stayed with sda.

Now we have evidence of where I/Os are being held, but I don't know where to go from here.

Regards,
-Seann

> cat /proc/diskstats
8 0 sda 779764 4694 115835794 1773838 1506231 66275 44109977 39837460 16 3852720 45534188
8 1 sda1 420174 2036 66928530 1029116 523473 5948 30601498 24710783 3 2015390 26617823
8 2 sda2 359117 2556 48900437 740968 964814 58996 13495610 15077182 13 2307775 18863020
8 3 sda3 379 102 5195 2982 766 1331 12869 11301 0 12936 14281
11 0 sr0 0 0 0 0 0 0 0 0 0 0 0
8 16 sdb 91229 1500 7066775 481080 1506561 65974 44110049 38562571 0 2295689 39043781
8 17 sdb1 47690 589 3076345 165660 523610 5813 30601514 25274472 0 853723 25440604
8 18 sdb2 43321 905 3987959 313339 964996 58840 13495666 13241896 0 1484406 13554903
8 19 sdb3 124 6 839 1862 777 1321 12869 8874 0 10585 10736
9 127 md127 406047 0 52887529 0 767543 0 13057728 0 0 0 0
9 126 md126 381 0 4670 0 1538 0 12304 0 0 0 0
253 0 dm-0 338 0 3840 2161 1538 0 12304 139950 0 12041 142111
9 125 md125 470353 0 70004031 0 415793 0 29949064 0 0 0 0
253 1 dm-1 470311 0 70003209 1291594 405674 0 29949192 106545278 13 1997587 109996941
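Picking the stuck devices out of that output by eye is error-prone, so here is a minimal sketch that does it from two snapshots. It treats a device as stalled when it still has I/Os in flight and its completed read and write counts have not moved between the snapshots. The function names and the 2-second interval are arbitrary choices for illustration; the field positions follow the kernel's /proc/diskstats layout (ninth statistic after the device name is I/Os currently in progress).

```python
import os
import time


def parse_diskstats(text):
    """Map device name -> completed I/O count and in-flight I/O count."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:  # major, minor, name, then at least 11 statistics
            continue
        stats[f[2]] = {
            "completed": int(f[3]) + int(f[7]),  # reads + writes completed
            "in_flight": int(f[11]),             # I/Os currently in progress
        }
    return stats


def stalled_devices(before, after):
    """Devices with I/O in flight whose completion counts did not advance."""
    stalled = []
    for name, now in after.items():
        prev = before.get(name)
        if prev and now["in_flight"] > 0 and now["completed"] == prev["completed"]:
            stalled.append(name)
    return sorted(stalled)


if __name__ == "__main__":
    path = "/proc/diskstats"
    if os.path.exists(path):
        with open(path) as fh:
            first = parse_diskstats(fh.read())
        time.sleep(2)  # arbitrary sampling interval
        with open(path) as fh:
            second = parse_diskstats(fh.read())
        print(stalled_devices(first, second))
```

A non-zero in-flight count in a single snapshot is normal on a busy system; it is the combination of pending I/O and no completions over time that points at a hang.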

Re: Windows 7 guest hangs Linux host I/O

Post by Seann »

I finally got to the bottom of my VirtualBox issues, or at least I know the source of the hangs now. ;) The /proc/diskstats output is misleading because it shows there is no I/O on the sr0 device:

11 0 sr0 0 0 0 0 0 0 0 0 0 0 0

However, using a kernel debugger on the live kernel, there are hung I/Os on sr0 as well as sda:
crash> dev -d
MAJOR GENDISK            NAME       REQUEST_QUEUE      TOTAL ASYNC  SYNC   DRV
    8 ffff880810350c00   sda        ffff8808103b0000      23     0    23     0
   11 ffff88081038e400   sr0        ffff8808103b0788       2     0     2     0
    8 ffff8808102bbc00   sdb        ffff88080fc18000       0     0     0     0
    9 ffff88080ffd4000   md127      ffff88080e180000       0     0     0     0
    9 ffff88080ffd5c00   md126      ffff88080e180788       0     0     0     0
  253 ffff88080fe25400   dm-0       ffff88080e180f10       0     0     0     0
    9 ffff88080d125800   md125      ffff8808103b0f10       0     0     0     0
  253 ffff88080ffe7c00   dm-1       ffff88080fc18788       0     0     0     0
There was no disc in the DVD drive, so I unplugged it, and I've been running Windows 7 64-bit without issues over the weekend. I plugged the drive back in this morning and disconnected VirtualBox's pointer to sr0 in the Devices->CD/DVD Devices menu.

Regards,
-Seann