Linux guest clock slow, HDD controller gets lost
Posted: 10. Mar 2013, 22:16
We have a number of old demo server installations based on RHEL3-RHEL5 that were moved into VirtualBox VMs running on an OpenSolaris host (also old, SXCE snv_117). The VirtualBox software on the host is old as well: 3.0.12. All in all, this was a well-performing setup for showing off our old projects, and it ran unchanged for several years.
Several months ago the guest clocks began to slow down, to the extent that none of the usual fixes, alone or combined, keeps the VM clocks moving at the proper pace: an every-minute cron job running "rdate -s clockhost" inside the VM, NTP in the VM, and the VBox Guest Additions with attempts to set up time sync with the host (possibly misconfigured attempts, though). Sometimes tens of real minutes pass while one virtual minute goes by.
Sometimes a VM clock stalls altogether, cycling over the same 2-3 second range over and over, and it can stay stuck like that for days.
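One way to catch the stall from inside the guest is to sample the wall clock periodically and record every sample that fails to advance. The following is only a minimal sketch, not something we run in production: the name watch_clock and its parameters are made up for illustration, it assumes a Python 3 interpreter (which the old RHEL guests may not have), and it assumes the sleep call itself still returns while the clock is misbehaving.

```python
import time

def watch_clock(samples, interval_s=1.0, clock=time.time, sleep=time.sleep):
    """Take `samples` wall-clock readings and report the ones that did not advance.

    Returns a list of (sample_index, previous_reading, current_reading) tuples.
    `clock` and `sleep` are injectable so the logic can be checked without waiting.
    """
    anomalies = []
    prev = clock()
    for i in range(1, samples):
        sleep(interval_s)
        now = clock()
        # A healthy clock always moves forward between two samples.
        if now <= prev:
            anomalies.append((i, prev, now))
        prev = now
    return anomalies

if __name__ == "__main__":
    # Sample once a second for a minute and print any stalls or backward jumps.
    for index, before, after in watch_clock(60):
        print("sample %d: clock went from %.3f to %.3f" % (index, before, after))
```

A clock that cycles over the same few seconds would show up as a backward jump every time the cycle restarts.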
The only thing that (temporarily) fixes the problem is to power off and boot the VM. A virtual "reset" without killing the VM process does not help: the clock remains buggy.
The VM logs have hundreds of lines like these:
99:19:42.563 TM: Giving up catch-up attempt at a 60 005 350 962 ns lag; new total: 60 005 350 962 ns
99:21:00.851 TM: Giving up catch-up attempt at a 60 007 957 425 ns lag; new total: 120 013 308 387 ns
...
190:08:56.941 TM: Giving up catch-up attempt at a 60 008 089 399 ns lag; new total: 231 315 151 564 550 ns
These are logged roughly every 75-80 seconds (of real time), and each one seems to cover a 60-second lag.
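To check those two numbers against a whole log rather than by eye, the "Giving up catch-up attempt" lines can be parsed and tallied. This is a rough sketch under some assumptions: the regular expression is derived only from the lines quoted above, the leading timestamp is taken to be elapsed real time since the VM process started, and the function name parse_giveups is made up for illustration.

```python
import re

# Pattern derived from the log lines quoted in this post, for example:
# 99:19:42.563 TM: Giving up catch-up attempt at a 60 005 350 962 ns lag; new total: 60 005 350 962 ns
PATTERN = re.compile(
    r"^(\d+):(\d{2}):(\d{2})\.(\d{3}) TM: Giving up catch-up attempt "
    r"at a ([\d ]+) ns lag; new total: ([\d ]+) ns"
)

def parse_giveups(lines):
    """Return a list of (log_time_s, lag_ns, total_ns) for each matching line."""
    events = []
    for line in lines:
        m = PATTERN.match(line.strip())
        if not m:
            continue  # skip unrelated log lines
        h, mi, s, ms = (int(g) for g in m.groups()[:4])
        log_time_s = h * 3600 + mi * 60 + s + ms / 1000.0
        # The log groups digits with spaces; strip them before converting.
        lag_ns = int(m.group(5).replace(" ", ""))
        total_ns = int(m.group(6).replace(" ", ""))
        events.append((log_time_s, lag_ns, total_ns))
    return events

SAMPLE = [
    "99:19:42.563 TM: Giving up catch-up attempt at a 60 005 350 962 ns lag; new total: 60 005 350 962 ns",
    "99:21:00.851 TM: Giving up catch-up attempt at a 60 007 957 425 ns lag; new total: 120 013 308 387 ns",
]

if __name__ == "__main__":
    events = parse_giveups(SAMPLE)
    for (t0, _, _), (t1, lag_ns, total_ns) in zip(events, events[1:]):
        print("interval %.3f s, gave up on %.3f s, total lost %.3f s"
              % (t1 - t0, lag_ns / 1e9, total_ns / 1e9))
```

On the two consecutive lines quoted above, the second total equals the first total plus the second lag, so the "new total" field appears to be a running sum of the time the VM gave up on.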
Usually this kicks in after several hours of VM uptime, though problems can occur even during its startup, or a week can pass without problems.
We've tried to "renice" the VBoxHeadless processes to a higher-than-usual priority on the host, though by now they have been pushed down to the lowest priority (nice 19), because a VM with clock problems consumes a whole CPU core. The problem might be related to the host's ZFS pool storage becoming slower; at least it seems to occur more often (though not exclusively) when the host is scrubbing its pools, which it does regularly on weekends.
Something new began occurring recently: virtual HDDs began to time out, maybe related to the VM clock and/or host I/O lags. Ultimately the Linux guest drops its HDDs and the virtual OS becomes unusable. Again, a virtual reboot/reset does not help; only a poweroff+poweron allows the guest to find the virtual disks and their controllers.
Any ideas for a fix or understanding of the problem are welcome.
PS: We've recently tried a very different setup, with VirtualBox 4.2.6 on a Windows 2008 R2 host with VT-x CPU acceleration, running Solaris 10 VMs. These also exhibit regular clock problems that require regular VM reboots.