Frequent VDI Corruption - 6.1.34 on Debian 10 Host
Hello,
I've been a VirtualBox user for many years, but recently I'm trying to do something a bit larger scale (but still non-production).
In the past couple weeks I've started having VDI files randomly become inaccessible with VERR_VD_VDI_INVALID_HEADER. This is after normal shutdowns of Debian 10 VMs running on a Debian 10 host with VirtualBox 6.1.32 and 6.1.34. I have been working for several days to try to isolate how I might be causing this, but I am unable to find a pattern. I have also attempted to search various VirtualBox resources to see if this is a known or recent bug, but I haven't found anything.
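For reference, the exact error string can be confirmed by searching the VM logs from a host terminal; the path below assumes VirtualBox's default 'VirtualBox VMs' location and may differ on your system:

```shell
# List any VM logs that contain the VDI header error; falls back to a
# message (instead of a nonzero exit) when nothing matches.
grep -l "VERR_VD_VDI_INVALID_HEADER" "$HOME/VirtualBox VMs"/*/Logs/VBox.log* 2>/dev/null \
    || echo "no matching logs found"
```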
I run VirtualBox on several systems, so I'm starting to wonder about differences in the underlying storage. It may be that this started when I moved my 'VirtualBox VMs' directory to an NVMe U.2 drive formatted with XFS. Again, I haven't found any warnings against XFS on NVMe or against storing VirtualBox VMs on this configuration.
My next plan is to try storing the VMs on a system with a spinning RAID array, either via NFS or SSHFS.
I also saw references to an issue with TM/VirtualSync/CurrentOffset - I've seen some large values in my log files. Could this be related in any way?
I would also be interested to know if there is a Linux-based (free) tool or technique for repairing this when it happens. If breakage will be frequent, being able to repair would be handy.
Any advice on this would be greatly appreciated. At this point I'm nearly at a standstill - I can't proceed when my VMs are randomly needing to be rebuilt.
Thanks.
-Dave
- fth0 (Volunteer)
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
kdhallbgm wrote: VDI files randomly become inaccessible with VERR_VD_VDI_INVALID_HEADER
VirtualBox uses this error code for several possible detected errors. Please provide one or more (zipped) VBox.log files from VM runs with this error, so that we can check whether the damaged part of the header is always the same or not.
kdhallbgm wrote: I also saw references to an issue with TM/VirtualSync/CurrentOffset - I've seen some large values in my log files. Could this be related in any way?
Please provide a (zipped) VBox.log file from such a VM run, and I'll have a look.
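For anyone following along, each VM keeps its logs in its own Logs subdirectory. A sketch of zipping them for an attachment; the VM name and scratch LOGDIR below are hypothetical so the commands can be dry-run anywhere:

```shell
# Stand-in for "$HOME/VirtualBox VMs/MyVM/Logs"; point LOGDIR at the real
# directory to compress actual logs.
LOGDIR="${LOGDIR:-/tmp/MyVM/Logs}"
mkdir -p "$LOGDIR"
: > "$LOGDIR/VBox.log"       # stand-ins for the real log files
: > "$LOGDIR/VBox.log.1"
gzip -kf "$LOGDIR/VBox.log" "$LOGDIR/VBox.log.1"   # -k keeps the originals
ls "$LOGDIR"
```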
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
fth0,
Please see the attached. The .2 log was from the last poweroff. The .1 is from when it wouldn't start back up.
Thanks for having a look.
- Attachments
- VBox.log.1.gz - Failed to start (594 Bytes)
- VBox.log.2.gz - Last working log up to normal poweroff (123.54 KiB)
- fth0 (Volunteer)
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
Regarding the 1st log file and the VDI error, I didn't see any of the expected log messages. How do you execute VBoxHeadless, and what do you get when trying it from within a terminal? Alternatively, provide the 1st sector of the VDI file(s) (as a file or hexdump).
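A sketch of capturing that first sector. The VDI path is hypothetical, and if the file is absent the script creates a stand-in carrying the usual text signature (an assumption about recent VDI versions) so it can be dry-run:

```shell
VDI="${VDI:-/tmp/MyVM.vdi}"   # hypothetical path; point at the real .vdi
# Create a stand-in with the expected signature so this can be dry-run:
[ -f "$VDI" ] || printf '<<< Oracle VM VirtualBox Disk Image >>>\n' > "$VDI"
# A healthy header begins with a "<<< ... VirtualBox Disk Image >>>" text
# signature; corruption tends to show up as garbage in this first sector.
dd if="$VDI" bs=512 count=1 2>/dev/null | hexdump -C
```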
Regarding the 2nd log file and lost time, the VM ran normally for the first ~10 minutes, and continuously lost half of its time during the >4 days of runtime, while being idle most of the time. I wonder if it has to do with the obsolete VirtualBox Guest Additions (GA) 5.2.0 being installed in the guest OS. Please update the GA and check if this issue persists.
- Site Moderator
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
It's odd to see a VM whose RAM allocation (100GB) is several times larger than its disk drive (16GB).
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
OK, this is embarrassing - Guest Additions? All these years and I missed that. Oddly, I've had a lot of long-running VMs that don't have these.
Also, though, the thing about the Extension Pack licensing sort of threw me off. If I had seen the Additions I might have mistaken them for the Extension Pack and not installed them.
Summary: I'll install the Additions and let you know how it goes. Sorry for such a noob error.
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
Hello,
So I've learned all about Guest Additions - I now have it running on all of the VMs I have running on this host system. However, I'm still getting corrupted VDIs. The easy way to do this:
System config: 8 cores, 16GB RAM, 80GB HDD (dynamically allocated)
Install Debian 10 (from firmware.iso) - only the base system and SSH
Install basic stuff like sudo, rsync, net-tools, plus linux-headers-amd64 and build-essential. Also ntp and ntpdate
--> ntp.conf configured to my local time servers
Install the Guest Additions
Check that VBoxService is running.
Reboot/power-cycle a couple times.
Then modify /etc/apt/sources.list to point to Debian 11 (bullseye), and apt-get update, apt-get upgrade, apt-get dist-upgrade
Reboot and verify that everything seems to be running, including VBoxService
Reboot a couple more times, and then power off.
And that's it. The VDI will be messed up and the VM won't power on.
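The bullseye switch in the steps above can be scripted roughly like this. A sketch only: SOURCES defaults to a scratch file so it can be dry-run, and the buster entry written to it is illustrative:

```shell
# Point SOURCES at /etc/apt/sources.list (as root) to do this for real.
SOURCES="${SOURCES:-/tmp/sources.list}"
printf 'deb http://deb.debian.org/debian buster main contrib\n' > "$SOURCES"
sed -i 's/buster/bullseye/g' "$SOURCES"   # Debian 10 -> Debian 11
cat "$SOURCES"
# Then, inside the guest: apt-get update && apt-get upgrade && apt-get dist-upgrade
```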
------
There's a little more: I did a fresh install - same virtual hardware and basic install as above - but this time I mounted a spinning RAID array from another system via SSHFS, and installed this VM there. On this VM, after I got everything installed and VBoxService running, I checked 'ntpq -c pe' and saw the offset from my NTP servers increase over a couple minutes to nearly +90 seconds. This seems really odd.
Analysis: the host system was sync'd to different NTP servers - offsite, but more accurate, and also the same ones that my internal NTP servers are sync'd to. I shifted the host to be the same as the subject VM, and checked all of my various NTP servers to be sure they were in sync with their upstreams. I stopped NTP and set clocks with ntpdate, then restarted NTP. Still, the VM's clocks were ahead by quite a bit - over a minute.
Finally, I rebooted the VM and it seemed to settle in.
BUT - I just checked again after writing all of this, and the offsets on 'ntpq -c pe' are back up again - two around 45s and two around 90s.
I'd have to believe that this could be the root cause of my VDI corruption for some reason. Even if not, this is really weird and not very handy.
Thoughts?
-Dave
- Site Moderator
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
We have lots of people who regularly run without Guest Additions, e.g. users with VMs for Win9x, MacOS and DOS. Also lots of people running headless VMs feel they don't need the GAs. None of them get corrupted VDIs as a result.
I don't know of any connection between the Guest Additions and the disk emulation.
If I had files on the host being corrupted I would look for an explanation on the host. One thing that makes VDI files stand out is their large size and the fact that they are often being written to. I have to say that I don't think I've seen a lot of XFS (filesystem) users on Linux in the past, so until persuaded otherwise I would have concerns that the relevant code is not well proven in use. I freely admit that I take very little interest in Linux, so it's possible that there's a vast army of XFS users who have hidden themselves from me until now.
- fth0 (Volunteer)
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
kdhallbgm wrote: VM's clocks were ahead
Do you really mean ahead and not behind? I'm asking because I've sometimes seen setups where the guest OS time is lagging (e.g. host time 08:15:00, guest time 08:14:30), but never the other way around. And your VBox.log.2 file also indicated that.
If you'd like to watch the behavior of the internal clocks live, open a terminal and execute: watch -d -n 1 VBoxManage debugvm "VM name" info clocks
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
Clocks are ahead. I hadn't checked since Friday evening, but right now the output on the guest is below. However, the date/time on both the host and the guest are accurate.
Code:
# ntpq -c pe
remote refid st t when poll reach delay offset jitter
==============================================================================
*ntp1.devs.cs 128.226.118.33 3 u 165 512 3 0.319 +472.28 6.089
+ntp2.devs.c 128.226.118.33 3 u 226 512 1 0.306 +472.93 5.558
+ntp3.devs.cs.in 128.226.118.33 3 u 167 512 3 0.278 +472.10 6.000
ntp4.devs.cs.i 128.226.118.33 3 u 104 512 3 0.292 +487.35 6.183
The VBoxManage output for this VM is:
Code:
# VBoxManage debugvm "osa-proto-11-deploy" info clocks
Cpu Tick: 427511041329050 (0x0184d1abc1879a) 2800001270Hz ticking - virtualized - real tsc offset
offset 1364951353377598
Cpu Tick: 427511041354810 (0x0184d1abc1ec3a) 2800001270Hz ticking - virtualized - real tsc offset
offset 1364951353377598
Cpu Tick: 427511041364526 (0x0184d1abc2122e) 2800001270Hz ticking - virtualized - real tsc offset
offset 1364951353377598
Cpu Tick: 427511041372338 (0x0184d1abc230b2) 2800001270Hz ticking - virtualized - real tsc offset
offset 1364951353377598
Cpu Tick: 427511041380122 (0x0184d1abc24f1a) 2800001270Hz ticking - virtualized - real tsc offset
offset 1364951353377598
Cpu Tick: 427511041387290 (0x0184d1abc26b1a) 2800001270Hz ticking - virtualized - real tsc offset
offset 1364951353377598
Cpu Tick: 427511041394290 (0x0184d1abc28672) 2800001270Hz ticking - virtualized - real tsc offset
offset 1364951353377598
Cpu Tick: 427511041401290 (0x0184d1abc2a1ca) 2800001270Hz ticking - virtualized - real tsc offset
offset 1364951353377598
Virtual: 152682450632494 (0x008add273bbf2e) 1000000000Hz ticking
VirtSync: 101364339327479 (0x005c30b97041f7) paused - catchup
offset 51318100354825 catch-up rate 300 %
Real: 640022295 (0x0000002625f717) 1000Hz