Frequent VDI Corruption - 6.1.34 on Debian 10 Host
Hello,
I've been a VirtualBox user for many years, but recently I'm trying to do something a bit larger scale (but still non-production).
In the past couple weeks I've started having VDI files randomly become inaccessible with VERR_VD_VDI_INVALID_HEADER. This is after normal shutdowns of Debian 10 VMs running on a Debian 10 host with VirtualBox 6.1.32 and 6.1.34. I have been working for several days to try to isolate how I might be causing this, but I am unable to find a pattern. I have also attempted to search various VirtualBox resources to see if this is a known or recent bug, but I haven't found anything.
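For reference, the exact error string can be confirmed by searching the VM logs from a host terminal; the path below assumes VirtualBox's default 'VirtualBox VMs' location and may differ on your system:

```shell
# List any VM logs that contain the VDI header error; falls back to a
# message (instead of a nonzero exit) when nothing matches.
grep -l "VERR_VD_VDI_INVALID_HEADER" "$HOME/VirtualBox VMs"/*/Logs/VBox.log* 2>/dev/null \
    || echo "no matching logs found"
```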
I run VirtualBox on several systems, so I'm starting to wonder about differences in the underlying storage. It may be that this started when I moved my 'VirtualBox VMs' directory to an NVMe U.2 drive formatted with XFS. Again, I haven't found any warnings against XFS on NVMe or against storing VirtualBox VMs on this configuration.
My next plan is to try storing the VMs on a system with a spinning RAID array, either via NFS or SSHFS.
I also saw references to an issue with TM/VirtualSync/CurrentOffset - I've seen some large values in my log files. Could this be related in any way?
I would also be interested to know if there is a Linux-based (free) tool or technique for repairing this when it happens. If breakage will be frequent, being able to repair would be handy.
Any advice on this would be greatly appreciated. At this point I'm nearly at a standstill - I can't proceed when my VMs are randomly needing to be rebuilt.
Thanks.
-Dave
- fth0 (Volunteer)
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
kdhallbgm wrote: VDI files randomly become inaccessible with VERR_VD_VDI_INVALID_HEADER
VirtualBox uses this error code for several possible detected errors. Please provide one or more (zipped) VBox.log files from VM runs with this error, so that we can check whether the damaged part of the header is always the same or not.
kdhallbgm wrote: I also saw references to an issue with TM/VirtualSync/CurrentOffset - I've seen some large values in my log files. Could this be related in any way?
Please provide a (zipped) VBox.log file from such a VM run, and I'll have a look.
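For anyone following along, each VM keeps its logs in its own Logs subdirectory. A sketch of zipping them for an attachment; the VM name and scratch LOGDIR below are hypothetical so the commands can be dry-run anywhere:

```shell
# Stand-in for "$HOME/VirtualBox VMs/MyVM/Logs"; point LOGDIR at the real
# directory to compress actual logs.
LOGDIR="${LOGDIR:-/tmp/MyVM/Logs}"
mkdir -p "$LOGDIR"
: > "$LOGDIR/VBox.log"       # stand-ins for the real log files
: > "$LOGDIR/VBox.log.1"
gzip -kf "$LOGDIR/VBox.log" "$LOGDIR/VBox.log.1"   # -k keeps the originals
ls "$LOGDIR"
```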
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
fth0,
Please see the attached. The .2 log was from the last poweroff. The .1 is from when it wouldn't start back up.
Thanks for having a look.
- Attachments
- VBox.log.1.gz - Failed to start (594 Bytes)
- VBox.log.2.gz - Last working log up to normal poweroff (123.54 KiB)
- fth0 (Volunteer)
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
Regarding the 1st log file and the VDI error, I didn't see any of the expected log messages. How do you execute VBoxHeadless, and what do you get when trying it from within a terminal? Alternatively, provide the 1st sector of the VDI file(s) (as a file or hexdump).
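A sketch of capturing that first sector. The VDI path is hypothetical, and if the file is absent the script creates a stand-in carrying the usual text signature (an assumption about recent VDI versions) so it can be dry-run:

```shell
VDI="${VDI:-/tmp/MyVM.vdi}"   # hypothetical path; point at the real .vdi
# Create a stand-in with the expected signature so this can be dry-run:
[ -f "$VDI" ] || printf '<<< Oracle VM VirtualBox Disk Image >>>\n' > "$VDI"
# A healthy header begins with a "<<< ... VirtualBox Disk Image >>>" text
# signature; corruption tends to show up as garbage in this first sector.
dd if="$VDI" bs=512 count=1 2>/dev/null | hexdump -C
```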
Regarding the 2nd log file and lost time, the VM ran normally for the first ~10 minutes, and continuously lost half of its time during the >4 days of runtime, while being idle most of the time. I wonder if it has to do with the obsolete VirtualBox Guest Additions (GA) 5.2.0 being installed in the guest OS. Please update the GA and check if this issue persists.
- Site Moderator
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
It's odd to see a VM whose RAM allocation (100GB) is several times larger than its disk drive (16GB).
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
OK, this is embarrassing - Guest Additions? All these years and I missed that. Oddly, I've had a lot of long-running VMs that don't have these.
Also, though, the thing about the Extension Pack licensing sort of threw me off. If I had seen the Additions I might have mistaken them for the Extension Pack and not installed them.
Summary: I'll install the Additions and let you know how it goes. Sorry for such a noob error.
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
Hello,
So I've learned all about Guest Additions - I now have it running on all of the VMs I have running on this host system. However, I'm still getting corrupted VDIs. The easy way to do this:
System config: 8 cores, 16GB RAM, 80GB HDD (dynamically allocated)
Install Debian 10 (from firmware.iso) - only the base system and SSH
Install basic stuff like sudo, rsync, net-tools, plus linux-headers-amd64 and build-essential. Also ntp and ntpdate
--> ntp.conf configured to my local time servers
Install the Guest Additions
Check that VBoxService is running.
Reboot/power-cycle a couple times.
Then modify /etc/apt/sources.list to point to Debian 11 (bullseye), and apt-get update, apt-get upgrade, apt-get dist-upgrade
Reboot and verify that everything seems to be running, including VBoxService
Reboot a couple more times, and then power off.
And that's it. The VDI will be messed up and the VM won't power on.
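The bullseye switch in the steps above can be scripted roughly like this. A sketch only: SOURCES defaults to a scratch file so it can be dry-run, and the buster entry written to it is illustrative:

```shell
# Point SOURCES at /etc/apt/sources.list (as root) to do this for real.
SOURCES="${SOURCES:-/tmp/sources.list}"
printf 'deb http://deb.debian.org/debian buster main contrib\n' > "$SOURCES"
sed -i 's/buster/bullseye/g' "$SOURCES"   # Debian 10 -> Debian 11
cat "$SOURCES"
# Then, inside the guest: apt-get update && apt-get upgrade && apt-get dist-upgrade
```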
------
There's a little more: I did a fresh install - same virtual hardware and basic install as above - but this time I mounted a spinning RAID array from another system via SSHFS, and installed this VM there. On this VM, after I got everything installed and VBoxService running, I checked 'ntpq -c pe' and saw the offset from my NTP servers increase over a couple minutes to nearly +90 seconds. This seems really odd.
Analysis: the host system was sync'd to different NTP servers - offsite, but more accurate, and also the same ones that my internal NTP servers are sync'd to. I shifted the host to be the same as the subject VM, and checked all of my various NTP servers to be sure they were in sync with their upstreams. I stopped NTP and set clocks with ntpdate, then restarted NTP. Still, the VM's clocks were ahead by quite a bit - over a minute.
Finally, I rebooted the VM and it seemed to settle in.
BUT - I just checked again after writing all of this, and the offsets on 'ntpq -c pe' are back up again - two around 45s and two around 90s.
I'd have to believe that this could be the root cause of my VDI corruption for some reason. Even if not, this is really weird and not very handy.
Thoughts?
-Dave
- Site Moderator
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
We have lots of people who regularly run without Guest Additions, e.g. users with VMs for Win9x, MacOS and DOS. Also lots of people running headless VMs feel they don't need the GAs. None of them get corrupted VDIs as a result.
I don't know of any connection between the Guest Additions and the disk emulation.
If I had files on the host being corrupted I would look for an explanation on the host. One thing that makes VDI files stand out is their large size and the fact that they are often being written to. I have to say that I don't think I've seen a lot of XFS (filesystem) users on Linux in the past, so until persuaded otherwise I would have concerns that the relevant code is not well proven in use. I freely admit that I take very little interest in Linux, so it's possible that there's a vast army of XFS users who have hidden themselves from me until now.
- fth0 (Volunteer)
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
kdhallbgm wrote: VM's clocks were ahead
Do you really mean ahead and not behind? I'm asking because I've sometimes seen setups where the guest OS time is lagging (e.g. host time 08:15:00, guest time 08:14:30), but never the other way around. And your VBox.log.2 file also indicated that.
If you'd like to watch the behavior of the internal clocks live, open a terminal and execute: watch -d -n 1 VBoxManage debugvm "VM name" info clocks
Re: Frequent VDI Corruption - 6.1.34 on Debian 10 Host
Clocks are ahead. I hadn't checked since Friday evening, but right now the output on the guest is below. However, the date/time on both the host and the guest are accurate.
Code:
# ntpq -c pe
remote refid st t when poll reach delay offset jitter
==============================================================================
*ntp1.devs.cs 128.226.118.33 3 u 165 512 3 0.319 +472.28 6.089
+ntp2.devs.c 128.226.118.33 3 u 226 512 1 0.306 +472.93 5.558
+ntp3.devs.cs.in 128.226.118.33 3 u 167 512 3 0.278 +472.10 6.000
ntp4.devs.cs.i 128.226.118.33 3 u 104 512 3 0.292 +487.35 6.183
The VBoxManage output for this VM is:
Code:
# VBoxManage debugvm "osa-proto-11-deploy" info clocks
Cpu Tick: 427511041329050 (0x0184d1abc1879a) 2800001270Hz ticking - virtualized - real tsc offset
offset 1364951353377598
Cpu Tick: 427511041354810 (0x0184d1abc1ec3a) 2800001270Hz ticking - virtualized - real tsc offset
offset 1364951353377598
Cpu Tick: 427511041364526 (0x0184d1abc2122e) 2800001270Hz ticking - virtualized - real tsc offset
offset 1364951353377598
Cpu Tick: 427511041372338 (0x0184d1abc230b2) 2800001270Hz ticking - virtualized - real tsc offset
offset 1364951353377598
Cpu Tick: 427511041380122 (0x0184d1abc24f1a) 2800001270Hz ticking - virtualized - real tsc offset
offset 1364951353377598
Cpu Tick: 427511041387290 (0x0184d1abc26b1a) 2800001270Hz ticking - virtualized - real tsc offset
offset 1364951353377598
Cpu Tick: 427511041394290 (0x0184d1abc28672) 2800001270Hz ticking - virtualized - real tsc offset
offset 1364951353377598
Cpu Tick: 427511041401290 (0x0184d1abc2a1ca) 2800001270Hz ticking - virtualized - real tsc offset
offset 1364951353377598
Virtual: 152682450632494 (0x008add273bbf2e) 1000000000Hz ticking
VirtSync: 101364339327479 (0x005c30b97041f7) paused - catchup
offset 51318100354825 catch-up rate 300 %
Real: 640022295 (0x0000002625f717) 1000Hz