Hello guys,
Tonight I teleported my most important server, my main database server, thinking it was a routine operation. The teleport seemed to go well, as it probably had the last 100 times I teleported a VM. I even had ping -t running to see how many pings I would lose during the teleport. Just one lost ping.
The VM ran for approximately a minute and then crashed; the UI showed "Aborted" under the VM name. It is a Windows Server 2012 Standard guest running in VirtualBox 5.18 on CentOS. On the source host, the VMDK is attached directly through the filesystem; on the target host, it is attached through an SMB mount. The VMDK was created from a physical machine using VMware Converter.
Luckily, I have a backup from yesterday, so I restored it.
Now my question is: are VMDK files problematic when it comes to teleporting VMs? Should I simply avoid running production VMs on VMDK files? Should I just convert them to VDI, or start from scratch with a new VM?
Thanks in advance for your constructive advice!
J-F Courteau
VMDK file corruption after teleport
- Posts: 13
- Joined: 22. Mar 2017, 01:00
- Primary OS: MS Windows 10
- VBox Version: OSE other
- Guest OSses: W2K12R2, Linux CentOS, FreeBSD, appliances, etc...
- Location: Quebec City, QC, CA
Re: VMDK file corruption after teleport
OK well, either I did not give enough detail, nobody is using the teleport feature, or no one has encountered such a situation.
Anyway, I decided to eliminate any VMDKs I may have and converted every disk to a fixed-size VDI (I ran vboxmanage clonehd --type VDI --variant Fixed [src] [dst]).
Now, simply put: has anyone, anywhere, encountered corruption of a VDI file after a teleport? Is the teleport feature production-ready?
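For anyone wanting to do the same conversion, here is a rough sketch. The VM name, controller name, and paths are placeholders, not from the original posts; note that in current VBoxManage the subcommand is clonemedium (clonehd is a deprecated alias) and the format flag is --format rather than --type:

```shell
# Shut the VM down (or detach the disk) before converting.
# Paths and names below are examples only.
VBoxManage clonemedium disk /vms/dbserver.vmdk /vms/dbserver.vdi \
    --format VDI --variant Fixed

# Re-attach the new disk to the VM (the storage controller name may differ).
VBoxManage storageattach dbserver --storagectl "SATA" --port 0 --device 0 \
    --type hdd --medium /vms/dbserver.vdi
```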
- Site Moderator
- Posts: 34369
- Joined: 6. Sep 2008, 22:55
- Primary OS: Linux other
- VBox Version: OSE self-compiled
- Guest OSses: *NIX
Re: VMDK file corruption after teleport
I don't want you to think you are not important, so I decided to answer even though I have no clue. Very few people use teleport these days, as far as I can tell; I only used it once, to test, years ago. It is quite possible there is a regression bug, and I would suggest you post a topic on the bugtracker to see what the devs have to say. Be prepared to submit exact information and diagnostics, though, such as log files and details of the corruption.
Re: VMDK file corruption after teleport
I tested teleporting today. Host to host and locally. Both times the disk of the target VM got corrupted.
Host: W10
Guest: W10, linux
Disk image type: vmdk
Version 5.1.26 r117224 (Qt5.6.2)
The W10 guest seemed to work initially, but crashed a few seconds later and wouldn't boot again. The Linux guest ran for longer and complained about missing files after reboot.
Re: VMDK file corruption after teleport
It would be nice if someone else tried this before I file a bug report.
-
- Posts: 1
- Joined: 13. Apr 2018, 19:48
Re: VMDK file corruption after teleport
It would seem VDI doesn't have this problem. I've since converted my VMs to VDI using VBoxManage clonemedium and I'm no longer seeing corruption. I teleport about a dozen VMs to one side of an NFS cluster every month to patch the underlying system, then teleport them back to patch the other cluster node.
I've also discovered a reliable method for VMDK recovery.
1) Set up a VM with Windows on it. Install VMware Workstation to get the vmware-vdiskmanager utility.
2) Use a shared folder in the VM to get to the broken VMDK. Map it as E:.
3) Try to repair the broken VMDK with
Code:
vmware-vdiskmanager -R e:\myhappy.vmdk
If that fails and you have backups of the VMDK, there is another option.
4) Restore the first meg or so of the broken VMDK somewhere else. You don't need the whole thing, because we're only going to steal the first part.
5) On the host (Linux) OS,
Code:
dd if=myrestored.vmdk of=myhappy.vmdk count=4 conv=notrunc,nocreat
6) Retry step 3.
This has fixed most of the issues I've had with broken VMDKs.
Also, the 250 ms --maxdowntime suggested in the manual is nowhere near enough to get the VM across if it's busy. If I set --maxdowntime to around 5000, it struggles but comes over. My test for this is to run "memtester 2048" in a Linux VM to thrash the memory, then try the teleport. I'm guessing the transfer time depends on how quickly the last bit of memory can be copied over the network after the VM is paused. I'm using well-tuned 10 Gb Ethernet, so the 2 GB of thrashing memory comes over in about 2 seconds.
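For reference, a basic teleport setup looks roughly like this. The VM name, host name, and port are placeholders; the flags are taken from the VBoxManage manual:

```shell
# On the target host: register an identically configured VM, enable the
# teleporter, and start it. It will wait with a "Teleporting" status.
VBoxManage modifyvm dbserver --teleporter on --teleporterport 6000
VBoxManage startvm dbserver

# On the source host: push the running VM to the target, allowing up to
# 5000 ms of downtime for the final memory copy.
VBoxManage controlvm dbserver teleport --host targethost --port 6000 \
    --maxdowntime 5000
```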
Mario