
[Fixed] VBoxSVC backend lockup on linked clone with floppy

Posted: 23. Apr 2015, 17:34
by Technologov
Host: Win 7 x64 + BETA2.

When making a linked clone of a Windows XP VM (with a floppy attached), the VBoxSVC process locked up.

UPDATE: This happens *even though* the floppy image in question is a leftover of an unattended install and is NOT attached to the latest snapshot of the VM being cloned.
Only the first snapshot had the floppy attached; the latest snapshot did not.
This is a VBoxSVC bug that is triggered / revealed by my vbox-unattended patch.

Directory structure analysis:
C:\VM-VBox\Windows group\Windows XP\
Windows XP.vbox
Windows XP.vdi
floppy_script.img
Logs\
Snapshots\
...

UPDATE2: Basically, what triggers this lock-up is the floppy image living *inside* the VM folder.
If the floppy image lives outside, everything is fine.
The same bug also occurs if there is a CD ISO image in the VM's folder (which normally never happens).

UPDATE3: A similar bug also occurs on Full-Clone, except that the GUI doesn't lock up; instead I get an error message like INVALID ARG.

Re: VBoxSVC backend lockup on linked clone with floppy

Posted: 19. Jun 2015, 19:57
by klaus
Thanks for the very detailed report and investigation. This is a long-standing bug actually, and the hang is caused by extremely hard to explain circumstances. The result is that at the end of cloning, a totally unexpected code path is triggered which results in a bogus error message in VBoxSVC.log and a tiny bit later a deadlock. Need to test a bit more to be certain, but I think I have a fix. The presence of DVD images etc. has been tested, but no one tried having them in the VM directory. Without this the code would do too much (in a harmless way), but wouldn't hang.

Re: VBoxSVC backend lockup on linked clone with floppy

Posted: 19. Jun 2015, 20:34
by Technologov
Thanks for your hard work, Klaus. If you have a patch (or build), I can test it also (tomorrow). I debugged VBoxSVC cloning code for 6 hours, but no success. Very long spaghetti code, with 1 function taking 24 kilobytes of code. Hard to debug this monster. I need to test both Full and linked clones.

Re: VBoxSVC backend lockup on linked clone with floppy

Posted: 19. Jun 2015, 20:46
by klaus
The code is VERY long, but it's well-structured and reasonably documented. That said, I would be very surprised if you could work out from the error message or hang what the actual cause is. The cause is leftovers from the source VM's config, which need to be wiped (they'll get automatically recreated when saving the target VM config). Hope the diff below doesn't get mangled, but even if it does, making the change manually is trivial. There is only one occurrence of llHardDisks in the file.

Index: MachineImplCloneVM.cpp
===================================================================
--- MachineImplCloneVM.cpp	(revision 101188)
+++ MachineImplCloneVM.cpp	(working copy)
@@ -995,6 +995,8 @@
 
         /* Reset media registry. */
         trgMCF.mediaRegistry.llHardDisks.clear();
+        trgMCF.mediaRegistry.llDvdImages.clear();
+        trgMCF.mediaRegistry.llFloppyImages.clear();
         /* If we got a valid snapshot id, replace the hardware/storage section
          * with the stuff from the snapshot. */
         settings::Snapshot sn;
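
The idea behind the patch can be sketched outside of VirtualBox as follows. This is only a minimal illustration: `Medium`, `MediaRegistry`, `resetMediaRegistry` and `registeredCount` are hypothetical, simplified stand-ins, not the real `settings::` structures, and only the three list names are taken from the diff above.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical, simplified stand-ins for the settings structures.
struct Medium {
    std::string uuid;
    std::string location;
};

struct MediaRegistry {
    std::list<Medium> llHardDisks;
    std::list<Medium> llDvdImages;
    std::list<Medium> llFloppyImages;
};

// Wipe every list inherited from the source VM's config, not just the
// hard disks, so no stale floppy or DVD entry survives into the target.
void resetMediaRegistry(MediaRegistry &reg) {
    reg.llHardDisks.clear();
    reg.llDvdImages.clear();
    reg.llFloppyImages.clear();
}

// Total number of media still registered, across all three lists.
std::size_t registeredCount(const MediaRegistry &reg) {
    return reg.llHardDisks.size() + reg.llDvdImages.size() +
           reg.llFloppyImages.size();
}
```

Clearing only `llHardDisks` would leave a floppy entry such as `floppy_script.img` in the target registry, which is the kind of leftover the post describes.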

Re: VBoxSVC backend lockup on linked clone with floppy

Posted: 20. Jun 2015, 18:48
by Technologov
Yeah, a lot better now!
With an inserted floppy it is semi-working now; with an ejected floppy it works properly.

Test cases:
Test #1: Floppy inserted/registered - Full clone
Before patch = FAILED.
Result Code:
NS_ERROR_INVALID_ARG (0x80070057)
After patch = SEMI-WORKING. The clone points to the original VM's floppy images, whereas it should copy the image(s) sitting in the VM folder and give them new UUIDs.
Test #2: Floppy inserted/registered - Linked clone
Before patch = FAILED, stuck forever
After patch = SEMI-WORKING (correctly?). Since VirtualBox lacks differencing floppy and ISO images, we should copy all images sitting in the parent VM folder and give them new UUIDs, just like in the Full-clone case #1.

Test #3: Floppy ejected/unregistered - Full clone
Before patch = FAILED,
Result Code:
NS_ERROR_INVALID_ARG (0x80070057)
After patch = WORKS
Test #4: Floppy ejected/unregistered - Linked clone
Before patch = FAILED, stuck forever
After patch = WORKS

The SEMI-WORKING state causes another (expected) problem when starting 2 cloned VMs:
ERROR:
"Locking of attached media failed. A possible reason is that one of the media is attached to a running VM."
Result Code:
VBOX_E_INVALID_OBJECT_STATE (0x80BB0007)
Component:
SessionMachine
Interface:
IMachine {feb138aa-dbce-4a89-8ec0-380fc7ec4913}
... This happens because the 2 cloned VMs are trying to access one floppy image (from the parent) in read-write mode. It doesn't matter whether it is a linked clone or a full clone.

That said, this patch is a definite improvement.
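
The locking failure described above can be modelled with a toy lock table. This is a sketch under assumptions, not the VirtualBox implementation: `MediumLockTable` and its methods are hypothetical names, and the rule "one writer, or any number of readers" is an assumed simplification.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical lock table keyed by medium UUID.
class MediumLockTable {
    struct State {
        bool writer = false;
        int readers = 0;
    };
    std::map<std::string, State> locks_;

public:
    // Grants the write lock only if nobody holds the medium at all.
    bool tryLockWrite(const std::string &uuid) {
        State &s = locks_[uuid];
        if (s.writer || s.readers > 0)
            return false;
        s.writer = true;
        return true;
    }

    // Grants a shared read lock unless a writer holds the medium.
    bool tryLockRead(const std::string &uuid) {
        State &s = locks_[uuid];
        if (s.writer)
            return false;
        ++s.readers;
        return true;
    }

    // Releases a write lock previously granted by tryLockWrite().
    void unlockWrite(const std::string &uuid) { locks_[uuid].writer = false; }
};
```

In this model the first clone to start takes the write lock on the shared floppy image, and the second clone's attempt fails until the first releases it, which matches the symptom of only one of the two VMs being able to start.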

Re: VBoxSVC backend lockup on linked clone with floppy

Posted: 22. Jun 2015, 13:15
by klaus
The behavior for media with mode writethrough or readonly (the latter is just a variant of the former with different locking behavior) is currently "keep them", i.e. your findings confirm that the code is behaving as designed. This is treated as analogous to "take snapshot", where the same medium is used. One of the reasons was that until 5.0 the API couldn't clone floppy or DVD images, which is now easily possible. The question is: what do users expect? I know what you expect, but writethrough stuff is somewhat special (and medium term we'll need to be able to have diff images for floppies).

Re: VBoxSVC backend lockup on linked clone with floppy

Posted: 22. Jun 2015, 19:09
by Technologov
IMO: Several cloned VMs should be able to boot up. Especially if the floppy image is part of the VM folder (which means it is a private image, not a generic image).

Re: VBoxSVC backend lockup on linked clone with floppy

Posted: 26. Jun 2015, 10:59
by frank
I think there was a fix in RC2. Could you confirm?

Re: VBoxSVC backend lockup on linked clone with floppy

Posted: 29. Jun 2015, 18:00
by klaus
The VirtualBox API doesn't have any concept of private vs. generic images. It doesn't assign any meaning to storage locations.

Currently the code behaves as designed - for linked clones images with mode=writethrough are "shared", i.e. refer to the same images (because writethrough images are not subject to diff image creation). For full clones such images are copied. The behavior isn't set in stone until the end of the world... it would need a behavior change with more explanation than "Several cloned VMs should be able to boot up". How would you propose to handle writethrough images in the full clone and linked clone case? The current definition makes sense, and we need a new definition which makes more sense :)
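
The rules as stated in this post can be written down as a small decision function. This is a sketch only: the enums and `actionFor` are hypothetical names, and the handling of normal images (diff image for linked clones, copy for full clones) is an assumption inferred from the remark that writethrough images are not subject to diff image creation.

```cpp
#include <cassert>

enum class CloneKind { Full, Linked };
enum class MediumMode { Normal, Writethrough };
enum class CloneAction { Copy, CreateDiff, Share };

// Decide what happens to one attached image when the VM is cloned.
CloneAction actionFor(CloneKind kind, MediumMode mode) {
    if (kind == CloneKind::Full)
        return CloneAction::Copy;        // full clone: images are copied
    if (mode == MediumMode::Writethrough)
        return CloneAction::Share;       // linked clone: refer to the same image
    return CloneAction::CreateDiff;      // linked clone, normal image (assumed)
}
```

Note that the decision depends only on the clone kind and the medium mode, never on where the image file is stored.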

Re: VBoxSVC backend lockup on linked clone with floppy

Posted: 29. Jun 2015, 18:15
by Technologov
I'm not yet sure about write-through images. I'm thinking more about normal and read-only floppy images.

My general solution (for all kinds of floppy images) is to copy the image if the source sits inside the /VM/ folder (esp. for full-clone VMs), i.e. make the CloneVM API source-location dependent.

But it doesn't need to happen for v5.0 (as it will mostly hit vbox-unattended users later, this can be fixed later, with better design)

Re: VBoxSVC backend lockup on linked clone with floppy

Posted: 29. Jun 2015, 18:26
by klaus
Floppy images are by default writethrough, and can be changed to readonly. All writethrough images (hard disks and floppies) are handled the same, and also all readonly images (floppies and CD/DVD) are handled the same. To say it early: the power of the API mostly comes from predictable, consistent behavior, and I don't like the idea of making the storage location change the API behavior. At all. There could be a new clone option which changes some details about the cloning behavior.

Re: VBoxSVC backend lockup on linked clone with floppy

Posted: 29. Jun 2015, 18:29
by Technologov
Okay. Then not. Maybe later we will have a better idea. Why write-through floppies? Why not normal?

Frank:
I think there was a fix in RC2. Could you confirm?
Yes, it is there.

Re: VBoxSVC backend lockup on linked clone with floppy

Posted: 29. Jun 2015, 19:16
by klaus
I think the most accurate answer is "historical reasons". Floppy and DVD drives used to be very specially handled (which is now long gone, unifying everything under IStorageController, IMediumAttachment and IMedium - keeping them at the status quo, which matched "writethrough"), and no one saw a pressing need to support diff images for them (i.e. handling them as normal images during snapshotting etc.). It's a good question actually if it's worth creating a diff image at all, because these days (with the big data alignment values) an empty .vdi image is already 2MB in size (and uses probably 2-8K of it). Might be cheaper to copy the entire image, but that needs additional code (not only treating them like regular hard disk images).
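
The size argument can be checked with a little arithmetic. In this sketch the 2MB figure comes from the post and is assumed to mean 2 MiB; the floppy size is that of a standard 1.44MB 3.5" disk; `copyIsCheaper` is a hypothetical helper that compares on-disk size only and ignores the extra code a copy path would need.

```cpp
#include <cassert>
#include <cstdint>

// Standard 1.44MB floppy: 2 sides * 80 tracks * 18 sectors * 512 bytes.
constexpr std::uint64_t kFloppy144Bytes = 2 * 80 * 18 * 512;

// Size of an empty .vdi as quoted in the post (assumed to be 2 MiB).
constexpr std::uint64_t kEmptyVdiBytes = 2 * 1024 * 1024;

// A full copy costs less disk space than a diff image whenever the whole
// source image is no larger than an empty diff image.
constexpr bool copyIsCheaper(std::uint64_t imageBytes,
                             std::uint64_t emptyDiffBytes) {
    return imageBytes <= emptyDiffBytes;
}
```

By this measure a whole floppy image is smaller than an empty diff image, whereas for a typical CD ISO the comparison goes the other way.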