[Fixed] VBoxSVC backend lockup on linked clone with floppy

Technologov
Volunteer
Posts: 3342
Joined: 10. May 2007, 16:59
Location: Israel

Post by Technologov »

Host: Win 7 x64 + BETA2.

When creating a linked clone of a Windows XP VM (with a floppy), the VBoxSVC process locked up.

UPDATE: This happens *even though* the floppy image in question is a leftover of an unattended install and is NOT attached to the latest snapshot of the VM being cloned.
Only the first snapshot had the floppy attached. The latest snapshot did not.
This is a VBoxSVC bug that is triggered / revealed by my vbox-unattended patch.

Directory structure analysis:
C:\VM-VBox\Windows group\Windows XP\
Windows XP.vbox
Windows XP.vdi
floppy_script.img
Logs\
Snapshots\
...

UPDATE2: Basically, what triggers this lock-up is the floppy image living *inside* the VM folder.
If the floppy image lives outside it, everything is fine.
Plus, the same bug occurs if there is a CD ISO image in the VM's folder (which normally never happens).

UPDATE3: A similar bug also occurs on a full clone, except that the GUI doesn't lock up; instead I get an error message like INVALID ARG.
Attachments
Windows XP.vbox.txt
VM config file, before cloning
(29.16 KiB) Downloaded 29 times
VBoxSVC.log
(6.71 KiB) Downloaded 36 times
klaus
Oracle Corporation
Posts: 1101
Joined: 10. May 2007, 14:57

Re: VBoxSVC backend lockup on linked clone with floppy

Post by klaus »

Thanks for the very detailed report and investigation. This is a long-standing bug actually, and the hang is caused by extremely hard to explain circumstances. The result is that at the end of cloning, a totally unexpected code path is triggered which results in a bogus error message in VBoxSVC.log and a tiny bit later a deadlock. Need to test a bit more to be certain, but I think I have a fix. The presence of DVD images etc. has been tested, but no one tried having them in the VM directory. Without this the code would do too much (in a harmless way), but wouldn't hang.
Technologov

Re: VBoxSVC backend lockup on linked clone with floppy

Post by Technologov »

Thanks for your hard work, Klaus. If you have a patch (or build), I can also test it (tomorrow). I debugged the VBoxSVC cloning code for 6 hours without success. It is very long spaghetti code, with one function taking 24 kilobytes of code. Hard to debug this monster. I need to test both full and linked clones.
klaus

Re: VBoxSVC backend lockup on linked clone with floppy

Post by klaus »

The code is VERY long, but it's well-structured and reasonably documented. That said, I would be very surprised if you could work out the actual cause from the error message or the hang. The cause is leftovers from the source VM's config, which need to be wiped (they'll get recreated automatically when the target VM config is saved). Hope the diff below doesn't get mangled, but even if it does, making the change manually is trivial: there is only one occurrence of llHardDisks in the file.


Index: MachineImplCloneVM.cpp
===================================================================
--- MachineImplCloneVM.cpp	(revision 101188)
+++ MachineImplCloneVM.cpp	(working copy)
@@ -995,6 +995,8 @@
 
         /* Reset media registry. */
         trgMCF.mediaRegistry.llHardDisks.clear();
+        trgMCF.mediaRegistry.llDvdImages.clear();
+        trgMCF.mediaRegistry.llFloppyImages.clear();
         /* If we got a valid snapshot id, replace the hardware/storage section
          * with the stuff from the snapshot. */
         settings::Snapshot sn;
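
To illustrate what the two added lines do, here is a minimal, self-contained sketch. Only the three list names (llHardDisks, llDvdImages, llFloppyImages) come from the diff; the `Medium` and `MediaRegistry` structs and both helper functions are simplified, hypothetical stand-ins, not the actual VirtualBox settings classes.

```cpp
#include <list>
#include <string>

// Hypothetical, simplified stand-ins for the settings structures.
// The real VirtualBox types carry many more fields.
struct Medium {
    std::string uuid;
    std::string location;
};

struct MediaRegistry {
    std::list<Medium> llHardDisks;
    std::list<Medium> llDvdImages;
    std::list<Medium> llFloppyImages;
};

// Wipe every list inherited from the source VM's config, so the target
// config starts with an empty registry that is rebuilt when it is saved.
void resetMediaRegistry(MediaRegistry &reg) {
    reg.llHardDisks.clear();
    reg.llDvdImages.clear();    // previously left behind
    reg.llFloppyImages.clear(); // previously left behind
}

// True if no stale entries from the source VM remain.
bool isRegistryEmpty(const MediaRegistry &reg) {
    return reg.llHardDisks.empty()
        && reg.llDvdImages.empty()
        && reg.llFloppyImages.empty();
}
```

Before the patch, only the hard disk list was cleared, so a floppy or DVD entry registered in the source VM's folder would survive into the target config.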
Technologov

Re: VBoxSVC backend lockup on linked clone with floppy

Post by Technologov »

Yeah, a lot better now!
With an inserted floppy it is semi-working now; with an ejected floppy it works properly.

Test cases:
Test #1: Floppy inserted/registered - Full clone
Before patch = FAILED.
Result Code:
NS_ERROR_INVALID_ARG (0x80070057)
After patch = SEMI-WORKING. The clone points to the original VM's floppy image(s), whereas it should copy the image(s) sitting in the VM folder and give them new UUIDs.
Test #2: Floppy inserted/registered - Linked clone
Before patch = FAILED, hung forever.
After patch = SEMI-WORKING (correctly?). Since VirtualBox lacks differencing floppy and ISO images, we should copy all images sitting in the parent VM folder and give them new UUIDs, just like in the full-clone case #1.

Test #3: Floppy ejected/unregistered - Full clone
Before patch = FAILED.
Result Code:
NS_ERROR_INVALID_ARG (0x80070057)
After patch = WORKS
Test #4: Floppy ejected/unregistered - Linked clone
Before patch = FAILED, hung forever.
After patch = WORKS

The SEMI-WORKING state causes another (expected) problem when starting 2 cloned VMs:
ERROR:
"Locking of attached media failed. A possible reason is that one of the media is attached to a running VM."
Result Code:
VBOX_E_INVALID_OBJECT_STATE (0x80BB0007)
Component:
SessionMachine
Interface:
IMachine {feb138aa-dbce-4a89-8ec0-380fc7ec4913}
... This happens because 2 cloned VMs are trying to access one floppy image (from the parent) in read-write mode. It doesn't matter whether it is a linked clone or a full clone.

That said, this patch is a definite improvement.
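
The locking failure reported above can be illustrated with a small model. This is only a sketch under a stated assumption: a medium may hold many read locks or exactly one write lock. `MediumLockTable` and `LockResult` are hypothetical names and not part of the VirtualBox API.

```cpp
#include <map>
#include <string>

enum class LockResult { Ok, Conflict };

// Hypothetical model: per medium, either any number of readers
// or a single writer, never both.
class MediumLockTable {
    struct State {
        int readers = 0;
        bool writer = false;
    };
    std::map<std::string, State> states_;

public:
    LockResult lockRead(const std::string &mediumId) {
        State &s = states_[mediumId];
        if (s.writer)
            return LockResult::Conflict;
        ++s.readers;
        return LockResult::Ok;
    }

    LockResult lockWrite(const std::string &mediumId) {
        State &s = states_[mediumId];
        if (s.writer || s.readers > 0)
            return LockResult::Conflict;
        s.writer = true;
        return LockResult::Ok;
    }
};
```

In this model the first clone to start takes the write lock on the shared floppy image, and the second clone's `lockWrite` on the same image returns `Conflict`, which corresponds to the "Locking of attached media failed" error.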
klaus

Re: VBoxSVC backend lockup on linked clone with floppy

Post by klaus »

The behavior for media with mode writethrough or readonly (the latter is just a variant of the former with different locking behavior) is currently "keep them", i.e. your findings confirm that the code is behaving as designed. This is treated as analogous to "take snapshot", where the same medium is used. One of the reasons was that until 5.0 the API couldn't clone floppy or DVD images, which is now easily possible. The question is: what do users expect? I know what you expect, but writethrough stuff is somewhat special (and in the medium term we'll need to be able to have diff images for floppies).
Technologov

Re: VBoxSVC backend lockup on linked clone with floppy

Post by Technologov »

IMO: Several cloned VMs should be able to boot up, especially if the floppy image is part of the VM folder (which means it is a private image, not a generic image).
frank
Oracle Corporation
Posts: 3362
Joined: 7. Jun 2007, 09:11
Primary OS: Debian Sid
VBox Version: PUEL
Guest OSses: Linux, Windows
Location: Dresden, Germany
Contact:

Re: VBoxSVC backend lockup on linked clone with floppy

Post by frank »

I think there was a fix in RC2. Could you confirm?
klaus

Re: VBoxSVC backend lockup on linked clone with floppy

Post by klaus »

The VirtualBox API doesn't have any concept of private vs. generic images. It doesn't assign any meaning to storage locations.

Currently the code behaves as designed - for linked clones images with mode=writethrough are "shared", i.e. refer to the same images (because writethrough images are not subject to diff image creation). For full clones such images are copied. The behavior isn't set in stone until the end of the world... it would need a behavior change with more explanation than "Several cloned VMs should be able to boot up". How would you propose to handle writethrough images in the full clone and linked clone case? The current definition makes sense, and we need a new definition which makes more sense :)
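
As a rough sketch of the definition described in this post, the decision can be written as a function of medium mode and clone kind only. All names here are hypothetical. Treating readonly like writethrough follows the earlier remark that readonly is a variant of writethrough, and copying normal images on a full clone is an assumption of this sketch.

```cpp
// Hypothetical enums; they only mirror the terms used in this thread.
enum class MediumMode { Normal, Writethrough, Readonly };
enum class CloneKind { Full, Linked };
enum class CloneAction { CreateDiff, Copy, Share };

// Decide what happens to one attached image during cloning.
// The storage location of the image plays no role in the decision.
CloneAction actionFor(MediumMode mode, CloneKind kind) {
    if (kind == CloneKind::Full)
        return CloneAction::Copy;        // full clone: images are copied
    if (mode == MediumMode::Normal)
        return CloneAction::CreateDiff;  // linked clone: diff image on top
    return CloneAction::Share;           // writethrough/readonly: same image
}
```

A new clone option, if one were ever added, would most naturally become a third parameter of such a function rather than a check on where the image is stored.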
Technologov

Re: VBoxSVC backend lockup on linked clone with floppy

Post by Technologov »

I'm not yet sure about write-through images. I'm thinking more about normal and read-only floppy images.

My general solution (for all kinds of floppy images) is to copy the image if the source sits inside the VM's folder (especially for full-clone VMs), i.e. to make the CloneVM API depend on the source location.

But it doesn't need to happen for v5.0 (as it will mostly hit vbox-unattended users later, this can be fixed later, with better design)
klaus

Re: VBoxSVC backend lockup on linked clone with floppy

Post by klaus »

Floppy images are writethrough by default, and can be changed to readonly. All writethrough images (hard disks and floppies) are handled the same, and all readonly images (floppies and CD/DVD) are also handled the same. To say it early: the power of the API mostly comes from predictable, consistent behavior, and I don't like the idea of making the storage location change the API behavior. At all. There could be a new clone option which changes some details about the cloning behavior.
Technologov

Re: VBoxSVC backend lockup on linked clone with floppy

Post by Technologov »

Okay. Then not. Maybe later we will have a better idea. Why write-through floppies? Why not normal?

Frank:
I think there was a fix in RC2. Could you confirm?
Yes, it is there.
Last edited by Technologov on 29. Jun 2015, 20:07, edited 1 time in total.
klaus

Re: VBoxSVC backend lockup on linked clone with floppy

Post by klaus »

I think the most accurate answer is "historical reasons". Floppy and DVD drives used to be handled very specially (that is long gone now; everything was unified under IStorageController, IMediumAttachment and IMedium, keeping them at the status quo, which matched "writethrough"), and no one saw a pressing need to support diff images for them (i.e. handling them as normal images during snapshotting etc.). It's actually a good question whether it's worth creating a diff image at all, because these days (with the big data alignment values) an empty .vdi image is already 2MB in size (and probably uses 2-8K of it). It might be cheaper to copy the entire image, but that needs additional code (not only treating them like regular hard disk images).
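
The size argument can be checked with simple arithmetic. This is a sketch under one assumption not stated in the thread: a standard 1.44 MB floppy image, i.e. 2880 sectors of 512 bytes. The 2MB figure for an empty .vdi comes from the post above; the constant and function names are hypothetical.

```cpp
#include <cstdint>

// Assumption: standard 3.5" HD floppy, 2880 sectors * 512 bytes.
constexpr std::uint64_t kFloppyBytes = 2880ull * 512;            // 1474560
// From the post: an empty .vdi diff image is already about 2MB.
constexpr std::uint64_t kEmptyVdiBytes = 2ull * 1024 * 1024;     // 2097152

// A full copy is cheaper whenever the whole image is no larger than
// an empty diff image would be before any data is written to it.
constexpr bool copyIsCheaper(std::uint64_t imageBytes,
                             std::uint64_t emptyDiffBytes) {
    return imageBytes <= emptyDiffBytes;
}
```

Under that assumption a full floppy copy (1474560 bytes) is smaller than an empty diff image (2097152 bytes), while for a CD/DVD image of hundreds of megabytes the comparison goes the other way.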