snapshot to a remote iSCSI volume broken since 4.3.0

Posted: 6. Jan 2014, 10:23
by Alban
Hello,

I'm posting to the 'VirtualBox on Windows Hosts' category but the issue I'm experiencing can also be verified on Mac hosts.

I have opened a ticket on the bugtracker, because I did not find anything close enough to this issue in the forum (or old ones).
Please see #12567 (https://www.virtualbox.org/ticket/12567)

Here is a description of the issue:
I used to have workstations running local VirtualBox VMs whose base disks are all connected to the same remote read-only iSCSI LUN. It used to work because each VM also has a default snapshot on top of the base disk, meaning disk writes are redirected to the local differential disk created by the snapshot.
This worked correctly from version 4.0 up to the latest 4.2.20 and seems broken starting from 4.3.0, including the latest 4.3.6.
I could verify this on a Mac host (OS X 10.9) and a Windows host (Windows 7 Prof.). I expect it to be the same on a Linux host but didn't have a chance to verify.

I'm not sure whether such a scenario is officially supported. However, I've found other users writing about using snapshots when the VM base disk is provided through a connection to an iSCSI LUN. For example: #52210 (viewtopic.php?f=8&t=52210) or related (https://www.virtualbox.org/ticket/11857 and viewtopic.php?f=8&t=52210)
This one (#11479, https://www.virtualbox.org/ticket/11479) is interesting because I had also observed that a snapshot over an iSCSI base disk only works when used with an IDE disk controller, not SATA.
In previous tests, I found that snapshotting over an iSCSI base disk attached to a SATA controller allowed the VM to boot, but the differencing disk created by the snapshot never grows, which eventually crashes the VM.

In VirtualBox 4.3.6 (4.3.0 to 4.3.6 in fact), this doesn't work at all, even with an IDE controller. In fact it works 'better' with the SATA controller in the sense that the VM sometimes boots normally, but it crashes most of the time. When this happens, many 'AHCI#0: Port 0 reset' messages can be seen in the VBox.log log file, along with messages about cancelled IO reads.
Enabling or disabling host cache management doesn't change anything.
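
To compare runs (IDE vs. SATA, host cache on vs. off), a quick way to count the messages mentioned above could look like the sketch below. It is only an illustration: the 'AHCI#0: Port 0 reset' text comes from the log, but the exact wording of the cancelled-read messages is not quoted here, so that pattern is a loose guess, and the function names are made up for this example.

```python
import re
from collections import Counter

# The port-reset pattern follows the message quoted in the post.
# The "cancelled_io" pattern is an assumption: it simply matches any
# line containing "cancel", since the exact log wording is not known.
PATTERNS = {
    "ahci_port_reset": re.compile(r"AHCI#\d+: Port \d+ reset"),
    "cancelled_io": re.compile(r"cancel", re.IGNORECASE),
}


def summarize_log(lines):
    """Count how many lines match each pattern in PATTERNS."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def summarize_log_file(path):
    """Same as summarize_log, reading the lines from a VBox.log file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return summarize_log(handle)
```

Calling summarize_log_file on the VBox.log of each test run gives one pair of counts per configuration, which makes the difference between controller and cache settings easier to see.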

When booting or installing the guest OS on the same iSCSI LUN with write access (no snapshot), everything works fine.

How to reproduce:
machine 1: iSCSI server (I used iet on SLES).
machine 2: VirtualBox host (Windows or OSX using VirtualBox 4.3.6).

1. machine 1:
- create an iSCSI target and a read-write 20GB Lun

2. machine 2:
2.1 create a vm 'win7' without a hard disk
2.2 attach the iSCSI volume from machine 1 with VBoxManage
2.3 install the guest OS (Windows 7 Pro.)
2.4 start the guest OS once after install to make sure booting read-write from the remote target works fine
2.5 poweroff the VM and create a snapshot

3. machine 1:
3.1 make the previous Lun 2 read-only

4. machine 2:
4.1 create a snapshot
4.2 start the VM
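
For anyone who wants to script the machine 2 side of the steps above, here is a rough sketch that only builds the VBoxManage command lines and prints them; it does not run anything. The server address, target IQN, controller name and snapshot names are placeholders, not the values from my setup, and the guest installation (2.3, 2.4) and the change on machine 1 (3.1) remain manual.

```python
import shlex


def repro_commands(vm, server, target, lun=0, controller="SATA", bus="sata"):
    """Return (step, argv) pairs for the VirtualBox host side of the test.

    All argument values are placeholders supplied by the caller.
    Use controller="IDE", bus="ide" to test the IDE case.
    """
    manage = "VBoxManage"
    return [
        ("2.1", [manage, "createvm", "--name", vm, "--register"]),
        ("2.1", [manage, "storagectl", vm, "--name", controller, "--add", bus]),
        ("2.2", [manage, "storageattach", vm,
                 "--storagectl", controller, "--port", "0", "--device", "0",
                 "--type", "hdd", "--medium", "iscsi",
                 "--server", server, "--target", target, "--lun", str(lun)]),
        # 2.3 and 2.4 (install and first boot of the guest) are done by hand.
        ("2.5", [manage, "controlvm", vm, "poweroff"]),
        ("2.5", [manage, "snapshot", vm, "take", "after-install"]),
        # 3.1 (switching the LUN to read-only) happens on machine 1.
        ("4.1", [manage, "snapshot", vm, "take", "read-only-base"]),
        ("4.2", [manage, "startvm", vm]),
    ]


def format_commands(commands):
    """Render the (step, argv) pairs as shell-quoted text, one per line."""
    return "\n".join(f"{step}: {shlex.join(argv)}" for step, argv in commands)


if __name__ == "__main__":
    # Example values only: a documentation IP address and a made-up IQN.
    print(format_commands(
        repro_commands("win7", "192.0.2.10", "iqn.2014-01.example:vbox-base")))
```

Printing the commands instead of executing them keeps the sketch safe to run on any machine; the output can be reviewed and pasted into a shell one step at a time.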

- With VirtualBox 4.3.6, the VM hangs just after starting and the associated host process uses 100% CPU time. The differential disk never grows and there isn't any iSCSI network traffic on machine 1.
Unfortunately, there is nothing interesting in the VM log file (Windows host or Mac host), except lots of 'AHCI#0: Port 0 reset' when using a SATA controller and disabling host cache.
When using an IDE controller (with or without host cache) or enabling host cache on a SATA controller, there is nothing in the log; the VM just freezes.

- With VirtualBox 4.2.20, the guest OS starts and the differential disk of the snapshot starts to grow while blocks are written.
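
The "differential disk grows or not" observation in the two cases above can be checked without watching the file by hand. The sketch below samples the size of a file a few times and reports whether it increased. The path to the differential disk is whatever your VM uses (it has to be passed in), and the sample count and interval are arbitrary defaults chosen for this example.

```python
import os
import time


def sample_sizes(path, samples=5, interval=2.0):
    """Read the size of the file at `path` `samples` times, `interval` seconds apart."""
    sizes = []
    for index in range(samples):
        sizes.append(os.path.getsize(path))
        if index < samples - 1:
            time.sleep(interval)
    return sizes


def has_grown(sizes):
    """True if the last sampled size is larger than the first one."""
    return len(sizes) >= 2 and sizes[-1] > sizes[0]
```

Running has_grown(sample_sizes(path_to_diff_disk)) shortly after starting the VM should return True in the 4.2.20 case described above and False in the 4.3.6 case, where the file stays at its initial size.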

If I downgrade from 4.3.6 to 4.2.20, everything works fine again.
So it's clearly related to something that changed starting from 4.3.0.
Note: I also tried with an immutable disk, which avoids creating a snapshot manually and makes the VM revert to its pristine state at each start (which is the desired behaviour), and the results are the same.

Thanks for your help. I'm ready to provide any additional details as needed.

Happy new year

Alban