Solaris 11 Express panic, VirtualBox hang, AHCI faults

Post by **varboxer** » 29. Mar 2011, 15:07

Hello

I was running VirtualBox 3.2.10 on a Windows 7 x64 host, with a Solaris 11 Express guest and two extra raw physical hard disks mounted as a ZFS mirror.

All was well for some time, until one day I started a scrub and came back to find (i) a panic in Solaris and (ii) that VirtualBox would not close.

After that, Solaris was not bootable! It would always panic (in ahci:ahci_watchdog_handler?) and leave VirtualBox in a faulty state (unable to close due to VERR_VM_INVALID_VM_STATE).

I moved the two physical SATA disks from a virtual SAS connection to a virtual SATA connection, but with no luck.

I was subsequently able to boot Solaris without the two disks attached though.

I then upgraded VirtualBox to 4.0.4 and was able to boot with the two disks attached to a virtual SATA connection; however, I got a great many errors from both Solaris and VirtualBox (SYNCHRONIZE CACHE command failed).

It seems there's some interaction between a Solaris 11 Express guest and VirtualBox which is causing errors in both. (I'm almost certain a Solaris 10 guest never had these problems, but now that I've upgraded my zpool version I can't revert to Solaris 10.)

Now I don't trust the attachment of the two disks and dare not use the pool...

Here are the details:

---------

Text on screen when Solaris panics:

fsb: 0
es: 4b
trp: e
cs: 30
ss: 0
gsb: fffffffffbc304e0 ds:4b
err: 2
rfl: 10286
rip: fffffffff7ae48d8
rsp: ffffff00028f5930

unix:die+dd
unix:trap+1799
unix:cmntrap+e6
ahci:ahci_add_doneq+18
ahci:ahci_mop_commands+aca
ahci:ahci_timeout_pkts+2d2
ahci:ahci_watchdog_handler+2d5
genunix:callout_list_expire+77
genunix:callout_expire+31
genunix:callout_execute+1e
[genunix:taskq_thread+248] seen first time not second
unix:thread_start+8
seen first time:
panic[cpu0]/thread=ffffff00028f5c40: panic dump timeout
dump aborted: please record the above information!

---------------

VBox.log :

40:09:53.637 AIOMgr: I/O manager 0x000000072027e0 encountered a critical error (rc=VERR_UNRESOLVED_ERROR) during operation. Falling back to failsafe mode. Expect reduced performance
40:09:53.638 AIOMgr: Error happened in D:\tinderbox\win-3.2\src\VBox\VMM\PDMAsyncCompletionFileNormal.cpp:(1654){pdmacFileAioMgrNormal}
40:09:53.638 AIOMgr: Please contact the product vendor
40:14:38.486 OHCI: USB Reset
40:14:38.486 OHCI: Software reset
40:14:38.486 EHCI: USB Suspended
40:14:38.497 EHCI: Hardware reset
40:14:38.507 EHCI: Hardware reset
40:14:38.507 Changing the VM state from 'RUNNING' to 'RESETTING'.
40:14:38.508 GMM: Statistics:
40:14:38.508 Allocated pages: 5006d
40:14:38.508 Maximum pages: 5009c
40:14:38.508 Ballooned pages: 0
40:14:38.528 CPUMSetGuestCpuIdFeature: Enabled APIC
40:14:38.528 CPUMSetGuestCpuIdFeature: Disabled x2APIC
40:14:38.528 PIT: mode=3 count=0x10000 (65536) - 18.20 Hz (ch=0)
40:14:38.530 Audio: set_record_source ars=0 als=0 (not implemented)
40:14:38.530 PIIX3 ATA: Ctl#1: finished processing RESET
40:14:38.530 PIIX3 ATA: Ctl#0: finished processing RESET
133:10:53.796 VMR3Reset:
133:10:53.796 RUNNING -> RESETTING, SUSPENDED -> RESETTING, RUNNING_LS -> RESETTING_LS failed, because the VM state is actually RESETTING
133:10:53.797 VMSetError: D:\tinderbox\win-3.2\src\VBox\VMM\VM.cpp(3144) vmR3TrySetState; rc=VERR_VM_INVALID_VM_STATE
133:10:53.805 VMSetError: VMR3Reset failed because the current VM state, RESETTING, was not found in the state transition table
133:10:53.805 ERROR [COM]: aRC=VBOX_E_VM_ERROR (0x80bb0003) aIID={6375231a-c17c-464b-92cb-ae9e128d71c3} aComponent={Console} aText={Could not reset the machine (VERR_VM_INVALID_VM_STATE)} aWarning=false, preserve=false

--------------

Solaris errors:

mpt_get_sas_device_page0 header: IOCStatus=0x22, IOCLogInfo=0x0
Disconnected command timeout
mpt_flush_target discovered non-NULL command in slot XX tasktype 3

--------------------

Solaris errors when running as a guest inside v4.0.4:
WARNING: /pci@0,0/pci8086,2829@d/disk@0,0 (sd0):
SYNCHRONIZE CACHE command failed (5)

That is repeated MANY times.

---------------

VBox.log in v4.0.4 :

00:01:07.864 AIO/win: Request 0x000000071f9710 returned rc=VERR_UNRESOLVED_ERROR (native 1117
00:01:07.864 )AioMgr0-N: Request 0x0000001428a920 failed with rc=VERR_UNRESOLVED_ERROR, migrating endpoint \\.\PhysicalDrive0 to failsafe manager.
00:01:07.864 AIO/win: Request 0x000000071f9690 returned rc=VERR_UNRESOLVED_ERROR (native 1117
00:01:07.864 )AsyncCompletion: Task 0x0000000745e0c0 completed after 15 seconds
00:01:07.944 AsyncCompletion: Task 0x00000007198280 completed after 19 seconds
00:01:24.972 AHCI#1: Read at offset 206338335744 (131072 bytes left) returned rc=VERR_UNRESOLVED_ERROR
00:01:24.972 AsyncCompletion: Task 0x00000007198100 completed after 36 seconds
...
00:01:25.784 AHCI#1: Flush returned rc=VERR_INVALID_FUNCTION
00:01:25.836 AHCI#1: Flush returned rc=VERR_INVALID_FUNCTION
00:01:25.888 AHCI#1: Flush returned rc=VERR_INVALID_FUNCTION

That message is repeated MANY times.
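
Since these messages repeat so often, a small script might help summarize which component reports which error code, and roughly where on the disk the failing read sits. The following is only a rough sketch in Python. The names `tally_errors` and `offset_to_lba` are my own, the sample text is copied from the excerpt above, and the 512-byte sector size is an assumption about the physical disk, not something the log states.

```python
import re
from collections import Counter

# Matches log lines like:
#   00:01:25.784 AHCI#1: Flush returned rc=VERR_INVALID_FUNCTION
# The optional ")" covers lines where a stray closing parenthesis
# follows the timestamp, as seen in the excerpt above.
LINE_RE = re.compile(
    r"^\d+:\d{2}:\d{2}\.\d{3}\s+\)?(?P<component>[^:]+):.*?rc=(?P<rc>VERR_\w+)"
)


def tally_errors(log_text):
    """Count (component, rc) pairs found in VBox.log-style text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            key = (match.group("component").strip(), match.group("rc"))
            counts[key] += 1
    return counts


def offset_to_lba(offset_bytes, sector_size=512):
    """Convert a byte offset to a sector number.

    sector_size=512 is an assumption; check the real disk geometry.
    """
    return offset_bytes // sector_size


# Sample lines copied from the v4.0.4 log excerpt above.
SAMPLE_LOG = r"""
00:01:07.864 AIO/win: Request 0x000000071f9710 returned rc=VERR_UNRESOLVED_ERROR (native 1117
00:01:24.972 AHCI#1: Read at offset 206338335744 (131072 bytes left) returned rc=VERR_UNRESOLVED_ERROR
00:01:25.784 AHCI#1: Flush returned rc=VERR_INVALID_FUNCTION
00:01:25.836 AHCI#1: Flush returned rc=VERR_INVALID_FUNCTION
00:01:25.888 AHCI#1: Flush returned rc=VERR_INVALID_FUNCTION
"""

if __name__ == "__main__":
    for (component, rc), count in tally_errors(SAMPLE_LOG).most_common():
        print(f"{count:5d}  {component:10s} {rc}")
    print("failing read starts at sector", offset_to_lba(206338335744))
```

To run it against a full log, read the file into a string and pass that to `tally_errors` instead of `SAMPLE_LOG`.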

-----------------