[issue] ZFS on Windows through VirtualBox
Posted: 31. Mar 2014, 20:14
This is a little project that I've been working on to use ZFS through a Windows OS.
I'll list the specifications and then the issue I'm experiencing, followed by my reason for doing this. Any input or ideas would be appreciated.
Specifications:
+ i7 970
+ OCZ Reaper HPC DDR3 PC3-12800 (ocz3rpr1600lv2g) 3x2GB
+ Corsair Vengeance (cmz12gx3m3a1600c9) 3x4GB
+ Gigabyte ex58-ud5
+ HX850W PSU
+ GTX780
+ 2x Intel SSD (RAID0)
+ 6x Toshiba 3TB HDD
+ Windows 7 x64 Ultimate
+ Avira AV
+ VirtualBox 4.3.10 r93012 (upgraded from previous version last night)
VM:
+ FreeBSD x64 10.0 RELEASE
+ 4 GB RAM
+ OS on separate VDI
+ 6x 3TB HDD mounted, 2x SSD mounted
Code:
<?xml version="1.0"?>
<!--
** DO NOT EDIT THIS FILE.
** If you make changes to this file while any VirtualBox related application
** is running, your changes will be overwritten later, without taking effect.
** Use VBoxManage or the VirtualBox Manager GUI to make changes.
-->
<VirtualBox xmlns="http://www.innotek.de/VirtualBox-settings" version="1.14-windows">
<Machine uuid="{f77661a4-2cf9-4d40-a4d2-770bc4e23148}" name="ZFS" OSType="FreeBSD_64" snapshotFolder="Snapshots" lastStateChange="2014-04-01T18:09:02Z">
<MediaRegistry>
<HardDisks>
<HardDisk uuid="{5b2b1556-03b8-4955-af6a-6cd62433b19d}" location="ZFS.vdi" format="VDI" type="Normal"/>
<HardDisk uuid="{3cba06b3-6a9c-4e4a-92d9-eb194c7f4bf9}" location="C:/virtualdisks/PhysicalDrive1.vmdk" format="VMDK" type="Normal"/>
<HardDisk uuid="{7a23c1cb-794a-4a0d-8dd9-c79f4e8dbeb1}" location="C:/virtualdisks/PhysicalDrive0.vmdk" format="VMDK" type="Normal"/>
<HardDisk uuid="{245c2429-7a72-4779-80e6-4885b0136ec3}" location="C:/virtualdisks/PhysicalDrive4.vmdk" format="VMDK" type="Normal"/>
<HardDisk uuid="{343ad927-45cb-41bd-88e9-13aff8f4a3b4}" location="C:/virtualdisks/PhysicalDrive5.vmdk" format="VMDK" type="Normal"/>
<HardDisk uuid="{92cae47b-ad81-4ddd-831e-ec5327541231}" location="C:/virtualdisks/PhysicalDrive7.vmdk" format="VMDK" type="Normal"/>
<HardDisk uuid="{d1496f13-f721-4fe6-ba73-e831173a731a}" location="C:/virtualdisks/PhysicalDrive8.vmdk" format="VMDK" type="Normal"/>
<HardDisk uuid="{98bcd93f-0598-4462-b6b1-bf243f11f5cd}" location="C:/virtualdisks/PhysicalDrive2.vmdk" format="VMDK" type="Normal"/>
<HardDisk uuid="{56c14272-1e69-422f-9b00-382c50beacdd}" location="C:/virtualdisks/PhysicalDrive3.vmdk" format="VMDK" type="Normal"/>
</HardDisks>
<DVDImages/>
<FloppyImages/>
</MediaRegistry>
<ExtraData>
<ExtraDataItem name="GUI/LastCloseAction" value="PowerOff"/>
<ExtraDataItem name="GUI/LastGuestSizeHint" value="720,400"/>
<ExtraDataItem name="GUI/LastNormalWindowPosition" value="1091,341,720,442"/>
<ExtraDataItem name="GUI/MiniToolBarAlignment" value="bottom"/>
<ExtraDataItem name="GUI/SaveMountedAtRuntime" value="yes"/>
<ExtraDataItem name="GUI/ShowMiniToolBar" value="yes"/>
</ExtraData>
<Hardware version="2">
<CPU count="6" hotplug="false">
<HardwareVirtEx enabled="true"/>
<HardwareVirtExNestedPaging enabled="true"/>
<HardwareVirtExVPID enabled="true"/>
<HardwareVirtExUX enabled="true"/>
<PAE enabled="true"/>
<LongMode enabled="true"/>
<HardwareVirtExLargePages enabled="true"/>
<HardwareVirtForce enabled="false"/>
</CPU>
<Memory RAMSize="3072" PageFusion="false"/>
<HID Pointing="USBMultiTouch" Keyboard="PS2Keyboard"/>
<HPET enabled="false"/>
<Chipset type="PIIX3"/>
<Boot>
<Order position="1" device="DVD"/>
<Order position="2" device="HardDisk"/>
<Order position="3" device="None"/>
<Order position="4" device="None"/>
</Boot>
<Display VRAMSize="128" monitorCount="1" accelerate3D="false" accelerate2DVideo="false"/>
<VideoCapture enabled="false" screens="18446744073709551615" horzRes="1024" vertRes="768" rate="512" fps="25"/>
<RemoteDisplay enabled="false" authType="Null" authTimeout="5000">
<VRDEProperties>
<Property name="TCP/Ports" value="3389"/>
</VRDEProperties>
</RemoteDisplay>
<BIOS>
<ACPI enabled="true"/>
<IOAPIC enabled="true"/>
<Logo fadeIn="true" fadeOut="true" displayTime="0"/>
<BootMenu mode="MessageAndMenu"/>
<TimeOffset value="0"/>
<PXEDebug enabled="false"/>
</BIOS>
<USB>
<Controllers>
<Controller name="EHCI" type="EHCI"/>
<Controller name="OHCI" type="OHCI"/>
</Controllers>
<DeviceFilters/>
</USB>
<Network>
<Adapter slot="0" enabled="true" MACAddress="0800273CF633" cable="true" speed="0" type="82540EM">
<DisabledModes>
<NAT>
<DNS pass-domain="true" use-proxy="false" use-host-resolver="false"/>
<Alias logging="false" proxy-only="false" use-same-ports="false"/>
</NAT>
<InternalNetwork name="intnet"/>
<NATNetwork name="SKYNETMINI"/>
</DisabledModes>
<BridgedInterface name="Intel(R) PRO/1000 PT Quad Port LP Server Adapter"/>
</Adapter>
<Adapter slot="1" enabled="false" MACAddress="0800273B4033" cable="true" speed="0" type="82540EM">
<DisabledModes>
<NAT>
<DNS pass-domain="true" use-proxy="false" use-host-resolver="false"/>
<Alias logging="false" proxy-only="false" use-same-ports="false"/>
</NAT>
</DisabledModes>
</Adapter>
<Adapter slot="2" enabled="false" MACAddress="0800271A84A5" cable="true" speed="0" type="82540EM">
<DisabledModes>
<NAT>
<DNS pass-domain="true" use-proxy="false" use-host-resolver="false"/>
<Alias logging="false" proxy-only="false" use-same-ports="false"/>
</NAT>
</DisabledModes>
</Adapter>
<Adapter slot="3" enabled="false" MACAddress="080027B1B682" cable="true" speed="0" type="82540EM">
<DisabledModes>
<NAT>
<DNS pass-domain="true" use-proxy="false" use-host-resolver="false"/>
<Alias logging="false" proxy-only="false" use-same-ports="false"/>
</NAT>
</DisabledModes>
</Adapter>
<Adapter slot="4" enabled="false" MACAddress="080027E53678" cable="true" speed="0" type="82540EM">
<DisabledModes>
<NAT>
<DNS pass-domain="true" use-proxy="false" use-host-resolver="false"/>
<Alias logging="false" proxy-only="false" use-same-ports="false"/>
</NAT>
</DisabledModes>
</Adapter>
<Adapter slot="5" enabled="false" MACAddress="0800275371DD" cable="true" speed="0" type="82540EM">
<DisabledModes>
<NAT>
<DNS pass-domain="true" use-proxy="false" use-host-resolver="false"/>
<Alias logging="false" proxy-only="false" use-same-ports="false"/>
</NAT>
</DisabledModes>
</Adapter>
<Adapter slot="6" enabled="false" MACAddress="080027AEDCA2" cable="true" speed="0" type="82540EM">
<DisabledModes>
<NAT>
<DNS pass-domain="true" use-proxy="false" use-host-resolver="false"/>
<Alias logging="false" proxy-only="false" use-same-ports="false"/>
</NAT>
</DisabledModes>
</Adapter>
<Adapter slot="7" enabled="false" MACAddress="080027BE53F1" cable="true" speed="0" type="82540EM">
<DisabledModes>
<NAT>
<DNS pass-domain="true" use-proxy="false" use-host-resolver="false"/>
<Alias logging="false" proxy-only="false" use-same-ports="false"/>
</NAT>
</DisabledModes>
</Adapter>
</Network>
<UART>
<Port slot="0" enabled="false" IOBase="0x3f8" IRQ="4" hostMode="Disconnected"/>
<Port slot="1" enabled="false" IOBase="0x2f8" IRQ="3" hostMode="Disconnected"/>
</UART>
<LPT>
<Port slot="0" enabled="false" IOBase="0x378" IRQ="7"/>
<Port slot="1" enabled="false" IOBase="0x378" IRQ="7"/>
</LPT>
<AudioAdapter controller="AC97" driver="DirectSound" enabled="false"/>
<RTC localOrUTC="local"/>
<SharedFolders/>
<Clipboard mode="Disabled"/>
<DragAndDrop mode="Disabled"/>
<IO>
<IoCache enabled="true" size="5"/>
<BandwidthGroups/>
</IO>
<HostPci>
<Devices/>
</HostPci>
<EmulatedUSB>
<CardReader enabled="false"/>
</EmulatedUSB>
<Guest memoryBalloonSize="0"/>
<GuestProperties>
<GuestProperty name="/VirtualBox/GuestAdd/HostVerLastChecked" value="4.3.6" timestamp="1391537384235567900" flags=""/>
<GuestProperty name="/VirtualBox/GuestAdd/Revision" value="86992" timestamp="1392007732522755001" flags=""/>
<GuestProperty name="/VirtualBox/GuestAdd/Vbgl/Video/SavedMode" value="1440x1050x32" timestamp="1391537353632181800" flags=""/>
<GuestProperty name="/VirtualBox/GuestAdd/Version" value="4.2.16" timestamp="1392007732522254901" flags=""/>
<GuestProperty name="/VirtualBox/GuestAdd/VersionExt" value="4.2.16_Ubuntu" timestamp="1392007732522755000" flags=""/>
<GuestProperty name="/VirtualBox/GuestInfo/OS/Product" value="Linux" timestamp="1392007732521254800" flags=""/>
<GuestProperty name="/VirtualBox/GuestInfo/OS/Release" value="3.11.0-12-generic" timestamp="1392007732521754800" flags=""/>
<GuestProperty name="/VirtualBox/GuestInfo/OS/Version" value="#19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013" timestamp="1392007732521754801" flags=""/>
<GuestProperty name="/VirtualBox/HostInfo/GUI/LanguageID" value="en_US" timestamp="1396372821420676000" flags=""/>
</GuestProperties>
</Hardware>
<StorageControllers>
<StorageController name="IDE" type="PIIX4" PortCount="2" useHostIOCache="false" Bootable="true">
<AttachedDevice passthrough="false" type="DVD" port="1" device="0"/>
</StorageController>
<StorageController name="SATA" type="AHCI" PortCount="15" useHostIOCache="false" Bootable="true" IDE0MasterEmulationPort="0" IDE0SlaveEmulationPort="1" IDE1MasterEmulationPort="2" IDE1SlaveEmulationPort="3">
<AttachedDevice nonrotational="true" type="HardDisk" port="0" device="0">
<Image uuid="{5b2b1556-03b8-4955-af6a-6cd62433b19d}"/>
</AttachedDevice>
<AttachedDevice type="HardDisk" port="3" device="0">
<Image uuid="{245c2429-7a72-4779-80e6-4885b0136ec3}"/>
</AttachedDevice>
<AttachedDevice type="HardDisk" port="4" device="0">
<Image uuid="{343ad927-45cb-41bd-88e9-13aff8f4a3b4}"/>
</AttachedDevice>
<AttachedDevice type="HardDisk" port="5" device="0">
<Image uuid="{92cae47b-ad81-4ddd-831e-ec5327541231}"/>
</AttachedDevice>
<AttachedDevice type="HardDisk" port="6" device="0">
<Image uuid="{d1496f13-f721-4fe6-ba73-e831173a731a}"/>
</AttachedDevice>
<AttachedDevice nonrotational="true" type="HardDisk" port="7" device="0">
<Image uuid="{98bcd93f-0598-4462-b6b1-bf243f11f5cd}"/>
</AttachedDevice>
<AttachedDevice nonrotational="true" type="HardDisk" port="8" device="0">
<Image uuid="{56c14272-1e69-422f-9b00-382c50beacdd}"/>
</AttachedDevice>
</StorageController>
</StorageControllers>
<Groups>
<Group name="/SKYNET"/>
</Groups>
</Machine>
</VirtualBox>
The function used to create the raw vmdks:
Code:
function createrawvmdk() {
    # Run compmgmt.msc and choose the DiskN
    # Fill in the array accordingly
    # ${log} must be set by the caller to a writable log file path
    local i
    for (( i=0; i<=10; i++ )); do
        printf 'creating PhysicalDrive%s\t\t' "${i}"
        if vboxmanage internalcommands createrawvmdk \
            -filename "C:\\PhysicalDrive${i}.vmdk" \
            -rawdisk "\\\\.\\PhysicalDrive${i}" >>"${log}" 2>&1; then
            printf '\r\t\t\t\t[\e[32mdone\e[0m]\n'
        else
            printf '\r\t\t\t\t[\e[31merror\e[0m]\n'
            # Return from the function itself; a "return" inside ( ... ) only leaves the subshell
            return 1
        fi
    done
}
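For reference, once the raw vmdks exist they still have to be attached to the VM's SATA controller. The sketch below shows how that could look with `VBoxManage storageattach`, using the port-to-drive pairs that can be read out of the .vbox file above (matching each `<Image uuid>` to its `<HardDisk location>`). It only prints the commands so they can be reviewed before running anything. The helper name `print_attach_commands` is made up for illustration, and the paths are the ones registered in the .vbox file, not the ones the function above writes to.

```shell
#!/bin/sh
# Sketch only: prints the VBoxManage commands instead of running them.
# VM name "ZFS", controller name "SATA" and the port/drive pairs come from
# the .vbox file; print_attach_commands is a hypothetical helper.
print_attach_commands() {
    vm="$1"
    # <SATA port>:<PhysicalDriveN> pairs as registered in the .vbox file
    for pair in 3:4 4:5 5:7 6:8 7:2 8:3; do
        port="${pair%%:*}"
        drive="${pair##*:}"
        printf 'VBoxManage storageattach "%s" --storagectl "SATA" --port %s --device 0 --type hdd --medium "C:/virtualdisks/PhysicalDrive%s.vmdk"\n' \
            "$vm" "$port" "$drive"
    done
}

print_attach_commands "ZFS"
```

Printing first makes it easy to double check that each Windows disk number really ends up on the intended port before any pool member is touched.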
Issue:
Reading from the pool is introducing checksum errors and corrupting data in a striped mirror pool.
I copied over roughly 3 TB of data into the pool with no issues. However, once I wanted to copy out a few of my datasets (e.g. moon/home/user) I got two CHECKSUM (CKSUM) errors while doing this. One was a zip file, the other a jpeg. I opened the copied image file and indeed 80% of the image was intact with the remainder of the data scrambled.
At this point I had 2 data errors and fewer than 10 checksum errors for the entire pool.
I decided to run a scrub with zpool scrub moon. The scrub finished with around 40 checksum errors and ~1.68M repaired. I was expecting the scrub to fix the two files, but after the scrub there were 43 data errors. The output below shows the pool during a second scrub.
Code:
[root@ZFS /]# zpool status
  pool: moon
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub in progress since Mon Mar 31 10:14:52 2014
        284G scanned out of 2.43T at 61.7M/s, 10h9m to go
        0 repaired, 11.44% done
config:
        NAME                                   STATE     READ WRITE CKSUM
        moon                                   ONLINE       0     0    63
          mirror-0                             ONLINE       0     0    72
            diskid/DISK-VB92cae47b-31125427p1  ONLINE       0     0    74
            diskid/DISK-VBd1496f13-1a733a17p1  ONLINE       0     0    76
          mirror-1                             ONLINE       0     0    54
            diskid/DISK-VB343ad927-b4a3f4f8p1  ONLINE       0     0    59
            diskid/DISK-VB245c2429-c36e13b0p1  ONLINE       0     0    56
        logs
          diskid/DISK-VB98bcd93f-cdf5113fp1    ONLINE       0     0     0
        cache
          diskid/DISK-VB56c14272-ddacbe50p1    ONLINE       0     0     0
errors: 43 data errors, use '-v' for a list
Almost halfway through the scrub there are now 80 checksum errors in the pool.
Code:
[root@ZFS /]# zpool status
  pool: moon
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub in progress since Mon Mar 31 10:14:52 2014
        1.04T scanned out of 2.43T at 73.3M/s, 5h31m to go
        0 repaired, 42.75% done
config:
        NAME                                   STATE     READ WRITE CKSUM
        moon                                   ONLINE       0     0    80
          mirror-0                             ONLINE       0     0    94
            diskid/DISK-VB92cae47b-31125427p1  ONLINE       0     0    96
            diskid/DISK-VBd1496f13-1a733a17p1  ONLINE       0     0    98
          mirror-1                             ONLINE       0     0    66
            diskid/DISK-VB343ad927-b4a3f4f8p1  ONLINE       0     0    71
            diskid/DISK-VB245c2429-c36e13b0p1  ONLINE       0     0    68
        logs
          diskid/DISK-VB98bcd93f-cdf5113fp1    ONLINE       0     0     0
        cache
          diskid/DISK-VB56c14272-ddacbe50p1    ONLINE       0     0     0
errors: 43 data errors, use '-v' for a list
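To compare the counters between two runs like the ones above, the CKSUM column can be pulled out of the `zpool status` text with a small filter. This is a rough sketch that assumes the column order shown above (NAME STATE READ WRITE CKSUM) and only looks at leaf devices whose names start with `diskid/`. The helper name `cksum_by_device` is hypothetical, and it reads plain text from stdin rather than calling zpool itself.

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool status` text on stdin and prints
# "<device> <cksum>" for every leaf device (names starting with diskid/).
cksum_by_device() {
    awk '$1 ~ /^diskid\// && NF >= 5 { print $1, $5 }'
}

# Example fed with three rows copied from the second scrub output
cksum_by_device <<'EOF'
mirror-0 ONLINE 0 0 94
diskid/DISK-VB92cae47b-31125427p1 ONLINE 0 0 96
diskid/DISK-VBd1496f13-1a733a17p1 ONLINE 0 0 98
EOF
```

Inside the guest this would be used as `zpool status moon | cksum_by_device`, and saving the result of each run to a file makes it possible to diff the counters later.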
The number of data errors dropped from 43 to 16 after I did the following:
+ Took all HDDs and SSDs offline in Windows
+ Booted the VM, which ended up in a PAUSED state
+ Brought all HDDs and SSDs back online
+ Booted the VM again
Code:
        NAME                                   STATE     READ WRITE CKSUM
        moon                                   ONLINE       0     0     0
          mirror-0                             ONLINE       0     0     0
            diskid/DISK-VB92cae47b-31125427p1  ONLINE       0     0     0
            diskid/DISK-VBd1496f13-1a733a17p1  ONLINE       0     0     0
          mirror-1                             ONLINE       0     0     0
            diskid/DISK-VB343ad927-b4a3f4f8p1  ONLINE       0     0     0
            diskid/DISK-VB245c2429-c36e13b0p1  ONLINE       0     0     0
        logs
          diskid/DISK-VB98bcd93f-cdf5113fp1    ONLINE       0     0     0
        cache
          diskid/DISK-VB56c14272-ddacbe50p1    ONLINE       0     0     0
errors: 16 data errors, use '-v' for a list
Strange.
Edit 04/02/2014:
I looked up how ZFS handles cache flushing, and how VirtualBox handles cache flushing.
According to http://docs.oracle.com/cd/E26505_01/htm ... zfs-6.html
ZFS issues infrequent flushes (every 5 second or so) after the uberblock updates. The flushing infrequency is fairly inconsequential so no tuning is warranted here. ZFS also issues a flush every time an application requests a synchronous write (O_DSYNC, fsync, NFS commit, and so on).
According to http://www.virtualbox.org/manual/ch12.html
12.2.2. Responding to guest IDE/SATA flush requests
If desired, the virtual disk images can be flushed when the guest issues the IDE FLUSH CACHE command. Normally these requests are ignored for improved performance. The parameters below are only accepted for disk drives. They must not be set for DVD drives.
I'm going to enable cache flushing and see how that affects the results.
Code:
<ExtraDataItem name="VBoxInternal/Devices/ahci/0/LUN#3/Config/IgnoreFlush" value="0"/>
<ExtraDataItem name="VBoxInternal/Devices/ahci/0/LUN#4/Config/IgnoreFlush" value="0"/>
<ExtraDataItem name="VBoxInternal/Devices/ahci/0/LUN#5/Config/IgnoreFlush" value="0"/>
<ExtraDataItem name="VBoxInternal/Devices/ahci/0/LUN#6/Config/IgnoreFlush" value="0"/>
<ExtraDataItem name="VBoxInternal/Devices/ahci/0/LUN#7/Config/IgnoreFlush" value="0"/>
<ExtraDataItem name="VBoxInternal/Devices/ahci/0/LUN#8/Config/IgnoreFlush" value="0"/>
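Since the header of the .vbox file warns against editing it by hand, the same keys can also be set with `VBoxManage setextradata`, which is the form the VirtualBox manual uses for this setting. The sketch below prints one command per SATA port from 3 to 8, matching the ExtraDataItem lines above. The helper name `print_flush_commands` is made up for illustration, and as far as I know the values are only picked up the next time the VM starts.

```shell
#!/bin/sh
# Sketch only: prints the commands rather than executing them.
# Usage: print_flush_commands <vm-name> <first-port> <last-port>
# print_flush_commands is a hypothetical helper, not a VirtualBox tool.
print_flush_commands() {
    vm="$1"
    port="$2"
    last="$3"
    while [ "$port" -le "$last" ]; do
        # IgnoreFlush=0 asks VirtualBox to honor the guest's flush requests
        printf 'VBoxManage setextradata "%s" "VBoxInternal/Devices/ahci/0/LUN#%s/Config/IgnoreFlush" 0\n' \
            "$vm" "$port"
        port=$((port + 1))
    done
}

print_flush_commands "ZFS" 3 8
```

Port 0, which holds the OS VDI, is not in the list above; calling the helper with `0 8` would print commands for ports 0 through 8, although only ports 0 and 3 to 8 have a disk attached in this VM.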
Edit 04/11/2014:
Kernel panic. The VM crashed, and I'm adding this for information.
Code:
ZFS VM Crash
vm_fault: pager read error, pid 1 (init)
g_vfs_done():ada0p2[READ(offset=131072, length=32768)]error = 6
swap_pager: I/O error - pagein failed; blkno 1983,size 4096, error 6
vm_fault: pager read error, pid 1 (init)
swap_pager: I/O error - pagein failed; blkno 1983,size 4096, error 6
vm_fault: pager read error, pid 1 (init)
g_vfs_done():ada0p2[WRITE(offset=7281it)
g_vfs_done():ada0p2[WRITE(offset=728129536, length=8192)]error = 6
/: got error 6 while accessing filesystem
g_vfs_done():ada0p2[READ(offset=728129536, length=8192)]error = 6
init died (signal 4, exit 0)
panic: Going nowhere without my init!
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff808e7dd0 at kdb_backtrace+0x60
#1 0xffffffff808af8b5 at panic+0x155
#2 0xffffffff8087ce0f at exit1+0xdbf
#3 0xffffffff808b2eef at sigexit+0xb7f
#4 0xffffffff80c78e91 at sendsig+0x5d1
#5 0xffffffff808b4583 at trapsignal+0x293
#6 0xffffffff80c8df98 at trap+0x488
#7 0xffffffff80c75392 at calltrap+0x8
Uptime: 9d0h12m0s
Dump failed. Partition too small.
The reason I'm doing this:
Protection against bit rot, and a cooler way to store data with compression and deduplication. This is my everything-station: I use it as both a server and a workstation, and I like to get the most utilization out of it in a single self-contained tower case that also looks nice. I also have no room for server racks, so this is the most compact solution.
Thanks