raid-check crashes client VirtualMachine

Discussions related to using VirtualBox on Linux hosts.
CidiRome
Posts: 11
Joined: 18. Apr 2018, 23:12

Post by CidiRome »

Hi.

For a while now I have noticed that my client virtual machine, which runs on VirtualBox on a Linux host, has been crashing every Sunday.

It took me 3 weeks to figure out what seems to be the origin of the problem:
the script /usr/sbin/raid-check is the only cron job that runs only on Sundays, and I believe it is the culprit.

The host system is CentOS Linux release 7.4.1708 (Core) running on a RAID 1 of two 6 TB HDDs.

Examples of VBOX messages that I believe are related to the situation (There are plenty):
164:57:20.936731 LsiLogic#0: Guest issued CDB {0x12, 0x0, 0x0, 0x0, 0x24, 0x0}
164:57:59.994807 AsyncCompletion: Task 0x007f99f55d4800 completed after 25 seconds
164:57:59.994834 AsyncCompletion: Task 0x007f99f55e4ec0 completed after 25 seconds
165:10:55.990517 VD#0: Write request was active for 38 seconds
165:10:55.990525 VD#0: Write request was active for 38 seconds
Because of the log timestamp style I cannot be 100% sure that these entries happened at the same time, but I believe so...
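
The timestamps in the lines quoted above appear to count elapsed time since the VM started (hours:minutes:seconds), so they can be mapped to wall-clock time once the VM start time is known. Below is a minimal sketch of that conversion; it is not a VirtualBox tool. The function name `slow_io_events` and the start time in the example are hypothetical, and the regular expression only covers the two message shapes shown above.

```python
import re
from datetime import datetime, timedelta

# Matches the two kinds of slow-I/O messages quoted in the post.
SLOW_IO = re.compile(
    r"^(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})\.(?P<us>\d{6})\s+"
    r"(?:AsyncCompletion: Task \S+ completed after|"
    r"VD#\d+: \w+ request was active for)\s+(?P<secs>\d+) seconds"
)


def slow_io_events(lines, vm_start):
    """Yield (wall_clock_time, stall_seconds) for each slow-I/O line.

    vm_start is supplied by the caller: the moment the VM (and so the
    log) was started. It is an assumption, not read from the log here.
    """
    for line in lines:
        match = SLOW_IO.match(line)
        if match is None:
            continue  # e.g. the "Guest issued CDB" line
        elapsed = timedelta(
            hours=int(match["h"]),
            minutes=int(match["m"]),
            seconds=int(match["s"]),
            microseconds=int(match["us"]),
        )
        yield vm_start + elapsed, int(match["secs"])


sample = [
    "164:57:20.936731 LsiLogic#0: Guest issued CDB {0x12, 0x0, 0x0, 0x0, 0x24, 0x0}",
    "164:57:59.994807 AsyncCompletion: Task 0x007f99f55d4800 completed after 25 seconds",
    "164:57:59.994834 AsyncCompletion: Task 0x007f99f55e4ec0 completed after 25 seconds",
    "165:10:55.990517 VD#0: Write request was active for 38 seconds",
    "165:10:55.990525 VD#0: Write request was active for 38 seconds",
]
vm_start = datetime(2018, 4, 8, 10, 0, 0)  # hypothetical VM start time
events = list(slow_io_events(sample, vm_start))
for when, seconds in events:
    print(when.isoformat(), seconds)
```

Comparing the resulting times with the start time of the raid-check cron job would show whether the stalls really fall inside the check window.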

How can I prevent this problem?

Cheers.
socratis
Site Moderator
Posts: 27329
Joined: 22. Oct 2010, 11:03
Primary OS: Mac OS X other
VBox Version: PUEL
Guest OSses: Win(*>98), Linux*, OSX>10.5
Location: Greece

Re: raid-check crashes client VirtualMachine

Post by socratis »

Post a ZIPPED VBox.log from a run that includes the crash. Start the VM from a cold boot and wait for it to crash. It would be best to start it shortly before the anticipated crash; it's not worth sifting through a week of log messages.
Do NOT send me Personal Messages (PMs) for troubleshooting, they are simply deleted.
Do NOT reply with the "QUOTE" button, please use the "POST REPLY", at the bottom of the form.
If you obfuscate any information requested, I will obfuscate my response. These are virtual UUIDs, not real ones.
CidiRome

Re: raid-check crashes client VirtualMachine

Post by CidiRome »

Hi.

Here is the log.

Please note that the machine doesn't completely crash, but the mount point becomes RO (read-only) and nothing works after that.

To fit the log in a 256 KiB file, I deleted a few hundred lines like this one:
06:06:45.621748 14:02:53.529506 AsyncCompletion: Task 0x007f35e46b0240 completed after 24 seconds

Even after restarting the machine, the errors in VBox.log keep coming; I suspect this happens because the raid-check script is still running.
Added later: in fact the machine has already crashed again, with the FS read-only again. I added the small log of this short run, up to the point where I ordered the reboot.

Cheers.
Attachments
VBox_short.log.zip (46.33 KiB): Short machine run log.
VirtualBox_PrintScreen.png (48.02 KiB): PrintScreen of the crashed machine.
VBox2.log.zip (216.4 KiB): The VBox.log since about 17 hours before the crash.
CidiRome

Re: raid-check crashes client VirtualMachine

Post by CidiRome »

Hi.

One more week has passed and I still have the same problem.

I've found some information about this situation here and there, but nothing precise.

Before this last check I changed the option "Nice" of raid-check to IDLE, but the result is the same.

The check takes about 11 hours to complete, and if I reboot the affected machines before it ends, they normally crash again. I've also noticed that they normally don't crash at the beginning of the raid-check; it tends to happen after 50%.

I haven't noticed effects on anything else on the server; only VBox seems to be affected... Is it somehow the way VBox accesses the vdi/vmdk files that causes the problem?

Cheers.
socratis

Re: raid-check crashes client VirtualMachine

Post by socratis »

CidiRome wrote:the mount point becomes RO and nothing can work
Now, that rings a bell... I believe we've had a couple of cases like that, all with Linux guests. There's something about the kernel: if it detects long delays, it assumes that the hard drive is "dying" and remounts it read-only to prevent further damage. But for the life of me, I can't find the topics! :?

Go to the VM Settings » Storage » Controller: <any/all> » Use Host I/O Cache: disable/uncheck that. See if that improves the situation...
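
The same setting can also be changed from the command line with `VBoxManage storagectl` and its `--hostiocache on|off` option. Below is a minimal sketch that only builds the command; the VM name "MyGuest" and the controller name "SCSI" are hypothetical placeholders, so substitute the names shown in your own VM's Storage settings. Storage controller settings normally have to be changed while the VM is powered off.

```python
import subprocess


def hostiocache_command(vm_name, controller_name, enabled):
    """Build the VBoxManage call that toggles 'Use Host I/O Cache'."""
    return [
        "VBoxManage", "storagectl", vm_name,
        "--name", controller_name,
        "--hostiocache", "on" if enabled else "off",
    ]


def set_hostiocache(vm_name, controller_name, enabled):
    """Run the command; raises CalledProcessError if VBoxManage fails."""
    subprocess.run(
        hostiocache_command(vm_name, controller_name, enabled),
        check=True,
    )


# "MyGuest" and "SCSI" are placeholders, not names from this thread.
cmd = hostiocache_command("MyGuest", "SCSI", False)
print(" ".join(cmd))
```

Building the command separately from running it makes it easy to print and review before anything is changed on the host.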
CidiRome

Re: raid-check crashes client VirtualMachine

Post by CidiRome »

socratis wrote:Go to the VM Settings » Storage » Controller: <any/all> » Use Host I/O Cache: disable/uncheck that. See if that improves the situation...
The I/O cache was already disabled for the HDD controller. I can't remember whether it was me who disabled it for testing purposes (I had probably seen the recommendation somewhere), even though by my understanding that would make things worse...

For now I turned it back on.

Any more suggestions?

Cheers.
socratis

Re: raid-check crashes client VirtualMachine

Post by socratis »

CidiRome wrote:Any more suggestions?
I wish I could remember what the solution was off the top of my head, but I think it's kernel related, nothing to do with VirtualBox. :?

Remember, just because an OS/program that runs in the context of VirtualBox has a problem, it doesn't make it a VirtualBox problem necessarily. You're having an issue that has most probably nothing to do with VirtualBox, so my suggestion would be to treat it as such, as a native problem with the OS or the application of the guest.

For example, a quick/dirty search for "linux file system became read only" returns about 236 K results. :shock:
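
One guest-side mechanism often discussed for this symptom is the SCSI command timeout, which Linux exposes per disk in /sys/block/<disk>/device/timeout (usually 30 seconds by default). Whether it is the cause here is an assumption, not something confirmed in this thread, but the VBox.log earlier does show a write that was active for 38 seconds. Below is a minimal sketch for inspecting the value from inside the guest; the helper names `scsi_timeouts` and `at_risk` are made up for illustration.

```python
from pathlib import Path


def scsi_timeouts(sys_block="/sys/block"):
    """Return {disk_name: timeout_seconds} for disks that expose a
    SCSI command timeout in sysfs; other block devices are skipped."""
    timeouts = {}
    for disk in sorted(Path(sys_block).glob("*")):
        timeout_file = disk / "device" / "timeout"
        if timeout_file.is_file():
            timeouts[disk.name] = int(timeout_file.read_text().strip())
    return timeouts


def at_risk(timeouts, observed_stall_seconds):
    """List disks whose timeout is not longer than an observed host stall."""
    return [
        name for name, limit in timeouts.items()
        if observed_stall_seconds >= limit
    ]


# Illustrative values: a 30 s timeout against the 38 s stall from the log.
print(at_risk({"sda": 30}, 38))
```

Inside the guest, `at_risk(scsi_timeouts(), 38)` would list the disks whose timeout is shorter than the longest stall seen in the log.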
CidiRome

Re: raid-check crashes client VirtualMachine

Post by CidiRome »

Hi.

I will search about that.

Just to clarify, this same virtual machine ran for 9 years on VMware on the previous server (CentOS 5 without RAID). The problems only started recently, after it was migrated to the new server with VBox, and they seem to be related to RAID in conjunction with VBox.

Also, I have another virtual machine running on the same host that is mostly idle and normally doesn't crash (probably because it is idle, without traffic), but I'm almost certain that it has crashed at least once.

Cheers.
andyp73
Volunteer
Posts: 1631
Joined: 25. May 2010, 23:48
Primary OS: Mac OS X other
VBox Version: PUEL
Guest OSses: Assorted Linux, Windows Server 2012, DOS, Windows 10, BIOS/UEFI emulation

Re: raid-check crashes client VirtualMachine

Post by andyp73 »

CidiRome wrote:the problems just started recently after being migrated to the new server with VBox and seem to be related to RAID in conjunction with VBOX
How is your RAID array created on the host? Do you have a genuine hardware RAID controller or do you have a software RAID created either through a BIOS extension or within Linux?

I have a server that has a 4 disk, RAID 10, array created through the Intel RAID BIOS extension which just sits there and contains the .vdi files for VirtualBox along with all the rest of staff data without any issues.

-Andy.
My crystal ball is currently broken. If you want assistance you are going to have to give me all of the necessary information.
Please don't ask me to do your homework for you, I have more than enough of my own things to do.
CidiRome

Re: raid-check crashes client VirtualMachine

Post by CidiRome »

Hi.

I don't fully understand it (or at least don't recall right now), but I believe it is software RAID, even though the array (if I recall correctly) was created in the RAID BIOS. The mainboard is a Supermicro without any additional RAID controller.
Manufacturer: Supermicro
Product Name: X10SRL-F
Version: 1.01B

cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sda[1] sdb[0]
5567493120 blocks super external:/md127/0 [2/2] [UU]

md127 : inactive sdb[1](S) sda[0](S)
6306 blocks super external:imsm

lspci | grep -i raid
00:1f.2 RAID bus controller: Intel Corporation C600/X79 series chipset SATA RAID Controller (rev 05)
Last night I ran raid-check manually for testing, and it ran all the way through without crashing the VBox machine.
The differences from last Sunday are:
- It was run manually;
- It is not Sunday;
- The I/O cache is now enabled;
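
Since the crashes reportedly tend to come late in the check, it may help to record how far the check has progressed when the guest stalls. Below is a minimal sketch that parses /proc/mdstat text. The array lines are taken from the output above, while the progress line is hypothetical, modelled on the format the md driver usually prints during a check. The function name `check_progress` is also made up for illustration.

```python
import re

# An array header such as "md126 : active raid1 sda[1] sdb[0]".
ARRAY = re.compile(r"^(?P<name>md\d+) : (?P<state>\w+)")
# A progress line such as "... check = 52.0% (...) finish=...".
PROGRESS = re.compile(
    r"(?P<action>check|resync|recovery|reshape)\s*=\s*(?P<pct>[\d.]+)%"
)


def check_progress(mdstat_text):
    """Map each md array to (action, percent) while a sync action is
    running on it, or to None when no progress line is present."""
    progress = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY.match(line)
        if header:
            current = header["name"]
            progress[current] = None
            continue
        found = PROGRESS.search(line)
        if found and current is not None:
            progress[current] = (found["action"], float(found["pct"]))
    return progress


sample = """\
Personalities : [raid1]
md126 : active raid1 sda[1] sdb[0]
      5567493120 blocks super external:/md127/0 [2/2] [UU]
      [==========>..........]  check = 52.0% (2895096422/5567493120) finish=310.5min speed=143000K/sec

md127 : inactive sdb[1](S) sda[0](S)
      6306 blocks super external:imsm
"""
print(check_progress(sample))
```

On the host, passing the contents of /proc/mdstat to `check_progress` at regular intervals would give a progress trail to compare against the time of the guest's failure.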

Cheers.
socratis

Re: raid-check crashes client VirtualMachine

Post by socratis »

CidiRome wrote:- The I/O cache is now enabled;
That might be the fix actually. Let it run automagically on a Sunday, but with the I/O cache enabled. See what happens...
CidiRome

Re: raid-check crashes client VirtualMachine

Post by CidiRome »

socratis wrote:That might be the fix actually. Let it run automagically on a Sunday, but with the I/O cache enabled. See what happens...
I was already thinking about that.

Next Sunday it will run by itself and I will see, although I'm a little skeptical because I believe it was me who disabled it when the problem appeared.

I suppose I will make the prognostics after the game...

Cheers.
Yoda
Posts: 80
Joined: 4. Feb 2008, 19:16

Re: raid-check crashes client VirtualMachine

Post by Yoda »

I'm having the exact same problem - actually I've had it for years. Previously the crash only happened once a month, but always at raid-check time. After the upgrade from Fedora 20 -> 27, it happens every week.

Previously, it usually showed up as either a triple fault or a Guru Meditation - now just a regular crash (see bug #17721).

Raid is Intel BIOS RAID 5 on 3 disks. Crash always happens after about 80% of the check is done. I/O buffer is on - will try OFF. Using IDE controller in guest.

I changed raid-check to run every day now, for more testing of this.
Last edited by socratis on 2. May 2018, 00:07, edited 1 time in total.
Reason: Added missing URL.
CidiRome

Re: raid-check crashes client VirtualMachine

Post by CidiRome »

Good news.

Today the Virtual Machine didn't crash.... Yupi.

Let's hope it stays this way.

Yet, there is another thing I remembered:
This instance of the machine was started from the GUI, while at least some of the problematic ones were started by the vboxautostart-service.
I don't think it is related to the headless status, because I saw it happen even with the display open.

For now I will go with the theory that enabling the I/O cache solved the problem, even though I thought it was already on when it crashed before (maybe I'm wrong).

Cheers.
Last edited by CidiRome on 7. May 2018, 10:13, edited 1 time in total.
socratis

Re: raid-check crashes client VirtualMachine

Post by socratis »

Good news! Please come back in a week or a month with an update if you can, so we can have a higher degree of certainty about the resolution...