When spawning large number of VMs, some VMs do not find their root upon reboot

Discussions related to using VirtualBox on Linux hosts.
ElCoyote_
Posts: 7
Joined: 6. Jun 2016, 19:00

When spawning large number of VMs, some VMs do not find their root upon reboot

Post by ElCoyote_ »

Hi everyone,

I'm using VBox on several large hypervisors to spawn RHEL7 VMs.
The large hypervisors have RAID cards and run RHEL6 or 7 with a fast filesystem (xfs, vxfs).
I have 128GB-192GB of RAM on these boxes and 2 x 6-core Xeon v2/v3 CPUs.
I am using VBox 5.1.18 x86_64.

Here's what I am seeing:
When spawning a large number (6+) of RHEL7 VMs (2-4 vCPUs, 12-24 GB, 8 virtual disks) on my hypervisors, things go well as the nodes proceed to download a boot image.
The VMs boot fine -but- upon the next warm reboot (while other nodes are still deploying and some I/O is in flight) I frequently see this type of failure:

Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

Every time, I am able to power off and on the VM, which will then proceed to boot fine.
In the RHEL7 guests, I tried adding 'scsi_mod.scan=sync rootdelay=15' to the kernel cmdline. This helps a bit, but not reliably.
In VBox, I tried enabling and disabling the Host I/O cache, but it did not make a difference.
I am now testing VBoxManage setextradata "VM name" "VBoxInternal/Devices/ahci/0/LUN#[x]/Config/IgnoreFlush" 0
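
Since each VM here has 8 virtual disks, the extradata key above has to be set once per LUN. A minimal sketch of generating those commands follows. It assumes all disks sit on a single AHCI controller at ports 0-7, and the VM name "rhel7-node01" is a hypothetical placeholder; it only prints the commands rather than running them.

```python
import shlex


def ignore_flush_commands(vm_name, lun_count, value=0):
    """Build one `VBoxManage setextradata` argv list per AHCI LUN.

    Assumption: the disks are attached to AHCI controller 0 at
    LUN indexes 0 .. lun_count-1 (not confirmed by the thread).
    """
    commands = []
    for lun in range(lun_count):
        key = f"VBoxInternal/Devices/ahci/0/LUN#{lun}/Config/IgnoreFlush"
        commands.append(["VBoxManage", "setextradata", vm_name, key, str(value)])
    return commands


if __name__ == "__main__":
    # "rhel7-node01" is a made-up VM name used for illustration only.
    for argv in ignore_flush_commands("rhel7-node01", 8):
        print(shlex.join(argv))
```

Printing first makes it easy to review the exact keys before applying them to a VM; extradata changes of this kind are normally made while the VM is powered off.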

Are there any tips to make initramfs root enablement more reliable for my RHEL7 VMs?
I found people reporting similar issues (viewtopic.php?f=20&t=45245), but changing the Host I/O cache setting didn't make a difference for my setups.

Thanks for reading,
Vincent
ElCoyote_

is there a setting to 'hardreset' the VM when it reboots?

Post by ElCoyote_ »

Hi everyone,
Is there a setting in VBox to force a hardreset of a VM when it goes through an OS-Initiated reboot?
This would be a workaround for the issue of VMs not finding their root disk when the host is under load.
Thanks for reading,
Vincent
socratis
Site Moderator
Posts: 27329
Joined: 22. Oct 2010, 11:03
Primary OS: Mac OS X other
VBox Version: PUEL
Guest OSses: Win(*>98), Linux*, OSX>10.5
Location: Greece

Re: When spawning large number of VMs, some VMs do not find their root upon reboot

Post by socratis »

Vincent,
I merged the two topics since they were in a sense the same problem (you were actually referencing the original thread in the second one).

Your setup is not common, to say the least, that's why there were not enough responses (actually none) to your first post. If I can summarize my understanding of the problem:
  • You have several guests, all of them booting simultaneously. I won't even bother to ask you if you've calculated the total number of physical cores (not the logical ones) and the amount of RAM for your guests, and that you've left enough breathing room for your host to do all the heavy I/O. I will assume that you've done your basic math.
  • You talk about "the next warm reboot" of the VMs. Do you reboot all of them at the same time? Do they have to be synchronized? Why?
  • Where do you believe that this error is coming from? Due to heavy I/O on the host, which would appear as slow I/O on the guest, which times-out?
  • Why are the VMs not finding their root disk?
  • Why do you believe that enabling/forcing a hard-reset would change the situation?
As you can tell, I have more questions than answers at this point, mainly because of the uncommon nature of your setup. If I better understand the problem, maybe I can offer a solution.

For the record, a hard-reset would not be controlled by your guest, but by your host, most probably via a script. But more on that in our next post ;)
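
A host-side script of that kind might look roughly like the sketch below. It is only an illustration under stated assumptions: the guest console output is assumed to be captured to a text file on the host (for example through a serial port configured in file mode, which this thread does not confirm is in use), and the helper names are hypothetical. The `VBoxManage controlvm ... poweroff` and `VBoxManage startvm ... --type headless` commands themselves are standard.

```python
# Marker taken from the panic message quoted earlier in this thread.
PANIC_MARKER = "Unable to mount root fs"


def needs_power_cycle(console_text):
    """Return True if the captured guest console shows the root-fs panic."""
    return PANIC_MARKER in console_text


def power_cycle_commands(vm_name):
    """Return the VBoxManage argv lists for a full power off / power on."""
    return [
        ["VBoxManage", "controlvm", vm_name, "poweroff"],
        ["VBoxManage", "startvm", vm_name, "--type", "headless"],
    ]


if __name__ == "__main__":
    # Hypothetical console capture; a real script would read it from the
    # host file that receives the guest's console output.
    sample = "Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)"
    if needs_power_cycle(sample):
        for argv in power_cycle_commands("rhel7-node01"):
            print(" ".join(argv))
```

A power off followed by a start is used instead of a reset because a full power cycle is the workaround the original post reports as working every time.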
Do NOT send me Personal Messages (PMs) for troubleshooting, they are simply deleted.
Do NOT reply with the "QUOTE" button, please use the "POST REPLY", at the bottom of the form.
If you obfuscate any information requested, I will obfuscate my response. These are virtual UUIDs, not real ones.
ElCoyote_

Re: When spawning large number of VMs, some VMs do not find their root upon reboot

Post by ElCoyote_ »

Hi,
I got better results (I think) for my VMs by adding this argument to their GRUB2 command-line:
"reboot=cold,pci"
I'm now getting fewer crashes upon reboot, but surely it would be better if I could do this through a flag in VBox's extradata.
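
For anyone wanting to apply the same argument to many guests, here is a minimal sketch of appending kernel arguments to the GRUB2 defaults. It assumes the usual RHEL7 layout, where the arguments live in the GRUB_CMDLINE_LINUX line of /etc/default/grub and the config is regenerated afterwards with grub2-mkconfig (for BIOS guests, `grub2-mkconfig -o /boot/grub2/grub.cfg`). The function name is hypothetical and it only transforms text; it does not touch any file.

```python
import re

# Matches the quoted value of the GRUB_CMDLINE_LINUX line.
_CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX=")([^"\n]*)(")', re.MULTILINE)


def add_kernel_args(grub_default_text, extra_args):
    """Return the text with extra_args appended to GRUB_CMDLINE_LINUX.

    Arguments that are already present are not added twice.
    """
    def _append(match):
        current = match.group(2).split()
        for arg in extra_args:
            if arg not in current:
                current.append(arg)
        return match.group(1) + " ".join(current) + match.group(3)

    new_text, count = _CMDLINE.subn(_append, grub_default_text)
    if count == 0:
        raise ValueError("no GRUB_CMDLINE_LINUX line found")
    return new_text


if __name__ == "__main__":
    # Example input; a real /etc/default/grub will contain more lines.
    sample = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="rhgb quiet"\n'
    print(add_kernel_args(sample, ["reboot=cold,pci"]))
```

Because existing arguments are skipped, running it twice on the same file content leaves the line unchanged.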
Cheers,
Vincent
ElCoyote_

Re: When spawning large number of VMs, some VMs do not find their root upon reboot

Post by ElCoyote_ »

Hi Socratis,
Yes, that's a good description of my setup. There's one major difference, though:

- Although I exceed the number of my pCPUs (12 physical, 24 with HT) in vCPUs, the VMs aren't (re)booting at the same time.
In fact, due to randomization, reboots happen with an interval of 10-50 seconds. Also, an I/O storm will come once the VM finishes its reboot, as it will proceed to move about 30 GB of data from one disk to another.

I wouldn't consider it an issue within VBox if the workaround of powering off and powering back on worked flawlessly on a VM that had a kernel panic because it couldn't find its root.
I don't care if my VMs hardreset, I just want them to reboot automagically, like physical nodes do.

Thanks
socratis
Site Moderator

Re: When spawning large number of VMs, some VMs do not find their root upon reboot

Post by socratis »

I would really appreciate it, and it would really help you, if you could answer the questions that I asked you for clarification. I still don't understand what you're doing and why. I cannot help you if I can't begin to understand the problem.
JEBjames
Posts: 58
Joined: 26. Jan 2017, 18:27
Primary OS: MS Windows other
VBox Version: OSE other
Guest OSses: Centos, Ubuntu, Debian, Various Windows
Contact:

Re: When spawning large number of VMs, some VMs do not find their root upon reboot

Post by JEBjames »

I'm not sure if I hit the same issue, but it sounds similar.

I haven't had any crashes like this in a few weeks. I saved the logs the last time it happened (see attached zip file).

I occasionally get an "end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)" message.

At first I thought it was an Ubuntu 16.04.2 issue with the beta 4.8 LTS kernel. But I've also hit it with the 4.4 kernel on Ubuntu 16.04.2. And twice on Debian 8.

It happens very rarely...but when it does the following usually happened right before:
- I might have mounted an ISO file and stop/started the vm (or was lazy and did a reset), and did an operating system reinstall (perhaps did this multiple times)
- it always happens after a "reboot" command from the graphical terminal
- the crash occurs AFTER the grub boot menu (so the disk is clearly accessible)...but before the full OS boots.
- usually the thing I was doing right before my reboot was an apt update/apt upgrade/or my tweak script (which does an apt update+ upgrade, disables a bunch of services).

When it's crashed, if I do a "reset" from the Virtualbox menu it will almost always do the same crash. Once or twice a reset booted, but at this point networking didn't work at all???

In all cases, if I power off and power on the vm again everything is good. So that's what I usually do.

I've had this problem on two different Windows 10 host computers. The second computer is just using the stock Windows 10 built-in security tools.

I haven't had any problems in a few weeks though.
Attachments
Logs-crash-again-after-gui-apt-update-and-reboot.7z (73.81 KiB)
VirtualBox_Ubuntu 16.04 - 2017-Jan TV-Build_21_02_2017_23_59_29.png (14.86 KiB)
VirtualBox_Ubuntu 16.04 - 2017-Jan TV-Build_17_03_2017_17_41_51-gui-apt-update-crash-on-reboot.png (14.21 KiB)