Trying to diagnose poor start/stop reliability with VBoxManage

Discussions related to using VirtualBox on Windows hosts.
Post Reply
rdghickman
Posts: 10
Joined: 30. Nov 2020, 14:22

Trying to diagnose poor start/stop reliability with VBoxManage

Post by rdghickman »

Hi all,

I am starting/stopping a lot of VMs (up to 60) using scripts that call out to VBoxManage. The VMs have been created prior to the script and are in a powered off state. For startup, I do 2 at a time, waiting 30 seconds between batches. For shutdown I do similar, sending an ACPI power off command via VBoxManage.
This is on a Windows host and generally I have the VBox GUI up as well. Quite frequently (usually 1/10 VMs) one of the VMs will get stuck at power on with a black screen. It is not even getting to the GRUB boot loader. On shut down very frequently one or more VMs will get stuck in "powering off" and the VBox GUI will also hang and become inoperative. If this happens then the only option is to reboot the PC, at which point the affected VMs are in "aborted" state. This seems far more likely to happen if I interact with the GUI while the operations are being performed.

I am assuming this is a VBox problem because both VBoxManage and the VBox GUI hang indefinitely if you try and interact with one of the VMs which it got stuck with powering off. This makes the whole automation process very fraught and unreliable and requires reboots. If you slowly do it one at a time by hand from the GUI there are no problems.

Has anyone got any suggestions or ideas I can try to diagnose the problem?
scottgus1
Site Moderator
Posts: 20945
Joined: 30. Dec 2009, 20:14
Primary OS: MS Windows 10
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Windows, Linux

Re: Trying to diagnose poor start/stop reliability with VBoxManage

Post by scottgus1 »

rdghickman wrote:one of the VMs will get stuck at power on with a black screen. It is not even getting to the GRUB boot loader.
When this happens, close the Virtualbox window for the VM with the Power Off option set.

Right-click the VM in the main Virtualbox window's VM list, choose Show Log. Save the far left tab's log, zip it, and post the zip file, using the forum's Upload Attachment tab.

Also, I don't know that a simple timer will be the most optimum way to smoothly tart guests from a script. Things happen, and a recovery system s needed.

I ran backup & restart scripts on the office server for 5 VMs. I set up a subroutine script that would launch a VM, then monitor that VM for some form of feedback, whether a flag file in a host shared folder, or a guestproperty toggle, something suitable to the capabilities of the VM OS. I put a script in each VM that ran when the OS booted up and logged in, which caused the flag file or toggle. For shutdown I had another subroutine script that would send acpipowerbutton, also some keyboard scan inputs for shutdown commands, then monitor the output of 'vboxmanage showvminfo', to see if the VM shut down, as well as Virtualbox's CPU metrics to see if the guest stopped processing but the VM's Virtualbox window never closed, then send a vboxmanage poweroff.

This may be overkill for your needs, but it was built up over a few times of coming in Monday morning with the sever down because the backup routine seized up on a VM that didn't shut down... :shock:
rdghickman
Posts: 10
Joined: 30. Nov 2020, 14:22

Re: Trying to diagnose poor start/stop reliability with VBoxManage

Post by rdghickman »

At least one of the other developers has reported a fault today which appears to be a -450 (VERR_NET_DOWN) failure.
I'll attach the logs here at least.

Starting to get a bit concerned that VBox is not the right solution for what we're trying to do here. I'll see if I can make the scripts back off some more and see if that helps reduce errors further.
Attachments
VBox.zip
-450 error
(71.13 KiB) Downloaded 1 time
rdghickman
Posts: 10
Joined: 30. Nov 2020, 14:22

Re: Trying to diagnose poor start/stop reliability with VBoxManage

Post by rdghickman »

Additionally here is the log for where it gets stuck on boot.

The process is this:
- VM is cloned
- VM is started
- Some network settings are changed on the VM
- The VM is rebooted by sending a "reboot" command to the Linux guest OS
- The guest VM goes to a black screen and nothing else happens (waited > 5 minutes)

The last part of the log is:

00:01:47.592093 TM: Switching TSC mode from 'Dynamic' to 'RealTscOffset'
00:01:48.579465 GIM: KVM: Enabled wall-clock struct. at 0x000000002f9e8008 - u32Sec=1606921375 u32Nano=660980153 uVersion=2
00:01:48.601670 PIT: mode=2 count=0x12a5 (4773) - 249.98 Hz (ch=0)
00:01:48.602326 APIC0: Switched mode to x2APIC
00:01:48.733144 PIT: mode=0 count=0x10000 (65536) - 18.20 Hz (ch=0)
00:01:48.843711 OHCI: Software reset

You can then see I power it off to get the logs more than five minutes later:

00:01:48.843711 OHCI: Software reset
00:06:59.972369 Console: Machine state changed to 'Stopping'
Attachments
sim01-2020-12-02-15-08-07.zip
Guest reboot failure
(27.83 KiB) Downloaded 3 times
Post Reply