CLI startvm headless hangs with Fedora and Redhat guests

lelegard
Posts: 15
Joined: 14. Sep 2018, 11:49
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Linux, BSD, Windows
Location: Paris, France

CLI startvm headless hangs with Fedora and Redhat guests

Post by lelegard »

Hello,

This is a follow-up to topic 111025 with a more accurate description of the causes, deserving a new topic and new title.

Configuration: VBox 7.0.14 on a laptop with 14 cores and 32 GB of RAM, running Windows 11. 14 VMs are installed. Guests: Ubuntu, Debian, Mint, Fedora, Redhat (Rocky Linux), openSUSE Leap, openSUSE Tumbleweed, Alpine, Arch, Gentoo, FreeBSD, OpenBSD, NetBSD, DragonFlyBSD.

All VMs work fine when started from the VBox GUI.

However, with Fedora and Redhat guests, the boot never completes when the VM is started from the command line in headless mode, e.g.:


VBoxManage startvm vmiredhat --type=headless
The symptom varies: 1) black screen when starting Gnome, 2) CPU soft lockup error, 3) stuck on the first line of display after the OS is loaded (after GRUB). I see this in the small preview window of the VBox GUI. The guest is never reachable over ssh (the second network adapter is host-only).

More interesting: I tried to start the two VMs from the command line without "--type=headless". As long as I leave the VM window in the background, with focus on other windows (a local terminal, the VBox GUI), the boot hangs. I see the same state in the small preview window of the VBox GUI. However, when I bring the VM window to the foreground and the "mouse integration" message appears, the boot completes.

Most other VMs work when started with the same command in headless mode. I haven't tested them all, but Ubuntu, Debian and Mint work. The issue seems linked to the Redhat family.

It seems that something specific to Fedora and Redhat blocks the boot as long as no real input device is available.

All VMs have identical configurations. I have attached the description ("VBoxManage showvminfo") of the Fedora and Redhat VMs. They differ only in file names, UUID, MAC address and dates.

This is a dev system for an open source project. The VMs are used to build binaries, so I need scripting for automation (boot, build, get binaries, shutdown). This is why being able to start and stop VMs from the command line is essential.
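For readers wanting to script the boot step, here is a minimal Python sketch of the idea, not the poster's actual tooling. It builds the same "VBoxManage startvm ... --type=headless" command shown above and polls the guest's ssh port until it answers. The helper names are hypothetical, and the use of "VBoxManage controlvm ... acpipowerbutton" for shutdown is an assumption about how one might stop the guest cleanly.

```python
import socket
import subprocess
import time


def start_headless_command(vm_name):
    """Build the headless start command used in this thread."""
    return ["VBoxManage", "startvm", vm_name, "--type=headless"]


def acpi_shutdown_command(vm_name):
    """Build a command asking the guest to shut down via the ACPI power button."""
    return ["VBoxManage", "controlvm", vm_name, "acpipowerbutton"]


def wait_for_port(host, port, timeout=120.0, interval=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Guest not reachable yet: wait a little and retry.
            time.sleep(interval)
    return False


def boot_and_wait(vm_name, guest_ip, timeout=120.0):
    """Start the VM headless, then wait until its ssh port (22) answers."""
    subprocess.run(start_headless_command(vm_name), check=True)
    return wait_for_port(guest_ip, 22, timeout=timeout)
```

A call such as boot_and_wait("vmiredhat", "192.168.56.10") would return False for a guest that hangs as described here; the IP address is a made-up host-only address, to be replaced with the real one.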

Any idea would be appreciated.
Attachments: vmiredhat.txt (3.5 KiB), vmifedora.txt (3.5 KiB)
lelegard

Re: CLI startvm headless hangs with Fedora and Redhat guests

Post by lelegard »

Here is a more complete status of what works and what does not.

Command "VBoxManage startvm vmname --type=headless", from PowerShell and bash (Cygwin and MinGW).

Boot hangs, unblocked when the VM window is opened and gets focus
=> Redhat (Rocky), openSUSE Leap, openSUSE Tumbleweed, Arch Linux

Boot hangs, CPU soft lockup, never unblocked, even when the VM window is opened and gets focus
=> Fedora

Boot completes, Gnome started, ping/ssh impossible until the VM window is opened and gets focus
=> Gentoo

Headless boot OK and ssh works
=> Ubuntu, Debian, Mint, Alpine, FreeBSD, OpenBSD, NetBSD, DragonFlyBSD

As a summary, starting a VM headless from the command line works on 4 Linux and 4 BSD and fails on 6 Linux.

Conclusion: VirtualBox is useless in an automation environment. Could VirtualBox support guys help?
scottgus1
Site Moderator
Posts: 20945
Joined: 30. Dec 2009, 20:14
Primary OS: MS Windows 10
VBox Version: PUEL
Guest OSses: Windows, Linux

Re: CLI startvm headless hangs with Fedora and Redhat guests

Post by scottgus1 »

I wouldn't know how to diagnose this myself. You'd need to confirm that each host OS is using the same version of Official Virtualbox from www.virtualbox.org, not the Linux distro's fork. Also confirm the results that happen when you use a regular terminal. (The last post mentions PowerShell, but discusses Linux hosts: PowerShell is a Windows thing.)

Once you confirm this, then this detail of information would be good on the Bugtracker.
lelegard

Re: CLI startvm headless hangs with Fedora and Redhat guests

Post by lelegard »

Please read the configuration paragraph: there is one single Windows 11 host. Everything runs on this specific host. There are many guests but a single host. The VBoxManage command line is run interchangeably from a PowerShell window, a Cygwin bash window or a MinGW bash window.

VirtualBox is "Version 7.0.14 r161095 (Qt5.15.2)", downloaded from virtualbox.org.
fth0
Volunteer
Posts: 5678
Joined: 14. Feb 2019, 03:06
Primary OS: Mac OS X other
VBox Version: PUEL
Guest OSses: Linux, Windows 10, ...
Location: Germany

Re: CLI startvm headless hangs with Fedora and Redhat guests

Post by fth0 »

Please reproduce the issue and wait for a defined amount of time (e.g. 2 minutes) before opening the VM window, so that we can identify the position in the log files later on. Then provide a zip file with VBox.log files and system log files from some of the guest OSes (e.g. Ubuntu, RedHat, Fedora, Gentoo). The basic idea is to have one example of each type of behavior you want us to look at (incl. the expected behavior as a baseline for comparison). Did you already look at the system log files to find out what happened after the wait?

As a general idea, VirtualBox doesn't support Wayland natively, only X11 (e.g. XWayland).

PS: We aren't VirtualBox support guys, BTW. Just fellow VirtualBox users. ;)
scottgus1

Re: CLI startvm headless hangs with Fedora and Redhat guests

Post by scottgus1 »

lelegard wrote: 11. Feb 2024, 22:10 There is one single Windows 11 host.
Oh, OK, I got the wrong end of that one. I thought those were all the hosts you were trying. Please disregard my input and follow fth0's.
lelegard

Re: CLI startvm headless hangs with Fedora and Redhat guests

Post by lelegard »

Ticket 21985 was opened. Configuration and log files are attached.

Concerning X11 vs. Wayland, please consider that 1) the full Gnome desktop works when started from the VBox UI, 2) when started headless, the boot hangs at an early stage, well before X11 starts.
lelegard

Re: CLI startvm headless hangs with Fedora and Redhat guests

Post by lelegard »

Here are the VBox logs of the Redhat VM. I booted it headless from the command line and waited 2 minutes. It was not reachable using ping. Then I opened the VM window. The boot continued and the guest became reachable.

The host machine has plenty of CPUs and memory. Each VM has 6 CPUs and 8 GB RAM. Booting that Redhat VM from the VBox GUI takes only 15 seconds, up to the fully opened Gnome session (using autologin).

Generally speaking, booting in headless mode is excessively slow, even when the boot completes. It can take up to 1 minute, while the same boot from the GUI only takes 15 seconds.
Attachments: Logs.zip, VM Logs (72.02 KiB)
fth0

Re: CLI startvm headless hangs with Fedora and Redhat guests

Post by fth0 »

Thanks for the log files and the thorough description! :)

The VBoxHardening.log file didn't indicate any unexpected issue to me. Regarding the VBox.log file, I noticed the following (in the order of appearance):

First of all, Hyper-V was active on the host, so VirtualBox couldn't use VT-x directly. This has created different issues for different VirtualBox users so far, including severe performance degradation. If you can (security-wise) afford to disable Hyper-V on the host, even if it was only for a test, perhaps some of your issues would vanish immediately.

You're using a 13th Gen Intel CPU, and VirtualBox doesn't influence on which CPU cores its threads are being scheduled. In consequence, it could happen that VirtualBox's threads run on the E-cores. If that's the case, you could influence the CPU affinity when starting a VM with "start VBoxHeadless".

The VirtualBox Guest Additions (GA) were successfully started inside the guest OS around 00:01:04, which was earlier than your opening of the VM window around 00:02:09. This is interesting in so far that the VM wasn't completely hanging. In addition to that, the display resolution changed to 1280x800 around 00:01:28. In case you're wondering, the virtual display(s) for the guest OS also exist in headless mode, as you may already have guessed from seeing the Preview in the VirtualBox Manager. ;)

BTW, did you check any system log files in the guest OS (hint: you could perhaps add the virtual disk image as a secondary disk to another working VM)?

If you suspect anything regarding (virtual) mouse input, you could change the mouse to PS/2 in the VM configuration.

Right after you opened the VM window, around 00:02:10, some MSR accesses failed, indicating that the guest OS tried to read Intel RAPL details, which have to do with power consumption. I'm not sure if it makes a difference if Hyper-V is enabled on the host, because each hypervisor decides for itself which CPU MSRs to provide to a VM or not.

To sum it up: Eliminating Hyper-V and checking the guest OS's system log files would be my primary suggestions. Please let us know what you find out, I'm very interested to hear that. :)
scottgus1

Re: CLI startvm headless hangs with Fedora and Redhat guests

Post by scottgus1 »

There might be some possibilities re the performance/economy cores here: viewtopic.php?f=6&t=108745
lelegard

Re: CLI startvm headless hangs with Fedora and Redhat guests

Post by lelegard »

Thanks a lot for your help and your time.

There are two topics here: Hyper-V and P-core/E-core architecture.

Hyper-V

I wondered why you mentioned that Hyper-V was active. It is not. More precisely, I temporarily installed it to test if it could be a better alternative to VirtualBox. I realized that it probably needed specially tweaked distros for guests because a VM was not even able to boot the standard Ubuntu installation ISO. I concluded it was some other Microsoftish non-standard stuff, not really crap, but not appropriate to install the wide variety of distros I need. So, I removed it, rebooted. All tests from this thread were performed after that reboot.

When I open the "Windows Features" configuration panel, I can see that Hyper-V is not installed. I see that "Virtual Machine Platform" and "Windows Hypervisor Platform" are still there. I do not know what they are. Admin software on top of Hyper-V? If Hyper-V itself is not installed, they should not prevent VT-x from being used by VBox. Anyway, I removed them and rebooted.

After reboot, I emptied the Logs directory on one of the VM and started it. In VBox.log, I see this:


00:00:00.763848 NEM: Adjusting APIC configuration from X2APIC to APIC max mode.  X2APIC is not supported by the WinHvPlatform API!
00:00:00.763849 NEM: Disable Hyper-V if you need X2APIC for your guests!
00:00:00.763963 NEM:
00:00:00.763963 NEM: NEMR3Init: Snail execution mode is active!
00:00:00.763963 NEM: Note! VirtualBox is not able to run at its full potential in this execution mode.
00:00:00.763963 NEM:       To see VirtualBox run at max speed you need to disable all Windows features
00:00:00.763963 NEM:       making use of Hyper-V.  That is a moving target, so google how and carefully
00:00:00.763963 NEM:       consider the consequences of disabling these features.
00:00:00.763963 NEM:
00:00:00.763976 CPUM: No hardware-virtualization capability detected
I assume that this is why you said that Hyper-V was active. But it has been uninstalled and the host rebooted several times since then.
  • How to check that Hyper-V is no longer active, really? Outside VBox logs, I mean.
  • If there are some remains of Hyper-V which prevent VBox from using the HW virtualization features, how to clean them?
However, the problems I described in this thread were already there before I tried Hyper-V. I tried it precisely because of these problems with VBox. So, even if I agree that it can only be better not having Hyper-V installed, it was not the root cause of the problems in the first place.

P-core/E-core architecture

The hybrid P-core/E-core architecture of the 13th Gen i7-13700H (introduced with Alder Lake and continued in this Raptor Lake CPU) is a good lead. I did not think about it. I tried the same config on an older Windows 10 laptop with 4 homogeneous i7 cores. The same Fedora guest configuration boots in 19 seconds, headless, up to the Gnome session (autologin). So, booting Fedora headless on a Windows host is not, by itself, the cause.

In the Windows Task Manager, using the per logical processor view, CPU 0 to 11 are P-cores (6 cores with hyperthreading) and CPU 12 to 19 are E-cores (8 cores, no hyperthreading).
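To make the CPU affinity suggestion from earlier in the thread concrete, here is a small Python sketch that turns the logical CPU layout described above into hexadecimal affinity masks. The layout (CPU 0 to 11 are P-cores, 12 to 19 are E-cores) is only what was observed on this particular i7-13700H; the function name is hypothetical.

```python
def affinity_mask(cpus):
    """Return an integer bit mask with one bit set per logical CPU index."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


# Layout reported in this thread for one i7-13700H (an observation, not a general rule).
P_CORES = range(0, 12)   # logical CPUs 0-11: 6 P-cores with hyperthreading
E_CORES = range(12, 20)  # logical CPUs 12-19: 8 E-cores, no hyperthreading

p_mask = affinity_mask(P_CORES)
e_mask = affinity_mask(E_CORES)

print(f"P-core mask: {p_mask:X}")  # FFF
print(f"E-core mask: {e_mask:X}")  # FF000
```

The P-core mask could then be passed to the cmd.exe "start" command, which accepts "/affinity" with a hexadecimal mask, for instance: start "" /affinity FFF "C:\Program Files\Oracle\VirtualBox\VBoxHeadless.exe" --startvm vmiredhat. This exact command line was not tested in this thread and is given as an assumption.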

I see that a VBox headless boot uses the E-cores. The usage of these CPUs climbs quite fast while the P-cores stay quiet.

Interestingly, when the Fedora guest is stuck early in the boot with "soft lockup - CPU stuck" errors, we see that 4 of the E-cores are approximately 30 to 40% busy. Something in VBox seems to be looping.

Another observation which surprises me: while the VMs are configured with 6 processors, only 4 E-cores are busy. The CPU allocation also changes from time to time. Typically, CPU 12-15 are busy and 16-19 are idle. After a couple of minutes, CPU 16-19 become busy and 12-15 idle, again and again.

When starting a VM interactively from the VBox GUI, the activity moves back and forth between P-cores and E-cores. The peaks seem to run on the P-cores. I assume that this is the expected behaviour. This may explain why the headless boots, even on the distros where they work, were much slower than boots from the GUI.

Therefore, I disabled power throttling (i.e. migration to E-cores) for the most important VBox programs:


powercfg /powerthrottling disable /path 'C:\Program Files\Oracle\VirtualBox\VBoxHeadless.exe'
powercfg /powerthrottling disable /path 'C:\Program Files\Oracle\VirtualBox\VirtualBoxVM.exe'
powercfg /powerthrottling disable /path 'C:\Program Files\Oracle\VirtualBox\VirtualBox.exe'
powercfg /powerthrottling disable /path 'C:\Program Files\Oracle\VirtualBox\VBoxNetDHCP.exe'
powercfg /powerthrottling disable /path 'C:\Program Files\Oracle\VirtualBox\VBoxNetNAT.exe'
powercfg /powerthrottling disable /path 'C:\Program Files\Oracle\VirtualBox\VBoxSVC.exe'
Indeed, a headless boot now runs at the speed of light, same time as a boot from the GUI. We see that the P-cores are busy, not the E-cores.
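For anyone who wants to reapply this setting from a script (for example after reinstalling VirtualBox), the commands above can be generated rather than typed. The following Python sketch only rebuilds the same "powercfg /powerthrottling disable /path ..." command lines from a list of executable names; the function name is hypothetical and the install directory is the one used in this post.

```python
from pathlib import PureWindowsPath

# Install directory and executables exactly as listed in the post above.
VBOX_DIR = r"C:\Program Files\Oracle\VirtualBox"
VBOX_EXES = [
    "VBoxHeadless.exe",
    "VirtualBoxVM.exe",
    "VirtualBox.exe",
    "VBoxNetDHCP.exe",
    "VBoxNetNAT.exe",
    "VBoxSVC.exe",
]


def powerthrottling_commands(install_dir, exe_names):
    """Build one 'powercfg /powerthrottling disable' argument list per executable."""
    base = PureWindowsPath(install_dir)
    return [
        ["powercfg", "/powerthrottling", "disable", "/path", str(base / exe)]
        for exe in exe_names
    ]


commands = powerthrottling_commands(VBOX_DIR, VBOX_EXES)
for argv in commands:
    print(" ".join(argv))
```

Each argument list can be handed to subprocess.run(argv, check=True) on the Windows host; the commands are only printed here so the sketch has no side effects.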

More interesting, the guests which always failed to boot (such as Fedora and its "CPU soft lockup" errors) now boot normally. I understand the performance improvement when moving to P-cores. But I do not understand the change of behavior. The E-cores are certainly slower than the P-cores, but they are still as performant as, or even more performant than, the CPUs of older laptops on which VBox works correctly. I have been using VBox for 10 or 15 years, as well as other hypervisors (VMware, Parallels, KVM/Qemu, UTM), and the first CPUs I used with VBox were certainly much slower than my current E-cores. And it worked.

So, my problem is fixed. Thank you all for the help. But the lack of rational explanation for the hangs of some distros during the boot when running on E-cores still worries me.
Last edited by lelegard on 13. Feb 2024, 16:29, edited 1 time in total.
scottgus1

Re: CLI startvm headless hangs with Fedora and Redhat guests

Post by scottgus1 »

lelegard wrote: 13. Feb 2024, 14:38 How to check that Hyper-V is no longer active, really? Outside VBox logs, I mean.
If there are some remains of Hyper-V which prevent VBox from using the HW virtualization features, how to clean them?
Here's our tutorial on Hyper-V: "HMR3Init: Attempting fall back to NEM (Hyper-V is active)". You'll note that there's a lot of things that use it now, and the list keeps growing. For Virtualbox purposes, the log is the Word of God on whether Hyper-V-requiring services are still running, and the tutorial will show why Windows Features isn't the end-all-be-all of showing that Hyper-V is not enabled. That list's "Hyper-V" isn't all of Hyper-V.
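Since the log is the reference, the check can be automated in a build script. Here is a minimal Python sketch that classifies a VBox.log by searching for the marker strings quoted in this thread. It only knows the Intel (VT-x) wording seen here and makes no claim about other messages VirtualBox may emit; the function name is hypothetical.

```python
# Marker strings copied from the VBox.log excerpts quoted in this thread.
NEM_MARKERS = (
    "HMR3Init: Attempting fall back to NEM",
    "NEMR3Init: Snail execution mode is active!",
)
VTX_MARKER = "HMR3Init: VT-x w/"


def hypervisor_mode(log_text):
    """Classify a VBox.log text as 'hyper-v', 'vt-x' or 'unknown'."""
    for line in log_text.splitlines():
        if any(marker in line for marker in NEM_MARKERS):
            return "hyper-v"  # VirtualBox runs on top of the Windows hypervisor
        if VTX_MARKER in line:
            return "vt-x"     # VirtualBox drives VT-x directly
    return "unknown"


sample = "00:00:00.763963 NEM: NEMR3Init: Snail execution mode is active!"
print(hypervisor_mode(sample))  # hyper-v
```

In practice the text would come from the VM's Logs directory, e.g. hypervisor_mode(open("VBox.log", encoding="utf-8", errors="replace").read()).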

Virtualbox is working on getting decent performance with active Hyper-V, and it's been getting better. You may not have to disable it.

Glad you got the P vs E cores sorted!
lelegard

Re: CLI startvm headless hangs with Fedora and Redhat guests

Post by lelegard »

Thank you. I have now got rid of Hyper-V and let VBox manage VT-x.

Procedure:

Launch the "System Information" application from start menu. Click on "System Summary". On the right pane, see "Virtualization-based security". In my case, it is "Running". So, this indicates that Hyper-V is still active, despite having unchecked it.

System Settings -> Core Isolation
=> Disable "Memory Integrity" (it was "on" in my case)

System Settings -> Turn Windows features on or off
=> Disable / uncheck "Hyper-V", "Windows Hypervisor Platform", "Virtual Machine Platform", "Windows Sandbox" (they were all already unchecked in my case).
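The "Virtualization-based security" state read from System Information in the first step can probably also be queried from a script. The Python sketch below assumes the Win32_DeviceGuard CIM class and its VirtualizationBasedSecurityStatus property (0, 1 or 2) are available on the host and queries them through PowerShell; it was not part of the procedure above and the function names are hypothetical.

```python
import subprocess

# Assumed meaning of VirtualizationBasedSecurityStatus values.
VBS_STATUS = {
    0: "Not enabled",
    1: "Enabled but not running",
    2: "Running",
}

# PowerShell expression reading the status from the DeviceGuard namespace.
POWERSHELL_QUERY = (
    "(Get-CimInstance -ClassName Win32_DeviceGuard "
    "-Namespace root\\Microsoft\\Windows\\DeviceGuard)"
    ".VirtualizationBasedSecurityStatus"
)


def describe_vbs_status(raw_output):
    """Translate the raw PowerShell output (a small integer) into a label."""
    try:
        code = int(raw_output.strip())
    except ValueError:
        return "Unknown"
    return VBS_STATUS.get(code, "Unknown")


def query_vbs_status():
    """Run the query on the Windows host and return the status label."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", POWERSHELL_QUERY],
        capture_output=True,
        text=True,
        check=True,
    )
    return describe_vbs_status(result.stdout)
```

Calling query_vbs_status() on the host before and after the reboot should mirror what System Information displays, although VBox.log remains the reference for what VirtualBox itself detects.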

After reboot, "Virtualization-based security" is now marked "Not enabled". Starting a VBox VM, we no longer see the Hyper-V messages. Before reboot, VBox.log had this:


00:00:00.731145 HM: HMR3Init: Attempting fall back to NEM: VT-x is not available
...
00:00:00.763976 CPUM: No hardware-virtualization capability detected
Now, it has this:


00:00:00.739796 HM: HMR3Init: VT-x w/ nested paging and unrestricted guest execution hw support
00:00:00.763978 CPUM: fXStateHostMask=0x7; initial: 0x7; host XCR0=0x7
00:00:00.766219 CPUM: Matched host CPU INTEL 0x6/0xba/0x2 Intel_Atom_Unknown with CPU DB entry 'Intel Pentium N3530 2.16GHz' (INTEL 0x6/0x37/0x8 Intel_Atom_Silvermont)
I wonder about the message "CPUM: Matched host CPU INTEL 0x6/0xba/0x2 Intel_Atom_Unknown with CPU DB entry 'Intel Pentium N3530 2.16GHz' (INTEL 0x6/0x37/0x8 Intel_Atom_Silvermont)". This is clearly not the right CPU. But I may misinterpret the message which is bizarrely formed.
scottgus1

Re: CLI startvm headless hangs with Fedora and Redhat guests

Post by scottgus1 »

lelegard wrote: 13. Feb 2024, 15:33 I wonder about the message "CPUM: Matched host CPU INTEL 0x6/0xba/0x2 Intel_Atom_Unknown...
I'm not entirely certain what that matching thing is all about, but there's another line:
00:00:01.370295 Full Name: "13th Gen Intel(R) Core(TM) i7-13700H"
which happens later, and shows the actual host CPU finally used. Maybe the first "matching" line is a preliminary startup phase?
lelegard

Re: CLI startvm headless hangs with Fedora and Redhat guests

Post by lelegard »

scottgus1 wrote: 13. Feb 2024, 16:30 Maybe the first "matching" line is a preliminary startup phase?
Probably. Let's forget about it.

I updated the ticket with the information that the problem occurs only on E-cores. I may have solved my problem, but there is still a problem when VBox runs on E-cores. Having 4 E-cores of the host busy looping forever while the guest reports a "soft lockup - CPU stuck" looks like a bug. VBox works fine on older machines where the CPU cores are much slower than my E-cores.

Thank you all again.