Ubuntu 20.04 host, Win 10 guest constant reboot (STATUS_DATATYPE_MISALIGNMENT)

fantic10
Posts: 11
Joined: 21. Feb 2021, 23:00

Post by fantic10 »

I've tested VBox build 6.1.23 r145697 without disabling split lock detection in the kernel, and it works for me too!
Three hours without an abnormal reboot.
Thanks!
sxc731
Posts: 9
Joined: 27. Mar 2021, 18:45

Post by sxc731 »

bird wrote: For good performance etc, it's currently recommended to disable split lock detection in the kernel.
I'm not sure I fully understand this statement. I thought the purpose of the fix was to be able to run VB *with* split lock detection? Could you give a little more context, if possible?

Cheers!
klaus
Oracle Corporation
Posts: 1110
Joined: 10. May 2007, 14:57

Post by klaus »

Yes, the fix allows running VMs with split lock detection enabled. However, if the guest triggers an #AC fault due to a split lock, this results in rather expensive emulation, because it temporarily stops execution of all other CPUs of the VM.

If the split lock count is low this is negligible (in the Windows 10 case the frequency is around 1-2 per hour), but if the guest executes some badly written code (remember, this can be triggered by userland code) then the VM can become extremely slow.
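
The trade-off described here can be put into rough numbers. The following is only a back-of-the-envelope sketch: the per-event stall time (`ASSUMED_STALL_SECONDS`) and the "badly written code" rate are hypothetical values chosen for illustration, not measurements of VirtualBox.

```python
def stall_fraction(events_per_hour: float, stall_seconds_per_event: float) -> float:
    """Fraction of wall-clock time during which the VM's other vCPUs are stopped."""
    return events_per_hour * stall_seconds_per_event / 3600.0


# HYPOTHETICAL cost of emulating one split lock; not a measured VirtualBox figure.
ASSUMED_STALL_SECONDS = 100e-6

# Windows 10 case from the post: about 1-2 split locks per hour.
low = stall_fraction(2, ASSUMED_STALL_SECONDS)

# HYPOTHETICAL badly written guest code: 10,000 split locks per second.
high = stall_fraction(10_000 * 3600, ASSUMED_STALL_SECONDS)

print(f"idle guest: {low:.2e}, misbehaving guest: {min(high, 1.0):.2f}")
```

Under these assumed numbers the first case is far below one millionth of the run time, while the second would keep the other vCPUs stopped essentially all the time.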

Hope it's clear now why we still recommend disabling the feature in the context of VMs.

My personal opinion is that the split lock detection code in Linux is useful, but primarily for developers who can fix issues with misaligned accesses. End users will see mostly the negative results without having much chance to make use of it.
sxc731
Posts: 9
Joined: 27. Mar 2021, 18:45

Post by sxc731 »

Hi @Klaus and thank you very much for your kind explanation!

Having read the documentation and some useful explanations I have one final question: does setting `split_lock_detect` to `warn` still result in #AC exceptions being thrown (and presumably now handled by VB, albeit in a performance-affecting manner), because the VB hypervisor itself runs non-userland code?

Referring to the above explanation, I suppose that the expensive global bus lock mentioned would still occur in the presence of split locks (regardless of the value of `split_lock_detect`) as otherwise the kernel/hardware would potentially allow atomic sections to be violated, thereby possibly resulting in data corruption.

Many thanks!
fth0
Volunteer
Posts: 5668
Joined: 14. Feb 2019, 03:06
Primary OS: Mac OS X other
VBox Version: PUEL
Guest OSses: Linux, Windows 10, ...
Location: Germany

Post by fth0 »

sxc731 wrote: Having read the documentation and some useful explanations I have one final question: does setting `split_lock_detect` to `warn` still result in #AC exceptions being thrown (and presumably now handled by VB, albeit in a performance-affecting manner), because the VB hypervisor itself runs non-userland code?

Referring to the above explanation, I suppose that the expensive global bus lock mentioned would still occur in the presence of split locks (regardless of the value of `split_lock_detect`) as otherwise the kernel/hardware would potentially allow atomic sections to be violated, thereby possibly resulting in data corruption.
Let me try to explain what happened (klaus might correct me later in case I get some finer details wrong ;)):

The Linux host OS enabled split-lock detection, and the Windows guest OS neither enabled split-lock detection itself nor could detect that it was enabled. When the Windows guest OS wanted to execute an instruction with an unaligned memory access crossing a cache line boundary, the CPU triggered an #AC fault, which led to a VM-exit. VirtualBox recognized this only as an ordinary #AC fault and injected it back into the Windows guest OS, where the #AC fault handler triggered a Windows BSOD event, which ultimately led to a reboot of the Windows guest OS (the VM).

With the bugfix, VirtualBox recognizes that the #AC fault was triggered by split-lock detection, emulates the faulting instruction itself and lets the Windows guest OS continue with the next instruction, thereby preventing the #AC fault handling inside the Windows guest OS. Additionally, VirtualBox has to halt all other vCPUs to guarantee the necessary atomicity of the unaligned memory access crossing a cache line boundary.

Regarding your questions: #AC faults are still being raised, but they are handled by VirtualBox. Since the faulting instruction is not executed, the global bus lock is not used. Depending on how emulating the faulting instruction works, I could be wrong, though.
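
The sequence described in this post (halt the other vCPUs, emulate the access, resume) can be illustrated with a toy model. This is a sketch only: `ToyVM`, `others_halted` and `emulate_split_store` are hypothetical names, vCPUs are just strings in a set, and none of this reflects VirtualBox's actual code.

```python
from contextlib import contextmanager


class ToyVM:
    """Toy model only: vCPUs are names in a set, guest memory is a bytearray."""

    def __init__(self, vcpus, mem_size=256):
        self.running = set(vcpus)
        self.memory = bytearray(mem_size)

    @contextmanager
    def others_halted(self, current):
        """Halt every vCPU except `current` for the duration of the block."""
        halted = self.running - {current}
        self.running -= halted
        try:
            yield
        finally:
            self.running |= halted

    def emulate_split_store(self, current, addr, data, line=64):
        """Emulate a store that may cross a cache line as two plain stores."""
        end = addr + len(data)
        boundary = (addr // line + 1) * line
        split = min(boundary, end)  # equals `end` if no line is crossed
        with self.others_halted(current):
            # No other vCPU runs here, so the half-written state is never observed.
            self.memory[addr:split] = data[:split - addr]
            self.memory[split:end] = data[split - addr:]


vm = ToyVM(["cpu0", "cpu1", "cpu2"])
vm.emulate_split_store("cpu0", 62, b"\x01\x02\x03\x04")  # bytes 62..65 straddle offset 64
```

The point of the model is the ordering: the two partial stores are only safe because nothing else runs between them, which is also why each such event is expensive for a multi-vCPU guest.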
klaus
Oracle Corporation
Posts: 1110
Joined: 10. May 2007, 14:57

Post by klaus »

Pretty much correct... I guess the "global bus lock" is how sxc731 expressed what I put as "temporarily stop execution of all other CPUs of the VM". It is an expensive measure for guaranteeing atomic accesses (preventing any possibility of concurrent accesses).

And yes, what matters is whether the Linux kernel sets up split lock detection (which, BTW, is an inaccurate term: it doesn't apply only to locked bus accesses, it generally happens for any atomic operation). Whether it is set to "warn" or "fatal" doesn't matter, because both set up the machinery. And it's irrelevant whether "the VB hypervisor itself runs non-userland code": any code (it could be the kernel in a VM or a random application in a VM) can trigger split lock detection once it is set up.
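
As a small illustration of the point that only "off" avoids the machinery, the sketch below reads the `split_lock_detect=` value from a kernel command line string. The helper names are hypothetical, and the fallback to `warn` when the parameter is absent is an assumption that applies only to CPUs and kernels that support the feature.

```python
def split_lock_mode(cmdline: str, default: str = "warn") -> str:
    """Return the split_lock_detect= value from a kernel command line.

    The `default` for a missing parameter is an assumption; it only makes
    sense on hardware and kernels that support split lock detection.
    """
    mode = default
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "split_lock_detect" and sep:
            mode = value  # assumed: a later occurrence overrides an earlier one
    return mode


def detection_active(cmdline: str) -> bool:
    """Per the post: every mode other than 'off' sets up the machinery."""
    return split_lock_mode(cmdline) != "off"


sample = "BOOT_IMAGE=/vmlinuz ro quiet split_lock_detect=off"
print(split_lock_mode(sample), detection_active(sample))
```

On a Linux host the input string would typically come from reading `/proc/cmdline`; the sample string above is made up for the example.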
fth0
Volunteer
Posts: 5668
Joined: 14. Feb 2019, 03:06
Primary OS: Mac OS X other
VBox Version: PUEL
Guest OSses: Linux, Windows 10, ...
Location: Germany

Post by fth0 »

klaus wrote: I guess the "global bus lock" is how sxc731 expressed what I put as "temporarily stop execution of all other CPUs of the VM".
It's a term used by "Peter Zijlstra (Intel)" to describe what happens when the Split Lock Detection CPU feature is not enabled and the Intel CPU detects an unaligned memory access across a cache line boundary. Original quote:
From: "Peter Zijlstra (Intel)" <peterz@infradead.org>

A split-lock occurs when an atomic instruction operates on data
that spans two cache lines. In order to maintain atomicity the
core takes a global bus lock.
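
To make the definition in the quote concrete, here is a minimal sketch that checks whether an access spans two cache lines. The 64-byte line size and the function name `is_split` are assumptions for illustration; the real line size depends on the CPU.

```python
CACHE_LINE = 64  # assumed cache line size in bytes


def is_split(addr: int, size: int, line: int = CACHE_LINE) -> bool:
    """True if the bytes [addr, addr + size) lie in more than one cache line."""
    first_line = addr // line
    last_line = (addr + size - 1) // line
    return first_line != last_line


# An 8-byte access at offset 60 touches bytes 60..67 and crosses the boundary at 64.
print(is_split(60, 8), is_split(56, 8))
```

Only atomic (locked) instructions on such an address count as split locks; an ordinary unaligned load or store at the same address is not affected by the detection.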

fth0 wrote: Since the faulting instruction is not executed, the global bus lock is not used. Depending on how emulating the faulting instruction works, I could be wrong, though.
@klaus: How does the IEM realize the memory access(es)?
klaus
Oracle Corporation
Posts: 1110
Joined: 10. May 2007, 14:57

Post by klaus »

IEM has no option other than to break it down into multiple aligned accesses (anything else would again trigger split lock detection). That's why no other virtual CPUs may be active; otherwise the access wouldn't be guaranteed to be atomic.
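
One way to picture "breaking it down into multiple aligned accesses" is the sketch below, which splits a byte range into naturally aligned power-of-two pieces. This is an illustration under stated assumptions, not IEM's actual algorithm: `aligned_pieces` and the 8-byte maximum width are hypothetical.

```python
def aligned_pieces(addr: int, size: int, max_width: int = 8):
    """Split [addr, addr + size) into naturally aligned power-of-two accesses.

    `max_width` must be a power of two. Each returned (address, width) pair
    is aligned to its own width, so none of them can cross a cache line.
    """
    pieces = []
    end = addr + size
    while addr < end:
        width = max_width
        # Shrink until the piece is naturally aligned and fits in what is left.
        while addr % width != 0 or addr + width > end:
            width //= 2
        pieces.append((addr, width))
        addr += width
    return pieces


# An 8-byte access at offset 60 becomes two 4-byte accesses, one per cache line.
print(aligned_pieces(60, 8))
```

Because the pieces are executed one after another, the combined access is no longer atomic on its own, which matches the reason given above for keeping all other virtual CPUs inactive during the emulation.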