Machine check - hardware errors - Intel 4790K

mcsquared
Posts: 8
Joined: 21. Jul 2011, 13:22
Primary OS: Fedora other
VBox Version: OSE Fedora
Guest OSses: Windows 2000

Machine check - hardware errors - Intel 4790K

Post by mcsquared »

Running Scientific Linux 7.2

I have run VirtualBox for several years (to run Windows 2000 in support of a system I use), previously on an Intel E6500 CPU (Wolfdale-family hardware on a Gigabyte EP43-UD3L).

I have now set up a new Linux system based on an Intel 4790K CPU (Haswell-family hardware on a Gigabyte GA-Z97X-UD5H).

I now get, for example:

The kernel log indicates that hardware errors were detected.
:System log may have more information.
:The last 20 mcelog lines of system log are:
:==========================================
:Dec 21 17:59:13 fax mcelog: MCG status:
:Dec 21 17:59:13 fax mcelog: MCi status:
:Dec 21 17:59:13 fax mcelog: Corrected error
:Dec 21 17:59:13 fax mcelog: Error enabled
:Dec 21 17:59:13 fax mcelog: MCA: Internal parity error
:Dec 21 17:59:13 fax mcelog: STATUS 90000040000f0005 MCGSTATUS 0
:Dec 21 17:59:13 fax mcelog: MCGCAP c09 APICID 6 SOCKETID 0
:Dec 21 17:59:13 fax mcelog: CPUID Vendor Intel Family 6 Model 60
:Dec 22 09:42:37 fax mcelog: Hardware event. This is not a software error.
:Dec 22 09:42:37 fax mcelog: MCE 0
:Dec 22 09:42:37 fax mcelog: CPU 0 BANK 0
:Dec 22 09:42:37 fax mcelog: TIME 1450777357 Tue Dec 22 09:42:37 2015
:Dec 22 09:42:37 fax mcelog: MCG status:
:Dec 22 09:42:37 fax mcelog: MCi status:
:Dec 22 09:42:37 fax mcelog: Corrected error
:Dec 22 09:42:37 fax mcelog: Error enabled
:Dec 22 09:42:37 fax mcelog: MCA: Internal parity error
:Dec 22 09:42:37 fax mcelog: STATUS 90000040000f0005 MCGSTATUS 0
:Dec 22 09:42:37 fax mcelog: MCGCAP c09 APICID 0 SOCKETID 0
:Dec 22 09:42:37 fax mcelog: CPUID Vendor Intel Family 6 Model 60

and

Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: MCG status:
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: MCi status:
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: Corrected error
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: Error enabled
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: MCA: Internal parity error
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: STATUS 90000040000f0005 MCGSTATUS 0
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: MCGCAP c09 APICID 6 SOCKETID 0
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: CPUID Vendor Intel Family 6 Model 60
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: Hardware event. This is not a software error.
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: MCE 0
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: CPU 1 BANK 0
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: TIME 1453383350 Thu Jan 21 13:35:50 2016
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: MCG status:
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: MCi status:
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: Corrected error
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: Error enabled
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: MCA: Internal parity error
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: STATUS 90000040000f0005 MCGSTATUS 0
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: MCGCAP c09 APICID 2 SOCKETID 0
Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: CPUID Vendor Intel Family 6 Model 60
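
For reference, the STATUS value in these records can be decoded by hand. The following is a minimal sketch, assuming the architectural IA32_MCi_STATUS field layout described in the Intel SDM; the function and key names are illustrative only and are not part of mcelog. The corrected-error count field is only meaningful when MCG_CAP bit 10 is set, which is the case for the MCGCAP value c09 shown above.

```python
def decode_mci_status(status: int) -> dict:
    """Split a 64-bit IA32_MCi_STATUS value into its architectural fields."""

    def bit(n: int) -> bool:
        return bool((status >> n) & 1)

    return {
        "valid": bit(63),            # VAL: the register holds a logged error
        "overflow": bit(62),         # OVER: an earlier error was overwritten
        "uncorrected": bit(61),      # UC: clear means the error was corrected
        "enabled": bit(60),          # EN: reporting was enabled for this error
        "misc_valid": bit(59),       # MISCV
        "addr_valid": bit(58),       # ADDRV
        "context_corrupt": bit(57),  # PCC
        "corrected_count": (status >> 38) & 0x7FFF,      # bits 52:38
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
        "mca_error_code": status & 0xFFFF,               # bits 15:0
    }


# The STATUS value reported in every record above.
fields = decode_mci_status(0x90000040000F0005)
for name, value in fields.items():
    print(f"{name:20} {value}")
```

Under that layout the value decodes to a valid, enabled, corrected error with MCA error code 0x0005, which is consistent with the "Corrected error", "Error enabled" and "Internal parity error" lines printed by mcelog.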



Although these are reported as hardware errors and are not fatal, they are annoying and can upset a backup: if a "hardware error" occurs while a backup is running, it causes an rsync error.

These errors only occur when a virtual machine is running under VirtualBox, irrespective of which virtual machine it is. These virtual machines were created on the previous Intel systems and imported to the new 4790K system via an appliance.

I disabled ACPI in these virtual machines, but the problem persists.

I cannot relate the frequency of the errors to activity in the virtual machines: on days when a machine is just idling I can have more errors than when it is in active use.

What I have noticed is that since updating VirtualBox to VirtualBox-5.0-5.0.12_104815_el7-1.x86_64, the frequency of machine checks has more than doubled, from 12 to 30-plus a day. The machine check dumps rapidly fill up /var/log, which means pruning it on a regular basis.
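
Daily figures like these can be pulled out of the system log rather than counted by hand. This is a rough sketch that assumes the syslog line format shown in the excerpts above (optional leading colon, optional PID after "mcelog") and counts one event per "Hardware event" line; the function name and the regular expression are illustrative, not part of mcelog.

```python
import re
from collections import Counter

# Matches e.g. "Dec 22 09:42:37 fax mcelog: Hardware event. ..." and
# "Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: Hardware event. ..."
EVENT = re.compile(
    r"^:?(\w{3}\s+\d{1,2}) \d{2}:\d{2}:\d{2} \S+ mcelog(?:\[\d+\])?: Hardware event"
)


def events_per_day(lines):
    """Count machine check records per calendar day (month and day only)."""
    counts = Counter()
    for line in lines:
        match = EVENT.match(line)
        if match:
            # Normalise syslog's space-padded days ("Jan  1" -> "Jan 1").
            day = " ".join(match.group(1).split())
            counts[day] += 1
    return counts


sample = [
    ":Dec 22 09:42:37 fax mcelog: Hardware event. This is not a software error.",
    ":Dec 22 09:42:37 fax mcelog: MCE 0",
    "Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: Hardware event. This is not a software error.",
]
for day, count in events_per_day(sample).items():
    print(day, count)
```

In practice the lines would come from the system log file or journal instead of the `sample` list.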

I have updated the BIOS on the motherboard to the latest stable version for the Gigabyte GA-Z97X-UD5H.

I have browsed the web and I am aware of problems with the Haswell chipset and Virtual machines. I have contacted Intel, who have replaced the CPU, but the problem still exists. I have suggested a microcode (BIOS) fix, but this has met with silence.

Any ideas as to how to overcome this problem ...

Meriel
Perryg
Site Moderator
Posts: 34369
Joined: 6. Sep 2008, 22:55
Primary OS: Linux other
VBox Version: OSE self-compiled
Guest OSses: *NIX

Re: Machine check - hardware errors - Intel 4790K

Post by Perryg »

mcsquared wrote: I have suggested a microcode (BIOS) fix, but this has met with silence.
To whom did you suggest the "(BIOS) fix"?
michaln
Oracle Corporation
Posts: 2973
Joined: 19. Dec 2007, 15:45
Primary OS: MS Windows 7
VBox Version: VirtualBox+Oracle ExtPack
Guest OSses: Any and all

Re: Machine check - hardware errors - Intel 4790K

Post by michaln »

"Hardware event. This is not a software error." That's about all that's relevant for this forum I'm afraid... We can't fix it. Your hardware is unfortunately buggy, which seems to be the norm for new Intel releases nowadays :(

This may be relevant: https://bugs.launchpad.net/qemu/+bug/1307225

Re: Machine check - hardware errors - Intel 4790K

Post by mcsquared »

Thanks for your replies

Firstly, Intel suggested the latest BIOS, but that did not make any difference.
If they were to make a microcode alteration then this would have been reflected in the BIOS, but Intel were not interested in fixing the "bug".

Secondly, thank you for the link. Yes, I had followed that link previously, which is why I was persistent with Intel that it was their problem. Intel replaced the 4790K CPUs on my two machines, but that did not make any difference. I am not inclined to add parameters which just tell the system to ignore machine checks, as this does not solve the problem and could mask any other machine problems should they occur.

The reason for posting here now was that on upgrading VirtualBox to version VirtualBox-5.0-5.0.12_104815_el7-1.x86_64, the number of hardware errors in 24 hours more than doubled, to typically 50 a day. Hence I wondered if this might give a clue to someone more knowledgeable about VirtualBox and virtual machines as to what is causing the problem.

Thanks again for your comments ... any further suggestions most welcome

M

Re: Machine check - hardware errors - Intel 4790K

Post by michaln »

We are not aware of any changes that would have an impact on this. Then again, the changed behavior could simply be a result of indirect changes like code/data shifting around a little, being aligned differently, sitting on different cache lines, or who knows what else.

The problem with microcode updates in the BIOS is that the board/BIOS vendor has to actually include them. I don't know how that can be easily checked.
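
On a Linux host, one way to see which microcode revision is actually loaded is the `microcode` field that x86 kernels expose in /proc/cpuinfo. The sketch below only parses text in that format; the function name is illustrative and the sample revision is made up for the example, not taken from the poster's machine.

```python
def microcode_revisions(cpuinfo_text: str) -> dict:
    """Map logical CPU number -> microcode revision string from cpuinfo text."""
    revisions = {}
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processors
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "microcode" and cpu is not None:
            revisions[cpu] = value
    return revisions


# Made-up sample in /proc/cpuinfo format (hypothetical revision value).
sample_cpuinfo = (
    "processor\t: 0\n"
    "model\t\t: 60\n"
    "microcode\t: 0x1c\n"
    "\n"
    "processor\t: 1\n"
    "model\t\t: 60\n"
    "microcode\t: 0x1c\n"
)
print(microcode_revisions(sample_cpuinfo))
```

On a real system the input would be the contents of /proc/cpuinfo. Comparing the revision before and after a BIOS update shows whether the update changed the microcode at all, although it does not show what a given revision fixes.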

I take it that you have two more or less identical systems with the same boards and CPUs? Have you tried a different board? That might be a problem too, not just the CPU.

Re: Machine check - hardware errors - Intel 4790K

Post by Perryg »

I believe (like michaln) that you will find that the real issue is the Gigabyte MB and the fact that they do not care about Linux at all. Most of the "low end" MB manufacturers only care about Windows, their largest crowd. I had/have some microcode issues on an MSI MB and have had them with Gigabyte as well, which I was forced to work around. They simply refuse to update their BIOS to take care of issues like the ones you are seeing.

Re: Machine check - hardware errors - Intel 4790K - solved

Post by mcsquared »

Hello Guys

At last, I have found a way to stop the kernel oops/machine check/hardware errors I have been experiencing when running a virtual machine under VirtualBox.

This was getting increasingly frustrating, particularly as the number of errors per day increased to over 300 after I added more memory.

I decided to experiment with different settings in the BIOS.

I disabled Intel Virtualization Technology and Intel Virtualization Technology for Directed I/O, both of which were enabled by default.

This appears to have stopped the error reports. I am not aware of any adverse effects of disabling these two settings.
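
To confirm from the running Linux host what the firmware setting did to VT-x, the IA32_FEATURE_CONTROL MSR (0x3A) can be inspected. This is only a sketch under assumptions: the bit meanings follow the Intel SDM (bit 0 lock, bit 1 VMX inside SMX, bit 2 VMX outside SMX), reading the MSR relies on the Linux `msr` driver being loaded and on root access, and the function names are illustrative. It says nothing about the VT-d setting.

```python
import os
import struct

IA32_FEATURE_CONTROL = 0x3A


def decode_feature_control(value: int) -> dict:
    """Interpret the VMX-related bits of IA32_FEATURE_CONTROL."""
    locked = bool(value & 0b001)
    vmx_outside_smx = bool(value & 0b100)
    return {
        "locked": locked,
        "vmx_inside_smx": bool(value & 0b010),
        "vmx_outside_smx": vmx_outside_smx,
        # Locked with VMX-outside-SMX clear: software cannot turn VT-x on.
        "vtx_disabled_by_firmware": locked and not vmx_outside_smx,
    }


def read_msr(register: int, cpu: int = 0) -> int:
    """Read one MSR through /dev/cpu/N/msr (needs root and the msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the MSR index.
        return struct.unpack("<Q", os.pread(fd, 8, register))[0]
    finally:
        os.close(fd)


# Example values: locked with VT-x off, and locked with VT-x on.
print(decode_feature_control(0x1))
print(decode_feature_control(0x5))
```

On the host itself, `decode_feature_control(read_msr(IA32_FEATURE_CONTROL))` would report the live setting.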

M

Re: Machine check - hardware errors - Intel 4790K

Post by mcsquared »

Following on from my previous post, below is the response from Intel

>Thank you for contacting Intel® Customer Support.
>I would like to inform you that I have received the response from our expert technician:
>We believe that the board manufacturer offered a solution for the issue, however, please also note that Intel guarantees full functionality of its CPU's when used:
>- At stock speed (without overclocking)
>- Outside a virtual configuration, without running a virtual machine

>As you indicated that the issue is not occurring when disabling settings for virtual computing in the BIOS, we can consider that the CPU is working as expected.

>Regarding the issue in the following link: https://bugs.launchpad.net/qemu/+bug/1307225 we have not received any similar reports from other customers.
>The web link does not provide specific proof that the issue is related to the CPU. The report does not specify exact details on hardware, software and OS involved, therefore we are unable to investigate the issue.

>However we can offer the following suggestions:
>- check the communities for the Operating System for advice;
>- Report the issue to the board manufacturer once again, and follow their advice.
>- Test one of the affected systems with a validated windows version with updated drivers.
>(In that case please send us the OS event logs for further investigation.)
>- Test with another Linux Distro, however this is really only a suggestion, we would be unable to provide any further assistance or advise, we cannot guarantee a positive outcome nor can we advice which distro to test.
>This test might however indicate that the issue is related to the current OS.
>- Cross test all hardware mainly board, memory and power supply
>- Keep the settings for Virtualization technology disabled in the BIOS, when you do not require a virtual PC configuration then this should not be a problem.

So it looks like if you want to run a virtual machine (VirtualBox, VMware) on a system which has the capacity to use Intel's virtualisation, then make sure you disable CPU virtualisation in the BIOS.

Re: Machine check - hardware errors - Intel 4790K

Post by michaln »

mcsquared wrote: As you indicated that the issue is not occurring when disabling settings for virtual computing in the BIOS, we can consider that the CPU is working as expected.
That's nonsense. Is Intel saying that VT-x is an unsupported CPU feature? Machine checks already mean the hardware is not working as expected.
mcsquared wrote: So it looks like if you want to run a virtual machine (VirtualBox, VMware) on a system which has the capacity to use Intel's virtualisation, then make sure you disable CPU virtualisation in the BIOS.
With VirtualBox, that will make it impossible to use guest SMP or run 64-bit guests. It will also massively reduce the virtualization performance.

In a normal world, the obvious solution would be to buy a CPU that isn't broken from a different vendor... in a monopoly world that doesn't work so well.

Re: Machine check - hardware errors - Intel 4790K

Post by mcsquared »

Well I responded to Intel with the previous poster's comments and this is what I had in reply

>Thank you for contacting Intel® Customer Support.
>We believe that the issue can be related to the CPU but not necessarily, also the software used to create a virtual machine may play a role in causing the problem.
>At this stage we would like to investigate the issue further by eliminating everything that could cause the problem and verify if the CPU (and all other components) work without using a virtual machine but not within the virtual machine.
>Therefore we suggest testing the system with a tested Operating System and Virtual box and inform us on the outcome.
>The tested operating systems for the board GA-Z97X-UD5H-BK are Windows 10/8.1/8/7:
>http://www.gigabyte.com/products/produc ... id=4978#sp
>For further queries please do not hesitate to contact us.

I shall be responding that I do not run Windows and do not have access to Windows operating systems.
Interestingly, VirtualBox on a socket 775 processor with Scientific Linux 6 had no problems.
It was only when I upgraded to a Haswell-based system (CPU/motherboard) that I had this problem.

Re: Machine check - hardware errors - Intel 4790K

Post by Perryg »

Figures. Gigabyte only supports Windows, which I had already found out. You can search for issues like this and there are plenty. I switched to a higher-end MSI board and still have BIOS issues because they too will not accept anything other than Windows for their test support, but the issues I have now are not as bad. I have an i7-5820K Intel processor and 32GB DDR4 quad channel memory. MSI is a little better at updating their BIOS, but it takes an act of Congress to get it done, which is the best way to adjust the microcode IMHO. Intel will not be able to help you a lot though. If anyone can, it will be the MB manufacturer, because they will need to get involved with the BIOS folks, and guess what? They will be playing ping pong back and forth trying to figure out who needs to do the work. Can you say catch 22?

Re: Machine check - hardware errors - Intel 4790K

Post by mcsquared »

My response to Intel was that Linux was a robust operating system

>I should like to point out that Linux is a robust operating system, often giving detailed information regarding the operation of the components, both hardware and software.
>
>The error messages (Internal parity error) seen in the system log are hardware errors, pointing to a problem with the cache (Bank 0) on the CPU, as all 4 CPUs are mentioned in various error messages.
>(i.e. CPU 0 BANK 0; CPU 1 BANK 0; CPU 2 BANK 0; and CPU 3 BANK 0)
>
>It is interesting that the frequency of the errors markedly increased (from 30 to approx. 300 a day) when I added more memory (16 GByte to 32 GByte). Hope this helps tie down the problem.
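
The observation that every CPU reports the same bank can be checked by tallying the "CPU n BANK m" lines in the log. This is a small sketch assuming the mcelog line format shown earlier in the thread; the function name and pattern are illustrative.

```python
import re
from collections import Counter

# Matches the "CPU <n> BANK <m>" line that mcelog prints for each record.
CPU_BANK = re.compile(r"mcelog(?:\[\d+\])?: CPU (\d+) BANK (\d+)")


def tally_cpu_banks(lines):
    """Count machine check records per (cpu, bank) pair."""
    counts = Counter()
    for line in lines:
        match = CPU_BANK.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


sample = [
    ":Dec 22 09:42:37 fax mcelog: CPU 0 BANK 0",
    "Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: CPU 1 BANK 0",
    "Jan 21 13:35:50 fax.whealvor.co.uk mcelog[988]: MCA: Internal parity error",
]
for (cpu, bank), count in sorted(tally_cpu_banks(sample).items()):
    print(f"CPU {cpu} BANK {bank}: {count}")
```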

To which Intel responded, advising that they would escalate the problem to higher-level support.

Below is the response from high-level support:

> Thank you for contacting Intel® Customer Support.
> We have received a response from our highest level of support.
> Our investigation does not single out the CPU or its microcode as the root cause of the problem; it is highly unlikely that the issue is caused by the CPU.
> We have seen several posts on the web where the same issue is described. The issue was related to the VM (QEMU in most cases, which is a different hypervisor).
> Therefore we suggest you to contact Oracle at this stage, and/or to test different hypervisor software.

Everyone is blaming each other ..

Re: Machine check - hardware errors - Intel 4790K

Post by michaln »

It would be lovely if Intel explained how software (even buggy) can trigger hardware errors, which is what machine checks are. We are not aware of any such mechanism.

Of course it would be even better if they could explain why a given VirtualBox version only triggers machine checks on very specific CPU models.

Re: Machine check - hardware errors - Intel 4790K

Post by mcsquared »

Hello

I received the following reply from Intel

> This machine errors are related to QEMU hypervisor which is the one which actually Virtual Box uses for most of its hardware virtualization. Please could you try a different hypervisor?
> Since the error seems to be related to a particular hypervisor, could you use a different program than Virtual box and make sure that program doesn’t use the QEMU hypervisor?

They are blaming VirtualBox. This does not explain why my previous socket 775 systems with a Gigabyte motherboard (with Intel Virtualization enabled by default) did not give any problems.

The question is: do other systems not using a Haswell CPU experience hardware check errors when running a 32-bit virtual machine under VirtualBox?

Re: Machine check - hardware errors - Intel 4790K

Post by michaln »

Can they explain how software can trigger hardware errors?

And qemu is not even a hypervisor (it's an emulator), they probably mean kvm. Which VirtualBox definitely isn't.