5

While trying to debug frequent freezes of my new laptop (Kaby Lake architecture) running Ubuntu 16.04, I stumbled upon this entry in kern.log:

kernel: [    0.041634] mce: [Hardware Error]: Machine check events logged

Since then I have installed mcelog, but I do not know what to make of its logs. The content of /var/log/mcelog is:

mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 3880018086 ADDR fef1cf00 
TIME 1479298799 Wed Nov 16 13:19:59 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 43880018086 ADDR fef1ff00 
TIME 1479298799 Wed Nov 16 13:19:59 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 3880018086 ADDR fef1cf00 
TIME 1479321645 Wed Nov 16 19:40:45 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 43880018086 ADDR fef1ff00 
TIME 1479321645 Wed Nov 16 19:40:45 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 43880000086 ADDR fef1db80 
TIME 1479328438 Wed Nov 16 21:33:58 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 13880000086 ADDR fef1dc00 
TIME 1479328438 Wed Nov 16 21:33:58 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 43880000086 ADDR fef1db80 
TIME 1479333991 Wed Nov 16 23:06:31 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 13880000086 ADDR fef1dc00 
TIME 1479333991 Wed Nov 16 23:06:31 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 43880000086 ADDR fef1db80 
TIME 1479373350 Thu Nov 17 10:02:30 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 13880000086 ADDR fef1dc00 
TIME 1479373350 Thu Nov 17 10:02:30 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 3880018086 ADDR fef1cf00 
TIME 1479373810 Thu Nov 17 10:10:10 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee0000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 43880018086 ADDR fef1ff00 
TIME 1479373810 Thu Nov 17 10:10:10 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee0000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 3880018086 ADDR fef1cf00 
TIME 1479375712 Thu Nov 17 10:41:52 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 43880018086 ADDR fef1ff00 
TIME 1479375712 Thu Nov 17 10:41:52 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 3880018086 ADDR fef1cf00 
TIME 1479385932 Thu Nov 17 13:32:12 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 43880018086 ADDR fef1ff00 
TIME 1479385932 Thu Nov 17 13:32:12 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 3880018086 ADDR fef1cf00 
TIME 1479387666 Thu Nov 17 14:01:06 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 43880018086 ADDR fef1ff00 
TIME 1479387666 Thu Nov 17 14:01:06 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 43880000086 ADDR fef1db80 
TIME 1479456710 Fri Nov 18 09:11:50 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 13880000086 ADDR fef1dc00 
TIME 1479456710 Fri Nov 18 09:11:50 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 43880000086 ADDR fef1db80 
TIME 1479459374 Fri Nov 18 09:56:14 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 13880000086 ADDR fef1dc00 
TIME 1479459374 Fri Nov 18 09:56:14 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142

Some observations (please correct me if any of them are wrong):

  • All logged addresses fall within the same 64 KiB range (ADDR fef1xxxx), spread over three 4 KiB pages.
  • Only banks 6 and 7 seem to be affected.
  • All entries contain "Error overflow" and "Uncorrected error".
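These observations can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the log keeps the `CPU n BANK n` / `MISC ... ADDR ...` line layout shown above; the function name `tally_mce` and the embedded sample are illustrative only, and the sample contains just the four distinct bank/address pairs from the log.

```python
import re
from collections import Counter

def tally_mce(log_text):
    """Count logged events per (bank, 4 KiB page) pair in mcelog text output."""
    counts = Counter()
    bank = None
    for line in log_text.splitlines():
        m = re.match(r"CPU \d+ BANK (\d+)", line)
        if m:
            bank = int(m.group(1))  # remember the bank until its ADDR line arrives
            continue
        m = re.match(r"MISC [0-9a-f]+ ADDR ([0-9a-f]+)", line)
        if m and bank is not None:
            addr = int(m.group(1), 16)
            counts[(bank, addr >> 12)] += 1  # drop the 12-bit offset within the page
            bank = None
    return counts

# The four distinct bank/address pairs that occur in the log above.
sample = """\
CPU 0 BANK 6
MISC 3880018086 ADDR fef1cf00
CPU 0 BANK 7
MISC 43880018086 ADDR fef1ff00
CPU 0 BANK 6
MISC 43880000086 ADDR fef1db80
CPU 0 BANK 7
MISC 13880000086 ADDR fef1dc00
"""

for (bank, page), n in sorted(tally_mce(sample).items()):
    print(f"bank {bank} page {page:#x}: {n}")
```

To run it on the real file, pass the text of /var/log/mcelog to `tally_mce` instead of `sample`.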

The mcelog FAQ mentions that a "low rate of corrected memory errors is expected and does not require replacing hardware or other action". The log entries contain the phrase "Uncorrected error" which suggests I actually should take some action.
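The flags mcelog prints can be reproduced from the raw STATUS value. The following is a minimal sketch, assuming the architectural IA32_MCi_STATUS bit layout and the cache-hierarchy compound error code format (`000F 0001 RRRR TTLL`) described in the Intel Software Developer's Manual; the function name `decode_status` and the returned dictionary keys are illustrative, and only the flag bits relevant here are decoded.

```python
# Architectural flag bits in the upper part of IA32_MCi_STATUS.
FLAGS = [
    (63, "VAL"),    # register contains valid data
    (62, "OVER"),   # error overflow
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting enabled
    (59, "MISCV"),  # MCi_MISC register valid
    (58, "ADDRV"),  # MCi_ADDR register valid
    (57, "PCC"),    # processor context corrupt
]

def decode_status(status):
    """Decode the flag bits and MCA error code of a 64-bit MCi_STATUS value."""
    info = {
        "flags": [name for bit, name in FLAGS if (status >> bit) & 1],
        "mca_code": status & 0xFFFF,
        "model_specific": (status >> 16) & 0xFFFF,
    }
    mca = info["mca_code"]
    # Cache hierarchy errors have the form 000F 0001 RRRR TTLL (F = filtering bit).
    if (mca & 0xEF00) == 0x0100:
        info["filtering"] = bool(mca & 0x1000)
        info["level"] = ["L0", "L1", "L2", "generic"][mca & 0x3]
        info["type"] = ["instruction", "data", "generic", "reserved"][(mca >> 2) & 0x3]
        info["request"] = (mca >> 4) & 0xF  # 0 is the generic ERR request type
    return info

print(decode_status(0xEE2000000040110A))
```

For the STATUS value in the log this yields the flags VAL, OVER, UC, MISCV, ADDRV and PCC, a cache level of L2 and the filtering bit set, which matches the text mcelog printed. The EN bit (bit 60) is clear in this value.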

My questions are:

  1. What do these errors mean and should I worry about them?
  2. Could these hardware errors be the cause of the freezes of the entire system?
  3. Should I have the laptop (or parts) replaced by the manufacturer?
  4. Are there any other actions I should take?
justfortherec

2 Answers

4

First, I fear that I cannot really give good answers to your questions. I also own a Dell XPS 13 (9360) and see the same MCE messages. I'm in contact with Dell Support because of these. They replaced the mainboard but it did not help. Same messages in the logs. At some point they concluded that it is probably a false positive. They had no idea what is causing it, though (mcelog/kernel/Intel problem?). The correspondence with Support is still ongoing.

<rant> Btw, talking to Dell Support is a very unpleasant experience. They seem to suggest only the "standard" solutions such as resetting the firmware, running self-health tests and so on. I did not have the impression that I was talking to someone with technical insight. </rant>

To add more detail: I see the same issue on Fedora 24, so it does not seem to be specific to Ubuntu.

Regarding your questions:

What do these errors mean and should I worry about them?

I don't know. Dell Support thinks those are false positives.

Could these hardware errors be the cause of the freezes of the entire system?

Besides the messages my system works fine. I'd guess the freeze is a different issue.

Should I have the laptop (or parts) replaced by the manufacturer?

Replacing the mainboard did not fix the MCE issue. It might solve the freezing issue, although it seems that this was fixed by a kernel update.

Are there any other actions I should take?

If you are not already in contact with Support, contact them. Maybe they will come up with a real solution once they see that it affects more customers.

Josef Eisl
  • Thanks a lot for your insights. May I ask what Linux you are running to not experience the freezes? Indeed, updating to a 4.8 kernel fixed the issue for me. Are you running on stock Ubuntu 16.04? I will follow your advice and contact Dell. – justfortherec Dec 14 '16 at 10:43
  • I'm currently on an up-to-date Fedora 24 which comes with a 4.8.10 kernel. I did not use the stock Ubuntu 16.04 long enough to tell if there are problems. Good luck with support! – Josef Eisl Dec 14 '16 at 13:07
  • Another update: Support was able to reproduce it on their test machine. This needs to be fixed upstream. They forwarded the issue internally to some department that will look into it (whatever that means). In addition they suggested to send error reports e.g. to Ubuntu. – Josef Eisl Dec 15 '16 at 15:56
  • Not that you need another "me too" but I have a new XPS 9360 and just installed Fedora 25 and get the same MCE errors. They always seem to happen a couple minutes after boot, then I'm fine (and nothing is broken, just annoying Oops messages) – Adam Batkin Jan 26 '17 at 18:25
  • Same hardware (XPS 9360) and same MCE errors. I'm running Debian sid. – Kan-Ru Chen Mar 10 '17 at 04:06
  • I too have this issue. Dell Precision 5520. Fedora 25, Kernel 4.10.8 – Scott Apr 05 '17 at 00:20
  • @Scott is that also a KabyLake? – Josef Eisl Apr 06 '17 at 12:13
  • @JosefEisl yes. CPU family: 6 Model: 158 Model name: Intel(R) Core(TM) i5-7440HQ CPU @ 2.80GHz – Scott Apr 12 '17 at 02:34
  • Hate to say "me too", but before I realised what MCE meant, I asked the same question on AskUbuntu, raised a Dell support request, and ran all the hardware check tests (Dell SupportCenter and the pre-boot test), all of which passed. Dell told me that it was a 'driver' issue that occurs only when you dual-boot, and that they have apparently already raised it and Ubuntu devs/Intel are working on it (I couldn't get a link to the issue report). So, for now, their suggestion was that I can either remove Windows completely or live with it. – NikhilWanpal May 14 '17 at 06:39
  • @NikhilWanpal I don't have a dual-boot setup. – Josef Eisl Jan 03 '18 at 09:17
  • Update: the problem disappeared quite some time ago. I guess it was fixed by a kernel update but I have not bisected them nor did I inspect the change logs for hints. Also, I've only checked Fedora. Long story short, everything is fine now :) – Josef Eisl Jan 03 '18 at 09:23
  • @JosefEisl ha! This was quite an old comment. In my case the issue was resolved by a subsequent BIOS update Dell released for the laptop. I installed it while battling a different issue related to the sound card, but at least this is no longer a concern. – NikhilWanpal Jan 03 '18 at 13:01
  • @NikhilWanpal glad to hear that! – Josef Eisl Jan 04 '18 at 09:46
1

I get the same MCE errors. They started popping up on boot after the last few kernel updates (Fedora 25), but I lost track of which exact update they first appeared with. The notebook is a Dell Inspiron 5567 (Intel i5 7200U). The system works perfectly fine after boot, so I'm 100% sure these are false positives appearing for some reason.

Mr.Torture