Does kernel: EDAC MC0: UE page 0x0 point to bad memory, a driver, or something else?

Question

kernel: EDAC MC0: UE page 0x0, offset 0x0, grain 0, row 7, labels ":": i3200 UE

All of a sudden today, our CentOS release 6.4 (Final) system started throwing EDAC errors. I rebooted, and the errors stopped.

I have been searching for answers, but they fall into two camps: memory or the chipset. I would like some advice on where to look next to narrow this down to one or the other.

If it's not a production machine, a memtest would help. — schaiba, Jul 16 '13 at 18:17

score 10 · Accepted Answer · edited Apr 13 '17 at 12:36

What you're experiencing is an EDAC (Error Detection and Correction) event. The MC0 in the message identifies memory controller 0, so this is a memory error, and UE marks it as an uncorrectable error. The rest of the message (page, offset, grain, row) tells you where behind that controller the error was detected; the row value is the chip-select row, which corresponds to a particular DIMM or DIMM rank.
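To make the fields in that message easier to read off, here is a minimal Python sketch that splits the exact line from the question into its parts. The regular expression and the `parse_edac_line` name are my own illustration, written against this one i3200-style message; other EDAC drivers and newer kernels format their messages differently, so treat it as a starting point rather than a general parser.

```python
import re

# Pattern written against the single message quoted in the question.
# Assumption: other EDAC drivers/kernels use different layouts.
EDAC_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<kind>CE|UE) "
    r"page (?P<page>0x[0-9a-fA-F]+), offset (?P<offset>0x[0-9a-fA-F]+), "
    r"grain (?P<grain>\d+), row (?P<row>\d+)"
)


def parse_edac_line(line):
    """Return the message fields as a dict, or None if the line does not match."""
    m = EDAC_RE.search(line)
    if m is None:
        return None
    return {
        "mc": int(m["mc"]),              # number after "MC"
        "kind": m["kind"],               # "CE" (corrected) or "UE" (uncorrected)
        "page": int(m["page"], 16),      # hex page number
        "offset": int(m["offset"], 16),  # hex offset within the page
        "grain": int(m["grain"]),
        "row": int(m["row"]),
    }


line = 'kernel: EDAC MC0: UE page 0x0, offset 0x0, grain 0, row 7, labels ":": i3200 UE'
print(parse_edac_line(line))
```

Feeding it your syslog line by line and counting matches per `row` would show whether the errors keep landing on the same row.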

Given that the errors stopped after the reboot, I would keep monitoring it but do nothing for the time being. If they come back, you are most likely dealing with a failing memory module.
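For the monitoring part, the EDAC driver exposes running totals in sysfs, under `/sys/devices/system/edac/mc/mcN/ce_count` and `ue_count`. Below is a rough Python sketch that reads them; the `read_edac_counts` helper is hypothetical, and it assumes an EDAC driver is loaded for your chipset (if not, the directory is simply absent and the result is empty).

```python
from pathlib import Path


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Read corrected/uncorrected error totals for each memory controller.

    Returns e.g. {"mc0": {"ce_count": 0, "ue_count": 3}}; a value is None
    when the counter file is missing or unreadable.
    """
    counts = {}
    for mc_dir in sorted(Path(base).glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc_dir / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None
        counts[mc_dir.name] = entry
    return counts


if __name__ == "__main__":
    for mc, entry in read_edac_counts().items():
        print(mc, entry)
```

Running this from cron and comparing against the previous run is one simple way to notice if the counters start climbing again. Note that the counters reset when the machine reboots or the driver is reloaded.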

You could also try to test it more thoroughly using memtest86+.

This previous question, titled "How to blacklist a correct bad RAM sector according to MemTest86+ error indication?", will show you how to blacklist the memory if you're interested in that as well.

For completeness, note that there are interactions between BIOS bugs and the kernel in this area which may lead to spurious results on i32xx chipsets: https://bugzilla.redhat.com/show_bug.cgi?id=564274 — Adrian Cox, Jul 24 '14 at 09:32
