
We noticed that the server crashes with the errors below. We are not sure whether this is caused by a defective piece of hardware or is unrelated to it.

Server details: Red Hat Enterprise Linux ES release 4 (Nahant Update 6)

[root@athena log]# uname -a
Linux athena.nsdecatur.local 2.6.9-67.0.7.ELsmp #1 SMP Wed Feb 27 04:47:23 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

messages

Sep 17 15:08:16 athena kernel: EDAC k8 MC0: general bus error: participating processor(local node response), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
Sep 17 15:08:16 athena kernel: MC0: CE page 0x2c2766, offset 0xb10, grain 8, syndrome 0xac08, row 1, channel 0, label "": k8_edac
Sep 17 15:08:16 athena kernel: MC0: CE - no information available: k8_edac Error Overflow set
Sep 17 15:08:16 athena kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
Sep 17 15:08:17 athena su(pam_unix)[19579]: session opened for user oracle by (uid=0)
Sep 17 15:08:17 athena su(pam_unix)[19579]: session closed for user oracle
Sep 17 15:08:17 athena su(pam_unix)[19634]: session opened for user oracle by (uid=0)
Sep 17 15:08:17 athena su(pam_unix)[19634]: session closed for user oracle
Sep 17 15:08:18 athena kernel: EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
Sep 17 15:08:18 athena kernel: MC0: CE page 0x39c857, offset 0xd50, grain 8, syndrome 0x1cc8, row 1, channel 0, label "": k8_edac
Sep 17 15:08:18 athena kernel: MC0: CE - no information available: k8_edac Error Overflow set
Sep 17 15:08:18 athena kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
Sep 17 15:08:18 athena su(pam_unix)[19715]: session opened for user oracle by (uid=0)
Sep 17 15:08:18 athena su(pam_unix)[19715]: session closed for user oracle
Sep 17 15:08:18 athena su(pam_unix)[19758]: session opened for user oracle by (uid=0)
Sep 17 15:08:19 athena su(pam_unix)[19758]: session closed for user oracle
Sep 17 15:08:20 athena su(pam_unix)[19807]: session opened for user oracle by (uid=0)
Sep 17 15:08:20 athena su(pam_unix)[19807]: session closed for user oracle
Sep 17 15:08:20 athena su(pam_unix)[19850]: session opened for user oracle by (uid=0)
Sep 17 15:08:20 athena su(pam_unix)[19850]: session closed for user oracle
Sep 17 15:08:20 athena kernel: EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
Sep 17 15:08:20 athena kernel: MC0: CE page 0x39c857, offset 0xd50, grain 8, syndrome 0x1cc8, row 1, channel 0, label "": k8_edac
Sep 17 15:08:20 athena kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
Sep 17 15:08:21 athena su(pam_unix)[19899]: session opened for user oracle by (uid=0)
Sep 17 15:08:21 athena kernel: EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
Sep 17 15:23:54 athena syslogd 1.4.1: restart.
Sep 17 15:23:54 athena syslog: syslogd startup succeeded
Sep 17 15:23:54 athena kernel: klogd 1.4.1, log source = /proc/kmsg started.
user1595858

1 Answer


Those errors mean that an ECC event was detected in your RAM, i.e. there was a memory error; the "CE" in the log lines marks it as a correctable one. Typically you continue to monitor for more of them. A steady stream of them would usually indicate that either your RAM is failing/faulty or the controller for the RAM is failing. It's not unusual to have one or two on occasion.

In either case it's a hardware failure.
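
To see whether the errors keep landing on the same piece of memory, you can tally the "CE page" lines by controller, row and channel. Below is a minimal Python sketch of that idea. The regular expression is written against the k8_edac lines quoted in the question only, and the function name and SAMPLE list are made up for illustration; other EDAC drivers or kernel versions may format these lines differently.

```python
import re
from collections import Counter

# Matches only the k8_edac "CE page ..." lines as they appear in the question.
# Assumption: other EDAC drivers/kernels may use a different layout.
CE_LINE = re.compile(
    r"MC(?P<mc>\d+): CE page 0x(?P<page>[0-9a-f]+), "
    r"offset 0x(?P<offset>[0-9a-f]+), grain \d+, "
    r"syndrome 0x(?P<syndrome>[0-9a-f]+), "
    r"row (?P<row>\d+), channel (?P<channel>\d+)"
)


def count_correctable_errors(lines):
    """Tally correctable-error lines per (controller, row, channel)."""
    counts = Counter()
    for line in lines:
        match = CE_LINE.search(line)
        if match:
            key = (int(match["mc"]), int(match["row"]), int(match["channel"]))
            counts[key] += 1
    return counts


# A few lines copied from the question's log excerpt.
SAMPLE = [
    'Sep 17 15:08:16 athena kernel: MC0: CE page 0x2c2766, offset 0xb10, grain 8, syndrome 0xac08, row 1, channel 0, label "": k8_edac',
    'Sep 17 15:08:16 athena kernel: MC0: CE - no information available: k8_edac Error Overflow set',
    'Sep 17 15:08:18 athena kernel: MC0: CE page 0x39c857, offset 0xd50, grain 8, syndrome 0x1cc8, row 1, channel 0, label "": k8_edac',
    'Sep 17 15:08:20 athena kernel: MC0: CE page 0x39c857, offset 0xd50, grain 8, syndrome 0x1cc8, row 1, channel 0, label "": k8_edac',
]

if __name__ == "__main__":
    for (mc, row, channel), n in sorted(count_correctable_errors(SAMPLE).items()):
        print(f"MC{mc} row {row} channel {channel}: {n} correctable error(s)")
```

In the excerpt from the question every such line reports row 1, channel 0 on MC0, so the tally ends up in a single bucket.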

Monitoring

If you're interested in monitoring these failures and setting thresholds, you might want to take a look at the mcelog package. How to set up its triggers, and what it does, is covered in the U&L question titled "Writing triggers for mcelog".
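
To illustrate what a threshold means here, the following Python sketch flags the moment when more than a given number of CE lines fall inside a time window. It is a stand-in for the idea, not how mcelog works internally. The function names, the `limit` and `window` defaults, and the placeholder `year` (syslog timestamps carry no year) are all assumptions made for this example.

```python
from collections import deque
from datetime import datetime, timedelta


def parse_syslog_time(line, year=2000):
    """Parse the 'Sep 17 15:08:16' prefix of a syslog line.

    Syslog omits the year, so `year` is only a placeholder (assumption).
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def threshold_exceeded(lines, limit=10, window=timedelta(hours=24)):
    """Return the timestamp at which more than `limit` CE lines fall
    within `window`, or None if that never happens.

    Every line containing ': CE ' is counted, including the
    'no information available' overflow notices.
    """
    recent = deque()
    for line in lines:
        if ": CE " not in line:
            continue
        now = parse_syslog_time(line)
        recent.append(now)
        # Drop events that have aged out of the window.
        while now - recent[0] > window:
            recent.popleft()
        if len(recent) > limit:
            return now
    return None


# CE lines copied from the question's log excerpt.
SAMPLE = [
    'Sep 17 15:08:16 athena kernel: MC0: CE page 0x2c2766, offset 0xb10, grain 8, syndrome 0xac08, row 1, channel 0, label "": k8_edac',
    'Sep 17 15:08:16 athena kernel: MC0: CE - no information available: k8_edac Error Overflow set',
    'Sep 17 15:08:18 athena kernel: MC0: CE page 0x39c857, offset 0xd50, grain 8, syndrome 0x1cc8, row 1, channel 0, label "": k8_edac',
    'Sep 17 15:08:18 athena kernel: MC0: CE - no information available: k8_edac Error Overflow set',
    'Sep 17 15:08:20 athena kernel: MC0: CE page 0x39c857, offset 0xd50, grain 8, syndrome 0x1cc8, row 1, channel 0, label "": k8_edac',
]

if __name__ == "__main__":
    hit = threshold_exceeded(SAMPLE, limit=3, window=timedelta(minutes=1))
    print("threshold exceeded at:", hit)
```

What counts as "too many" depends on the hardware and on how much risk you accept, so treat the numbers above purely as placeholders.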

slm