Questions tagged [ecc]

Abbreviation for Error-Correcting Code.

An ECC is a coding scheme used to detect and, where possible, correct errors in data communication and storage.

It works by adding redundancy: check data (for example, an ECC byte) is calculated from the payload it protects and is stored or transmitted alongside that payload.

Detection

With a proper implementation, the system can detect whether the data it reads back is corrupted. Detection relies on recalculating the check data: if the new value equals the value attached to the data, the system treats the data as intact; if not, the data is corrupted.
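The recompute-and-compare step can be sketched as follows. This is only an illustration: `check_byte` and `is_intact` are hypothetical names, and the XOR of all bytes is a toy stand-in for the check data, not a real ECC (it can detect some errors but cannot correct any).

```python
from functools import reduce


def check_byte(data: bytes) -> int:
    """Toy check value: XOR of all payload bytes (a stand-in, not a real ECC)."""
    return reduce(lambda a, b: a ^ b, data, 0)


def is_intact(data: bytes, stored_check: int) -> bool:
    """Recompute the check value and compare it with the stored one."""
    return check_byte(data) == stored_check


payload = b"valuable data"
stored = check_byte(payload)      # computed when the data is written or sent

corrupted = bytearray(payload)
corrupted[3] ^= 0x04              # simulate a single flipped bit

print(is_intact(payload, stored))           # data unchanged
print(is_intact(bytes(corrupted), stored))  # data with one flipped bit
```

A matching value only means that no error was detected; a weak check such as this one misses, for example, two flips of the same bit position in different bytes.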

Error correction

Error correction is possible only if the minimum Hamming distance between valid code words is at least 3. A code with minimum distance d can detect up to d - 1 bit errors and correct up to FLOOR( (d - 1) / 2 ) of them.
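As a worked illustration, the sketch below uses the classic Hamming(7,4) code, which is not mentioned above and is chosen here only as a small example. It measures the minimum Hamming distance d between code words by brute force and derives the number of correctable bit errors as (d - 1) // 2. The function names are hypothetical.

```python
from itertools import combinations, product


def hamming74_encode(d):
    """Encode 4 data bits as the 7-bit code word [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(c):
    """Return (corrected code word, syndrome); syndrome 0 means no error seen."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1
    return c, syndrome


# Minimum Hamming distance over all 16 valid code words.
codewords = [hamming74_encode(bits) for bits in product([0, 1], repeat=4)]
d_min = min(sum(a != b for a, b in zip(x, y))
            for x, y in combinations(codewords, 2))
correctable = (d_min - 1) // 2

word = hamming74_encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1                       # flip the bit at position 5
fixed, syndrome = hamming74_correct(damaged)

print(d_min, correctable)
print(fixed == word, syndrome)
```

With two flipped bits the syndrome is still non-zero, so the error is noticed, but the decoder would "correct" the wrong position. That is why the number of correctable errors is smaller than the number of detectable ones.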

In many systems ECC is used only to detect errors, not to correct them.

See:
http://en.wikipedia.org/wiki/Error-correcting_code
http://en.wikipedia.org/wiki/Hamming_code

17 questions
30
votes
3 answers

How to tell whether RAM ECC is working?

I'm planning on getting some ECC RAM to replace the non-ECC RAM I currently have installed on my Asus M5A97 Pro motherboard (AMD 970 chipset, FX-6100 CPU). After I install the RAM, how do I tell whether the ECC feature of the RAM is working…
user
  • 28,161
  • 13
  • 75
  • 138
24
votes
3 answers

Is it possible to find the physical address range of a DIMM?

I note that SMBios Type 20 would help here, but it's optional as of version 2.5 (2006-09-05) pp. 25, L796, and pp. 131, whereas types 16, 17 and 19 are mandatory, but don't quite help. Physical Memory Array (Type 16) There is one of these structures…
Alun
  • 409
  • 1
  • 4
  • 7
19
votes
5 answers

Is it possible to add error correction codes (BCH, RS or etc.) to a single file?

As far as I know, WinRAR archives may contain ECC (error correction codes), so if the archive is slightly damaged, then it can be fixed by itself. For example, I can first encode archives.tar to archives.tar.ecc, and then upload it to my server. If…
Kevin Dong
  • 1,139
  • 1
  • 9
  • 18
9
votes
1 answer

How do I enable and verify ECC RAM scrubbing in Linux?

I bought my first system with ECC RAM and trying to learn about its possibilities when it comes to alerting and maintenance in Linux. To be specific, Debian Linux on a Super Micro H8SGL motherboard with an AMD Opteron 6386 SE CPU and Samsung…
pipe
  • 921
  • 10
  • 25
9
votes
3 answers

How to get error detection and correction on a single hard drive on linux (with btrfs or other methods)

One of the cool things about btrfs on linux is that it can correct bit rot if it has redundant data because of its per-block checksumming. I can get redundant data by setting up a raid1 with two disks. However, can I also get redundant data to…
lnmaurer
  • 243
  • 2
  • 10
8
votes
1 answer

"Northbridge Error (node 0): ECC Error in the Probe Filter directory"

I've received an e-mail from a user worried that the following errors on one of his servers is indicative of a serious problem. The trouble is, the errors below are all that I have to go on. I usually consider myself a decent Googler, but in this…
CptSupermrkt
  • 1,492
  • 5
  • 16
  • 26
5
votes
1 answer

Remove ECC warnings in system log

How can I disable these warnings about ECC? I don't have ECC memory and so disabled it in bios also but it still prints it. [ 4.697057] EDAC amd64: Node 0: DRAM ECC disabled. [ 4.697061] EDAC amd64: ECC disabled in the BIOS or no ECC…
JoKeR
  • 440
  • 1
  • 8
  • 20
4
votes
0 answers

How to check flash memory for ECC errors from the Linux command line

Is there a way to check flash memory for ECC errors from the Linux command line? Note that I do not want to correct ECC errors. I just want to detect errors and list page addresses where they occur.
Stephen305
  • 173
  • 5
4
votes
2 answers

Understanding "Hardware error from APEI Generic Hardware Error Source" error message

Summary: I'm trying to understand exactly what the following error message means: [17016.923750] {4}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1 [17016.923758] {4}[Hardware Error]: It has been corrected by h/w and…
Gabriel Southern
  • 803
  • 4
  • 9
  • 13
3
votes
0 answers

Mapping around ecc errors in Linux does not seem to work?

I get the following ecc error on a Linux box several times a day - May 24 18:21:04 staton-nas kernel: mce: [Hardware Error]: Machine check events logged May 24 18:21:04 staton-nas kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR May 24 18:21:04…
statop
  • 31
  • 1
2
votes
0 answers

Is there a filesystem that can maintain extra ECC data like raid5, but in the filesystem to make a fault-tolerant single external drive?

Normally to make a fault-tolerant or corruption-repairing filesystem, you use multiple drives and raid 5, or anything but raid 0. There are also many ways to make a fault-tolerant archive file like dar etc. What I am looking for is a way to make a…
Brian White
  • 121
  • 3
2
votes
0 answers

Identify ram module linked to ECC error in DMESG

one of my server is logging the following ECC errors: [lun set 14 00:14:16 2020] {33}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1 [lun set 14 00:14:16 2020] {33}[Hardware Error]: It has been corrected by h/w…
sKo
  • 21
  • 1
2
votes
0 answers

software-level error detection and correction for raw storage

If I understand data storage correctly, all storage devices are unreliable to some extent, which is why most have hardware-level abstraction layers. Hard drives use error correction. If a sector is read and ECC detects an error (whether it was from…
enigmaticPhysicist
  • 1,333
  • 1
  • 11
  • 17
1
vote
2 answers

ECC on a single block device

I have an SSD that I suspect failing silently now and then. I have run badblocks on it and it is clear that it is not bad sectors but might instead be some race condition in the electronics, in which case a retry would probably read the data…
Ole Tange
  • 33,591
  • 31
  • 102
  • 198
1
vote
1 answer

Hardware error from APEI Generic Hardware Error Source (ECC RAM)

[58306.633900] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1 [58306.633905] {1}[Hardware Error]: It has been corrected by h/w and requires no further action [58306.633907] {1}[Hardware Error]: event severity:…
Vlastimil Burián
  • 27,586
  • 56
  • 179
  • 309