I have an ext4 partition with the following underlying stack:
- sda1 and sdb1 are together in a RAID1, resulting in md0
- md0 is LUKS-encrypted, resulting in md0_crypt
- on top of md0_crypt is a single LVM volume mv0_vg_media, mounted under /home/media
When I run cp /home/media/hierarchy/photo.jpg /tmp, I get an I/O error (but only for about 20 files out of several tens of thousands).
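To get the full list of affected files, and to see whether it is the same list on every run, one option is to read every file under the mount point and print the ones that fail. This is a minimal sketch, assuming bash and GNU find. list_unreadable is a hypothetical helper name. Note that it also lists files it cannot open at all (e.g. permission denied), not only files that hit a read error.

```shell
# Hypothetical helper (assumes bash + GNU find): print every regular file
# under a directory that cannot be read to the end.
# Usage: list_unreadable /home/media
list_unreadable() {
    local root="$1"
    # -print0 / read -d '' keeps file names with spaces or newlines intact.
    find "$root" -type f -print0 |
        while IFS= read -r -d '' f; do
            # cat exits non-zero if the file cannot be opened or a read fails;
            # the data itself is discarded.
            if ! cat -- "$f" > /dev/null 2>&1; then
                printf '%s\n' "$f"
            fi
        done
}
```

For example, running list_unreadable /home/media > unreadable.txt twice and diffing the two lists shows whether the same files fail every time.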
However, when I attempt to debug the problem:
- Both dmesg and the syslog stay clean when the I/O error occurs. (EDIT: clarification: this means that nothing disk-related is logged in dmesg or syslog, even when I follow the logs with the --follow option while the read errors occur.)
- Running badblocks on sda and sdb does not reveal any errors.
- fsck on /dev/mapper/md0_vg_media only outputs "could be narrower. IGNORED." warnings but no errors, and the autocorrect option does not fix anything.
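Since nothing shows up in the kernel log, another data point is where inside an affected file the read actually fails. This is a minimal sketch, assuming bash and GNU coreutils (stat -c, dd status=none). failing_offsets is a hypothetical helper, and the 4 KiB block size is an assumption (a common ext4 block size).

```shell
# Hypothetical helper (assumes bash + GNU coreutils): read FILE one 4 KiB
# block at a time and print the byte offset of every block that fails.
failing_offsets() {
    local file="$1" bs=4096
    local size blocks i
    size=$(stat -c %s -- "$file") || return 1
    blocks=$(( (size + bs - 1) / bs ))   # round up to whole blocks
    for (( i = 0; i < blocks; i++ )); do
        # skip=i positions dd at byte i*bs; count=1 reads exactly one block.
        # GNU dd exits non-zero on a read error (no conv=noerror here).
        if ! dd if="$file" of=/dev/null bs="$bs" skip="$i" count=1 status=none 2>/dev/null; then
            printf '%d\n' $(( i * bs ))
        fi
    done
}
```

For example, failing_offsets /home/media/hierarchy/photo.jpg. It starts one dd per block, so it is slow on large files, but it reports every failing block instead of stopping at the first error.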
I'm puzzled. I could just delete those files and re-sync them, but that's a bad idea if I don't know what the problem actually is.
How can I further debug this?
EDIT:
Based on the comments and further research, I tried the following approaches, without success. Here are the results:
- mdadm --examine /dev/sda (same for sdb) returns:
      /dev/sda:
         MBR Magic : aa55
      Partition[0] : 3907029167 sectors at 1 (type ee)
- mdadm --examine-badblocks /dev/sda (same for sdb) returns:
      mdadm: mbr metadata does not support badblocks
- cat /sys/block/md0/md/mismatch_cnt shows the value 0 (zero).
- echo 'check' > /sys/block/md0/md/sync_action performs the check but does not reveal any errors. Dmesg has two entries:
      [734796.807172] md: data-check of RAID array md0
  and then immediately below:
      [754370.977181] md: md0: data-check done.
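The check in the last step can also be waited on from a script, so that mismatch_cnt is only read after md reports that the scrub has finished. This is a minimal sketch, assuming bash and the sysfs directory /sys/block/md0/md used in the post. wait_for_md_check is a hypothetical name, and the directory is a parameter so it can be pointed elsewhere. The sketch only reads sysfs. Starting the check still requires the echo 'check' command, run as root.

```shell
# Hypothetical helper (assumes bash): wait for an md "check" to finish,
# then print the mismatch count. Defaults to /sys/block/md0/md.
wait_for_md_check() {
    local md_dir="${1:-/sys/block/md0/md}"
    # sync_action reads "check" while the scrub runs and "idle" once done.
    while [ "$(cat "$md_dir/sync_action")" != "idle" ]; do
        sleep 60
    done
    printf 'mismatch_cnt: %s\n' "$(cat "$md_dir/mismatch_cnt")"
}
```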