What is grinding my HDDs and how do I stop it?

Question

Something is grinding my hard disks all the time (a few KB every second) and I can't figure out what.

My configuration: 4 spinning platters (/dev/sd[cdef]) assembled into a RAID5 array, with bcache set to cache (hopefully) everything (cache_mode = writeback, sequential_cutoff = 0). On top of the bcache volume I have set up LVM.

sda & sdb are SSDs. sdc, sdd, sde & sdf are spinning disks, the base for the mdadm -> bcache -> lvm -> dm-* stack.

This is the second report printed by `iostat -x -d 30`:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00     0,77    0,97    0,77    12,40     6,13    21,38     0,00    0,23    0,00    0,52   0,23   0,04
sdb               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
sdc               0,03     1,60    0,13    4,50     0,67    17,63     7,90     0,05   11,54   15,00   11,44  11,17   5,17
sdd               1,60     0,30    0,43    4,83     8,13    13,77     8,32     0,06   11,27    0,00   12,28  11,04   5,81
sde               1,63     0,00    0,57    4,07     8,80     9,50     7,90     0,05   10,99    0,47   12,46  10,73   4,97
sdf               0,00     1,90    0,00    5,27     0,00    21,90     8,32     0,04    8,53    0,00    8,53   8,35   4,40
md0               0,00     0,00    0,00    0,97     0,00    12,40    25,66     0,00    0,00    0,00    0,00   0,00   0,00
bcache0           0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
dm-0              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
dm-1              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
dm-2              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
dm-4              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
dm-5              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
dm-6              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
dm-7              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
dm-9              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00

What strikes me as odd in this iostat output is that bcache0 is not touched at all, so I assume there is no activity on the logical volumes.

iotop is silent on the subject as well: there's no app reported working on the disks, so it must be some system daemons / services.

The md0 volume sees some activity, but how can that be when nothing is writing to the logical volumes? Is bcache doing some maintenance work? But every second?

Lastly, there's some activity on sdc through sdf which doesn't really match the activity on md0. It's also not symmetric across all the disks, so I don't think it's even mdadm based.
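To put numbers on that mismatch, here is a minimal sketch that totals the per-disk write rates and compares them with md0. The values are copied by hand from the iostat table above (decimal commas converted to points); the variable names are only illustrative. One detail worth noting is that the read rate on sda equals the write rate on md0, which would be consistent with bcache writing dirty data back from an SSD to the array. That is an assumption: nothing here confirms which SSD is the cache device.

```python
# (w/s, wkB/s) for each RAID member, copied from the iostat report above.
MEMBERS = {
    "sdc": (4.50, 17.63),
    "sdd": (4.83, 13.77),
    "sde": (4.07, 9.50),
    "sdf": (5.27, 21.90),
}
MD0_WRITE = (0.97, 12.40)   # w/s, wkB/s on md0
SDA_READ = (0.97, 12.40)    # r/s, rkB/s on sda (one of the SSDs)

member_ops = sum(ops for ops, _ in MEMBERS.values())
member_kb = sum(kb for _, kb in MEMBERS.values())

print(f"members: {member_ops:.2f} w/s, {member_kb:.2f} kB/s")
print(f"md0:     {MD0_WRITE[0]:.2f} w/s, {MD0_WRITE[1]:.2f} kB/s")
print(f"member kB/s per md0 kB/s: {member_kb / MD0_WRITE[1]:.2f}")
print(f"sda reads match md0 writes: {SDA_READ == MD0_WRITE}")
```

The members see far more write requests than md0 itself, so most of the disk traffic is not payload data arriving through md0.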

Edit: as per meuh's suggestion, here's iosnoop output:

Tracing block I/O. Ctrl-C to end.
COMM         PID    TYPE DEV      BLOCK        BYTES     LATms
md0_raid5    281    FFS  8,80     18446744073709551615 0          0.04
md0_raid5    281    FFS  8,32     18446744073709551615 0          0.11
md0_raid5    281    FFS  8,64     18446744073709551615 0          0.10
md0_raid5    281    FFS  8,48     18446744073709551615 0          0.10
<idle>       0      WS   8,80     16           4096       0.08
kworker/3:1H 276    WS   8,32     16           4096       0.10
kworker/3:1H 276    WS   8,64     16           4096       0.10
kworker/3:1H 276    WS   8,48     16           4096       0.09
<idle>       0      FFS  8,80     18446744073709551615 0          8.45
<idle>       0      FFS  8,64     18446744073709551615 0         17.42
<idle>       0      FFS  8,32     18446744073709551615 0         19.36
<idle>       0      FFS  8,48     18446744073709551615 0         20.68
md0_raid5    281    FFS  8,32     18446744073709551615 0          0.11
md0_raid5    281    FFS  8,80     18446744073709551615 0          0.10
md0_raid5    281    FFS  8,64     18446744073709551615 0          0.13
md0_raid5    281    FFS  8,48     18446744073709551615 0          0.14
<idle>       0      WS   8,80     8            512        0.06
<idle>       0      WS   8,32     8            512        0.10
<idle>       0      WS   8,64     8            512        0.08
ksoftirqd/3  28     WS   8,48     8            512        0.08
cat          14719  FFS  8,80     18446744073709551615 0         12.42
cat          14719  FFS  8,64     18446744073709551615 0         17.27
cat          14719  FFS  8,32     18446744073709551615 0         19.21
cat          14719  FFS  8,48     18446744073709551615 0         20.52

All devices listed here are the spinning platters.
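For reading the trace, a small sketch that maps the `DEV` column back to disk names and checks the very large `BLOCK` value. It assumes the classic Linux numbering for SCSI disks (major 8, 16 minors per disk); the helper name `sd_name` is made up for illustration. The large block number is exactly 2^64 - 1, i.e. -1 stored as an unsigned 64-bit value, and it only appears on requests with 0 bytes, so it looks like a placeholder on flush requests rather than a real on-disk position.

```python
SENTINEL_BLOCK = 18446744073709551615  # value shown in the BLOCK column


def sd_name(major: int, minor: int) -> str:
    """Translate a major,minor pair into an sdX name (major 8 only)."""
    if major != 8 or not 0 <= minor <= 255:
        raise ValueError("only major 8 with minors 0..255 is handled here")
    disk, partition = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name if partition == 0 else f"{name}{partition}"


for minor in (32, 48, 64, 80):
    print(f"8,{minor} -> {sd_name(8, minor)}")

# True: the value is -1 as an unsigned 64-bit integer, not a real sector.
print(SENTINEL_BLOCK == 2**64 - 1)
```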

Edit2: as per frostschutz's suggestion, here's an extract from syslog after enabling block_dump:

[40723.578347] md0_raid5(281): WRITE block 8 on sdc (1 sectors)
[40723.578359] md0_raid5(281): WRITE block 8 on sde (1 sectors)
[40723.578363] md0_raid5(281): WRITE block 8 on sdd (1 sectors)
[40723.578367] md0_raid5(281): WRITE block 8 on sdf (1 sectors)
[40723.824546] md0_raid5(281): WRITE block 16 on sdc (8 sectors)
[40723.824560] md0_raid5(281): WRITE block 16 on sde (8 sectors)
[40723.824566] md0_raid5(281): WRITE block 16 on sdd (8 sectors)
[40723.824570] md0_raid5(281): WRITE block 16 on sdf (8 sectors)

So it seems mdadm is the culprit, constantly writing (presumably) to superblock offsets?
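To check the guess about superblock offsets, the sketch below converts the block numbers from the log into byte offsets. It assumes block_dump reports 512-byte sectors; the two sample lines are copied from the extract above. An offset of 4 KiB is where version 1.2 mdadm metadata places its superblock, but the metadata version of this array is not stated here, so treat that match as an assumption.

```python
import re

SECTOR_BYTES = 512  # assumption: block_dump counts in 512-byte sectors

LOG_LINES = [
    "[40723.578347] md0_raid5(281): WRITE block 8 on sdc (1 sectors)",
    "[40723.824546] md0_raid5(281): WRITE block 16 on sdc (8 sectors)",
]

PATTERN = re.compile(r"WRITE block (\d+) on (\w+) \((\d+) sectors\)")


def describe(line):
    """Return (device, byte offset, byte length) for one block_dump line."""
    match = PATTERN.search(line)
    if match is None:
        return None
    block, device, sectors = match.groups()
    return device, int(block) * SECTOR_BYTES, int(sectors) * SECTOR_BYTES


for line in LOG_LINES:
    device, offset, length = describe(line)
    print(f"{device}: {length} bytes at offset {offset} ({offset // 1024} KiB)")
```

Both writes land within the first 12 KiB of each member, which matches the 512-byte and 4096-byte writes at blocks 8 and 16 in the iosnoop trace.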

Investigating further confirms this: mdadm -E /dev/sdc reports a different checksum every second. The event count generally remains fixed, but if I re-examine the drive often enough, every now and then the state changes from "clean" to "active", and during such examinations the event count is one higher than otherwise.

So, is there a logical explanation for what's going on, or something I could do to get more insight into what's happening with my disks?
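The clean/active flipping can also be watched without re-running mdadm, through the md sysfs attributes `array_state` and `safe_mode_delay`. The kernel marks an array active before a write and clean again after `safe_mode_delay` seconds without writes, which may explain why even small writes trigger superblock updates. A minimal sketch follows; the array name `md0` is taken from the question, and the helper `read_md_attr` is hypothetical.

```python
from pathlib import Path


def read_md_attr(array, attr, sysfs_root="/sys/block"):
    """Return the stripped content of an md sysfs attribute, or None."""
    path = Path(sysfs_root) / array / "md" / attr
    try:
        return path.read_text().strip()
    except OSError:
        # Array missing, attribute missing, or not readable by this user.
        return None


for attr in ("array_state", "safe_mode_delay"):
    print(f"{attr} = {read_md_attr('md0', attr)}")
```

Running this in a loop while the disks click would show whether each click lines up with a state change.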

You might get some info out of [iosnoop](http://unix.stackexchange.com/a/299435/119298). — meuh, Dec 10 '16 at 20:42
...or (very verbose, disable after a few) `echo 1 > /proc/sys/vm/block_dump` — frostschutz, Dec 10 '16 at 20:44
`iotop` can help you to identify the process that's doing the I/O, in case it's not `mdadm` or some system stuff. — dirkt, Dec 11 '16 at 20:25

velis · Answer 1 · 2016-12-11T19:51:42.287

2

Thanks to meuh & frostschutz I was able to identify the offending process. It seems mdadm was doing some post-synchronisation stuff on the array (I replaced a drive a few days ago in the RAID-5 array).

Actually, it has stopped now, a few days after the drive was replaced. It's funny that it would do this at all, since the only I/O was the writes to the superblock area. I guess an authoritative answer could only be provided by peeking into the code, which I am not qualified to do at this time.

Edit: I just copied a few tens of GB of data into the array and the grinding started again. So it's not post-sync, it's post-any-write...

edited Dec 11 '16 at 19:51

answered Dec 11 '16 at 13:48


So this is still going on even after `sync`, so it's not just delayed/buffered writes? That's strange then, I definitely don't have that problem with my RAID5. Block 8 is mdadm metadata, 16 is the mdadm bitmap; either should be updated only when actual writes are happening on the md device. – frostschutz Dec 13 '16 at 08:38
Yes, it's still going on. Now that I have enabled a few services working on the LVM volumes (databases, SVN, file server, etc.), this is pretty much constant. Even if I make a tiny 4KB write, I get between 3 and 5 "clicks" of the disks. The worst of it is that it's regular. I have posted this problem to the mdadm mailing list, but so far there's been no answer. I don't think that list is meant for users, so they might have simply ignored me. – velis Dec 13 '16 at 20:40
Don't worry, there are a ton of users on that list. The only question is whether anyone has any clue regarding your issue. Writes (however small) causing metadata updates is normal. If you want to reduce the noise you could disable the bitmap (it's up to you whether that overhead is worth it or not). – frostschutz Dec 13 '16 at 20:54
I need to read up on the bitmap in order to try disabling it. AFAIK mdadm uses it to keep track of synced blocks, together with the event count on each member. So what would removing the bitmap actually get me? Edit: Ah, the wiki is quite clear on the bitmap's purpose. Perhaps I really might try disabling it for a test. – velis Dec 13 '16 at 20:58
Also, I'm not sure the grinding is caused by writing to sectors 8 and 16. Those are too close together, probably even on the same track. I worry about that long one in the `iosnoop` results: I think THAT one is causing the audible drive noises, because maybe the heads move to accommodate it (even though that offset is WAY too high). – velis Dec 13 '16 at 21:37
If it's ext4 filesystem, recently created, you might also be suffering from ext4 lazy init – frostschutz Dec 13 '16 at 21:53
Nope, that one finished initializing + it showed on iotop, I think – velis Dec 14 '16 at 10:40
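For the bitmap test discussed in the comments above, here is a minimal dry-run sketch that only prints the two mdadm invocations instead of running them. `--grow --bitmap=none` removes the write-intent bitmap and `--grow --bitmap=internal` adds it back; the array path `/dev/md0` is taken from the question and the helper name is hypothetical. Without a bitmap, a resync after an unclean shutdown has to cover the whole array, so this is probably best treated as a temporary experiment.

```python
import shlex


def bitmap_commands(array="/dev/md0"):
    """Build (but do not run) the mdadm calls for toggling the bitmap."""
    return {
        "remove": ["mdadm", "--grow", "--bitmap=none", array],
        "restore": ["mdadm", "--grow", "--bitmap=internal", array],
    }


# Print the commands so they can be reviewed and run by hand as root.
for action, argv in bitmap_commands().items():
    print(f"{action}: {shlex.join(argv)}")
```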
