slow/frozen ext4 // task sync blocked on big mostly write only server

Question

We have several 90TB servers (Areca RAID-6, partitioned into 10 ext4 partitions).

The application is basically a ring buffer: it continuously writes new data and deletes old data. As such, each partition is always effectively 100% full (only 15GB per partition is held as headroom).

Now we're seeing the writing application segfault because (I suppose) it cannot write to disk fast enough.

The app segfaults happen about the same time as this error (of which there are several):

Nov 26 11:33:10 localhost kernel: INFO: task sync:30312 blocked for more than 120 seconds.
Nov 26 11:33:10 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 26 11:33:10 localhost kernel: sync            D f63a4ec0     0 30312   6161 0x00000080
Nov 26 11:33:10 localhost kernel: f571fe9c 00000086 d9b29930 f63a4ec0 d9b29930 c18d5ec0 c18d5ec0 cca87419
Nov 26 11:33:10 localhost kernel: 0005c511 c18d5ec0 c18d5ec0 cca7d888 0005c511 c18d5ec0 f63b2ec0 e3abe130
Nov 26 11:33:10 localhost kernel: c107d121 00000001 00000046 00000000 d9b29d52 d9b29930 f3544d00 f3c62800
Nov 26 11:33:10 localhost kernel: Call Trace:
Nov 26 11:33:10 localhost kernel: [<c107d121>] ? try_to_wake_up+0x1d1/0x230
Nov 26 11:33:10 localhost kernel: [<c107d1df>] ? wake_up_process+0x1f/0x40
Nov 26 11:33:10 localhost kernel: [<c1063efe>] ? wake_up_worker+0x1e/0x30
Nov 26 11:33:10 localhost kernel: [<c1065a58>] ? insert_work+0x58/0x90
Nov 26 11:33:10 localhost kernel: [<c154ab53>] schedule+0x23/0x60
Nov 26 11:33:10 localhost kernel: [<c15490e5>] schedule_timeout+0x155/0x1d0
Nov 26 11:33:10 localhost kernel: [<c100dffe>] ? __switch_to+0xee/0x370
Nov 26 11:33:10 localhost kernel: [<c1066371>] ? __queue_delayed_work+0x91/0x150
Nov 26 11:33:10 localhost kernel: [<c154b311>] wait_for_completion+0x71/0xc0
Nov 26 11:33:10 localhost kernel: [<c107d180>] ? try_to_wake_up+0x230/0x230
Nov 26 11:33:10 localhost kernel: [<c118865c>] sync_inodes_sb+0x7c/0xb0
Nov 26 11:33:10 localhost kernel: [<c118dbc5>] sync_inodes_one_sb+0x15/0x20
Nov 26 11:33:10 localhost kernel: [<c1168988>] iterate_supers+0xa8/0xb0
Nov 26 11:33:10 localhost kernel: [<c118dbb0>] ? fdatawrite_one_bdev+0x20/0x20
Nov 26 11:33:10 localhost kernel: [<c118dc01>] sys_sync+0x31/0x80
Nov 26 11:33:10 localhost kernel: [<c15534cd>] sysenter_do_call+0x12/0x12

fstab mounts the partitions as ext4 noauto,rw,users,exec 0 0

The system is 32-bit CentOS 6.6 with a 3.10.80-1 kernel.

Question: Is this some kind of disk corruption problem or is there something I need to tune in Linux or the filesystem to fix this? The application needs to run 24x7, forever...

How fast does the application write? How does it know to stop when the disk is full? – Rabin, Nov 28 '16 at 18:31
The application is writing at about 100-150 Mb/s, so that's up to about 20 MB/s. It writes the data as lots of 1GB files, and a separate "deleter" process deletes the oldest files every 30s. It keeps deleting until it reaches the required 15GB of free-space headroom on the disk. – Danny, Nov 29 '16 at 01:06
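
For readers who want a concrete picture of the deleter described in the comment above, here is a minimal sketch in Python. It is not the poster's actual code: the function names, the use of file modification time to decide which file is "oldest", and the injectable `get_free` callback are all assumptions made for illustration.

```python
import os
import shutil


def free_bytes(path):
    """Free space, in bytes, on the filesystem that contains path."""
    return shutil.disk_usage(path).free


def delete_oldest_until(directory, target_free, get_free=free_bytes):
    """Delete the oldest regular files in directory (by mtime, an assumption)
    until at least target_free bytes are free or no files remain.
    Returns the list of deleted paths, oldest first."""
    entries = [e for e in os.scandir(directory)
               if e.is_file(follow_symlinks=False)]
    entries.sort(key=lambda e: e.stat().st_mtime)
    deleted = []
    for entry in entries:
        # Re-check free space before every deletion so we stop as soon
        # as the headroom target is met.
        if get_free(directory) >= target_free:
            break
        os.remove(entry.path)
        deleted.append(entry.path)
    return deleted
```

A real deleter would call something like `delete_oldest_until("/data/part1", 15 * 10**9)` every 30 seconds (the path is hypothetical). Raising `target_free` is the only change needed to keep more headroom.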

DepressedDaniel · Answer 1 · 2016-11-29T01:24:05.087


Now we're seeing the writing application segfault because (I suppose) it cannot write to disk fast enough.

Well, it shouldn't just up and segfault like that! I don't think any filesystem guarantees write throughput in near-100%-full conditions. Your application is badly designed from the start :(. But you may be able to fix it ... the next step would be to check if it is compiled with debugging symbols and try to get a stack trace to see where the segfault happened and work backwards from there.

From the newly provided information, it sounds like you are running into performance issues from the filesystem trying to stay reasonably defragmented while nearly 100% full. I'd try making the deleter process more aggressive so that it keeps the partition at 25% free space or so.

edited Nov 29 '16 at 01:24

answered Nov 28 '16 at 23:41


Lots of debugging has been done on the app. It segfaults because it runs out of memory: it can't write to disk, so the internal queue fills up. My real question is about the filesystem slowing down... – Danny Nov 29 '16 at 01:08
I think you're right; it's likely a race condition where the deleter cannot free the space before the writer needs it – Rabin Nov 29 '16 at 05:44
I don't think that's it. The app writes one 1GB file every two minutes. The deleter maintains 15GB free space at all times, so if the deleter stopped or slowed down it would take the app 30 minutes to hit "full disk". df shows enough free space. Sync hanging means **nothing** is getting to the disk, right? – Danny Nov 29 '16 at 06:49
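
The 30-minute figure in the comment above follows directly from the numbers given there (15GB headroom, one 1GB file every two minutes). A small sketch of that arithmetic, with a hypothetical function name and assuming the deleter has stopped completely:

```python
def minutes_until_full(headroom_gb, file_size_gb, minutes_per_file):
    """Minutes until the headroom is used up if nothing is deleted."""
    files_that_fit = headroom_gb / file_size_gb
    return files_that_fit * minutes_per_file


print(minutes_until_full(15, 1, 2))  # 30.0
```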
Sync hanging just means the filesystem is working hard to save the latest operations consistently. It is most certainly trying to shuffle data around on the disk in that time. I don't know the details, but I'm pretty sure if your filesystem is heavily fragmented, ext4 will try to do something about that. And that can't be good if the filesystem is nearly 100% full. The only reasonable way to utilize a filesystem at near 100% of capacity is to statically initialize it with some files and then overwrite those same files in place (to avoid fragmenting). Probably works best with ext2/3. – DepressedDaniel Nov 29 '16 at 20:25

ext4 doesn't defragment itself; what it *can* do is pre-allocate extents. – Stephen Kitt Dec 05 '16 at 13:47
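
As an illustration of the pre-allocation mentioned in the comment above, an application can ask the filesystem to reserve a file's full size before writing any data. The sketch below uses Python's `os.posix_fallocate`, which is available on Linux. The helper name and the file mode are assumptions, and whether pre-allocation would help the workload in the question is untested.

```python
import os


def preallocate(path, size_bytes):
    """Create path (if needed) and reserve size_bytes of disk space for it,
    starting at offset 0, before any data is written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)
```

For the 1GB files described in the question, a writer would call something like `preallocate(new_path, 1024**3)` before streaming data into the file (`new_path` is hypothetical).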
