I'm experimenting with Systemd and in particular Journald on a Yocto Poky (Linux) based embedded device with eMMC storage but I'm worried about flash wear since the write stats are excessively high as shown by both /sys/block/mmcblk0/stat and blktrace.
Note: I wish to keep the logs across reboots and I'm using the structured fields journald/journalctl provide, so a volatile journal with syslog forwarding doesn't seem feasible unless there's a way to keep the additional fields. That said, I can afford to lose some low-prio messages in a crash provided high-prio messages make it.
As an example, a test case that was logging a few short DEBUG message every 8 seconds
wrote to MMC at a rate of 1.4 GB/day despite the messages themselves being only 5 MB/day (but note that using journalctl -o export | wc as a metric gives 125 MB/day - presumably due to metadata).
There is nearly 300x write multiplication in that example, which seems very high (I would have expected 10x). I would like to know what can be done to reduce the multiplication factor?
blkparse output:
jbd2/mmcblk0p4- (180)
Reads Queued: 0, 0KiB Writes Queued: 33611, 134444KiB
Read Dispatches: 0, 0KiB Write Dispatches: 0, 0KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 0, 0KiB
Read Merges: 0, 0KiB Write Merges: 17722, 70888KiB
IO unplugs: 7835 Timer unplugs: 91
Allocation wait: 0 Allocation wait: 0
Dispatch wait: 0 Dispatch wait: 0
Completion wait: 0 Completion wait: 0
journal-offline (248, ...)
Reads Queued: 0, 0KiB Writes Queued: 6277, 46528KiB
Read Dispatches: 0, 0KiB Write Dispatches: 0, 0KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 0, 0KiB
Read Merges: 0, 0KiB Write Merges: 154, 3840KiB
IO unplugs: 563 Timer unplugs: 30
Allocation wait: 0 Allocation wait: 0
Dispatch wait: 0 Dispatch wait: 0
Completion wait: 0 Completion wait: 0
kworker/u2:0 (253, ...)
Reads Queued: 0, 0KiB Writes Queued: 32284, 288288KiB
Read Dispatches: 0, 0KiB Write Dispatches: 0, 0KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 0, 0KiB
Read Merges: 0, 0KiB Write Merges: 1351, 32744KiB
IO unplugs: 2553 Timer unplugs: 710
Allocation wait: 0 Allocation wait: 0
Dispatch wait: 0 Dispatch wait: 0
Completion wait: 0 Completion wait: 0
kworker/u2:1 (414, ...)
Reads Queued: 0, 0KiB Writes Queued: 28428, 260332KiB
Read Dispatches: 0, 0KiB Write Dispatches: 0, 0KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 0, 0KiB
Read Merges: 0, 0KiB Write Merges: 1256, 29428KiB
IO unplugs: 2290 Timer unplugs: 679
Allocation wait: 0 Allocation wait: 0
Dispatch wait: 0 Dispatch wait: 0
Completion wait: 0 Completion wait: 0
kworker/u2:2 (208, ...)
Reads Queued: 1, 4KiB Writes Queued: 22264, 200632KiB
Read Dispatches: 0, 0KiB Write Dispatches: 0, 0KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 0, 0KiB
Read Merges: 0, 0KiB Write Merges: 918, 21048KiB
IO unplugs: 1782 Timer unplugs: 481
Allocation wait: 0 Allocation wait: 0
Dispatch wait: 0 Dispatch wait: 0
Completion wait: 0 Completion wait: 0
systemd-journal (206)
Reads Queued: 13, 52KiB Writes Queued: 190, 508KiB
Read Dispatches: 0, 0KiB Write Dispatches: 0, 0KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 0, 0KiB
Read Merges: 4, 16KiB Write Merges: 0, 0KiB
IO unplugs: 128 Timer unplugs: 4
Allocation wait: 0 Allocation wait: 0
Dispatch wait: 0 Dispatch wait: 0
Completion wait: 0 Completion wait: 0
CPU0 (mmcblk0p4):
Reads Queued: 632, 61712KiB Writes Queued: 123054, 930732KiB
Read Dispatches: 8616, 61712KiB Write Dispatches: 101590, 930732KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 8616, 61712KiB Writes Completed: 109579, 930732KiB
Read Merges: 4, 16KiB Write Merges: 21401, 157948KiB
Read depth: 2 Write depth: 9
IO unplugs: 15952 Timer unplugs: 2012
Things I've tried:
- adjusting SyncIntervalSec in journald.conf - it didn't seem to affect the totals much, just the timing
- F2FS instead of ext4 - actually made it a bit worse but I haven't tried tuning mount/mkfs params yet (any tuning ideas welcome)
- sysctl vm.dirty_writeback_centisecs=0 - this is because I noticed from blktrace that much of the amplification seemed to be from kworkers in between journal-offline runs, presumably from journald mmap() usage? that gave around 400 MB/day so it's a significant improvement, but I'm not sure that CRIT messages are still syncing immediately like the journald doc says
- https://github.com/systemd/systemd/pull/21183 seems like it might help, but it's got a way to go before being stable
- fs-level compression - results not clear yet, not sure if this methodology will evaluate it properly?