15

So, I thought this would be a pretty simple thing to locate: a service / kernel module that, when the kernel notices userland memory is running low, triggers some action (e.g. dumping a process list to a file, pinging some network endpoint, whatever) within a process that has its own dedicated memory (so it won't fail to fork() or suffer from any of the other usual OOM issues).

I found the OOM killer, which I understand is useful, but which doesn't really do what I'd need to do.

Ideally, if I'm running out of memory, I want to know why. I suppose I could write my own program that runs on startup and uses a fixed amount of memory, then only does stuff once it gets informed of low memory by the kernel, but that brings up its own question...

Is there even a syscall to be informed of something like that? A way of saying to the kernel "hey, wake me up when we've only got 128 MB of memory left"?

I searched around the web and on here but I didn't find anything fitting that description. Seems like most people use polling on a time delay, but the obvious problem with that is it makes it way less likely you'll be able to know which process(es) caused the problem.

Parthian Shot
  • 762
  • 4
  • 16
  • Question: have you evaluated remote monitoring products such as Nagios, LogicMonitor, etc? This is the kind of thing products like those are typically good for in most environments. – Spooler Apr 13 '17 at 20:23
  • Yep. We have nagios. We poll memory at 10 minute intervals. The problem is sometimes a process can balloon very rapidly- even in as little as a minute or two- and it's hard to do a post-mortem when that happens because there are a lot of moving parts on those systems. – Parthian Shot Apr 13 '17 at 20:27
  • Sure, and you don't want to monitor your machines to death. The only tool I've used for that kind of resolution is `sar`, and that may help you do perform a post-mortem. Continually reporting on that locally generated data would be pretty expensive on the network, though. Are you planning to keep the data local (or via something like NFS) much like kdump? – Spooler Apr 13 '17 at 20:31
  • Keeping it local for now. We have centralized logging & monitoring, but information at a fine granularity is only something we need when some specific problem comes up with an individual system, so we'd just be logging in and poking around that system, anyway. – Parthian Shot Apr 13 '17 at 20:39
  • Also `sar` is pretty cool. Thanks for the recommendation. Tend to use `htop` or regular `top`, though they're resource-hungry on a good day, so... maybe not the best tool for the job. :P – Parthian Shot Apr 13 '17 at 20:43
  • Triggering on `sar` output isn't something I've done or could really tell you much about. 'tis why this is merely a comment. – Spooler Apr 13 '17 at 21:01
  • Related question: https://unix.stackexchange.com/questions/172559/receive-signal-before-process-is-being-killed-by-oom-killer-cgroups – Caesar Aug 12 '23 at 10:48

3 Answers

20

Yes, the Linux kernel does provide a mechanism for this: memory pressure notification. This is documented in https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt, section Memory Pressure.

In short, you open `memory.pressure_level` in the cgroup you care about, create an eventfd, and register the pair by writing "<event_fd> <fd of memory.pressure_level> <level>" to that cgroup's `cgroup.event_control`. The kernel then signals the eventfd when the given pressure level is reached; the levels are low, medium, or critical. A typical use case would be to free some or all internal caches in your process when you receive a notification, in order to prevent an impending OOM kill.
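A minimal sketch of that registration in Python (assuming the v1 memory controller is mounted at the usual `/sys/fs/cgroup/memory`; `os.eventfd` needs Python 3.10+ on Linux):

```python
import os

# Assumes a cgroup-v1 memory controller mounted at the conventional path;
# point CGROUP_DIR at a specific child cgroup to watch just that group.
CGROUP_DIR = "/sys/fs/cgroup/memory"

def event_control_line(event_fd, pressure_fd, level="low"):
    # Registration string format from Documentation/cgroup-v1/memory.txt:
    # "<event_fd> <fd of memory.pressure_level> <level>"
    return f"{event_fd} {pressure_fd} {level}"

def register_pressure_listener(cgroup_dir=CGROUP_DIR, level="low"):
    """Return an eventfd that the kernel signals on memory pressure."""
    pressure_fd = os.open(os.path.join(cgroup_dir, "memory.pressure_level"),
                          os.O_RDONLY)
    efd = os.eventfd(0)  # Linux-only, Python 3.10+
    with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as ctl:
        ctl.write(event_control_line(efd, pressure_fd, level))
    return efd

if __name__ == "__main__":
    try:
        efd = register_pressure_listener()
        print("registered; os.eventfd_read(efd) would now block until pressure")
    except OSError:
        # Expected on cgroup-v2-only systems, where this file does not exist.
        print("cgroup-v1 memory.pressure_level not available on this system")
```

The listening process should allocate whatever it needs up front, then sit in `os.eventfd_read()`, which matches the question's "dedicated memory, woken by the kernel" design.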

niksnut
  • 301
  • 2
  • 2
9

What you are asking for is, basically, a kernel-based callback on a low-memory condition, right? If so, I strongly believe that the kernel does not provide such a mechanism, and for a good reason: when memory is low, the kernel should immediately run the only thing that can reliably free some memory - the OOM killer. Any other program could bring the machine to a halt.

Anyway, you can run a simple monitoring solution in userspace. I had the same low-memory debug/action requirement in the past, and I wrote a simple bash script which did the following:

  • monitor a soft watermark: if memory usage rises above this threshold, collect some statistics (process list, free/used memory, etc.) and send a warning email;

  • monitor a hard watermark: if memory usage rises above this threshold, collect some statistics, kill the most memory-hungry (or least important) processes, then send an alert email.

Such a script would be very lightweight, and it can poll the machine at a short interval (e.g. 15 seconds).
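A one-shot version of such a check might look like this in Python (the watermark values, function names, and the `MemAvailable`-based definition of "usage" are all illustrative, not from the original script):

```python
def classify(avail_kb, soft_kb, hard_kb):
    """Map available memory to an alert level; the level names are illustrative."""
    if avail_kb <= hard_kb:
        return "hard"
    if avail_kb <= soft_kb:
        return "soft"
    return "ok"

def mem_available_kb(path="/proc/meminfo"):
    # MemAvailable is the kernel's estimate of memory usable without swapping.
    with open(path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])  # value is in kB
    raise RuntimeError("MemAvailable not found")

if __name__ == "__main__":
    try:
        level = classify(mem_available_kb(),
                         soft_kb=512 * 1024, hard_kb=128 * 1024)
        print("current level:", level)
    except (OSError, RuntimeError):
        print("no /proc/meminfo here (non-Linux system?)")
    # A real monitor would wrap the check in a loop:
    #   while True: check_and_alert(); time.sleep(15)
```

On a "soft" or "hard" result you would then dump `ps` output, notify, or start killing, as described in the bullets above.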

shodanshok
  • 439
  • 3
  • 4
  • `cron` runs every minute, how do you run bash script every 15 seconds? – Alex Martian Dec 31 '21 at 07:21
  • 1
    For <1m polling you can `sleep` inside your script's main loop - i.e.: `while true; do ...; sleep 15; done`. Other (more complex) option: use `monit` with a 15s polling rate. Otherwise, stick with simple 1m polling via cron. – shodanshok Dec 31 '21 at 08:42
0

The current best answer is for cgroups-v1. For cgroups-v2, one can listen for file modified events on the memory.events file (documentation of the file content).

The behaviour of this file can actually be tested with a few shell commands:

# Spawn a new slice with memory limits to avoid OOMing the entire system
systemd-run --pty --user -p MemoryMax=1050M -p MemoryHigh=1000M bash

# Watch memory.events for changes and read when changed
inotifywait -e modify -m /sys/fs/cgroup$(cut -d: -f3 /proc/self/cgroup)/memory.events \
  | while read path events; do echo "$path $events"; cat "$path"; done &
# Consume memory
tail /dev/zero

Sadly, this seems to work only if a memory limit is actually set for the cgroup.
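The same lookup and the file's flat-keyed format can be sketched in Python, too (the path discovery mirrors the `cut -d: -f3 /proc/self/cgroup` trick above; the counter names `low`, `high`, `max`, `oom`, `oom_kill` come from the cgroup-v2 documentation):

```python
import os

def own_cgroup_v2_dir():
    # The cgroup-v2 entry in /proc/self/cgroup has hierarchy ID 0 ("0::/path").
    with open("/proc/self/cgroup") as f:
        for line in f:
            hier, _, path = line.strip().split(":", 2)
            if hier == "0":
                return "/sys/fs/cgroup" + path
    return None

def parse_memory_events(text):
    """Turn flat-keyed 'low 0 / high 4 / oom 1 ...' lines into a dict."""
    return {key: int(val)
            for key, val in (l.split() for l in text.splitlines() if l.strip())}

if __name__ == "__main__":
    cg = own_cgroup_v2_dir()
    events = cg and os.path.join(cg, "memory.events")
    if events and os.path.exists(events):
        with open(events) as f:
            print(parse_memory_events(f.read()))
    else:
        # e.g. a cgroup-v1 system, or the root cgroup, which has no memory.events
        print("no cgroup-v2 memory.events for this process")
```

Comparing successive parses tells you which counter changed (a rising `oom_kill`, for instance, means the kernel killed something in this cgroup).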

Caesar
  • 121
  • 6