
My software runs a command that looks something like:

find | xargs do a potentially memory hungry job

The problem is that sometimes the potentially memory-hungry job gets too hungry, the system becomes unresponsive, and I have to reboot it. My understanding is that this happens because of memory overcommitment. What I would like is this: if a job spawned by xargs wants more memory than is available, it dies (I am OK with that) and that is it. I guess I could get this behaviour by turning off overcommitment system-wide, but that is not an option. Is it possible to turn it off for a single process?

A possible solution I was thinking of was to set

ulimit -v <RAM size in KiB>

But something tells me it is not a good idea.
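For illustration, the idea above can be sketched by setting the limit in a subshell so that only the spawned job is affected (assumptions: `wc -c` stands in for the real memory-hungry command, and the 4 GiB cap is arbitrary; note that `ulimit -v` limits virtual address space, not resident RAM, which is part of why it is fragile):

```shell
# Each job runs under its own ~4 GiB virtual-memory cap (ulimit -v
# takes KiB); a job that tries to exceed the cap has its allocations
# refused and typically dies, leaving the rest of the pipeline alone.
# 'wc -c' is a placeholder for the real memory-hungry command.
find . -maxdepth 1 -type f -print0 |
  xargs -0 -n1 sh -c 'ulimit -v 4194304; exec wc -c "$1"' sh
```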

  • [This answer](https://unix.stackexchange.com/questions/173762/gnu-parallel-limit-memory-usage) suggests using `ulimit -v` and `ulimit -m` and explains why it is hard to find a more robust solution. – sjy Feb 15 '19 at 22:29
  • @sjy, thank you for pointing out, that is exactly my use case. Sad that there is no better solution :( – Ilia Minkin Feb 16 '19 at 04:45
  • What operating system is being used? – JdeBP Feb 16 '19 at 08:31
  • Rather than deleting my previous comment, I should point out that the link is outdated and there is now a solution, as explained in Ole Tange's answer. – sjy Feb 17 '19 at 01:03
  • @JdeBP the target systems are some Linux distros, the software will be publicly available for download by users – Ilia Minkin Feb 17 '19 at 18:12

2 Answers

2

I think what you are looking for is --memfree in GNU Parallel:

find ... | parallel --memfree 1G dostuff

This will only start dostuff if there is at least 1G of RAM free, and it will keep starting more jobs until either less than 1G of RAM is free or one job is running per CPU thread. If free RAM drops below 0.5G (50% of the 1G limit), the youngest job will be killed. So in metacode:

limit = 1G
while true:
  if freemem > limit:
    if count(running_jobs) < cpu.threads():
      another_job.start()
  if freemem < 0.5 * limit:
    youngest_job.kill()

Combined with --retries 10, GNU Parallel will retry each killed job up to 10 times.

If dostuff takes a while to gobble up the memory, use --delay 30s to wait 30s before spawning the next job.
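Putting the flags together, a throwaway demo might look like this (a sketch: `wc -c` again stands in for dostuff, `--delay 0` replaces the 30s wait so the demo finishes immediately, and `--will-cite` merely silences GNU Parallel's citation notice):

```shell
# One job per file from a scratch directory: jobs start only while at
# least 1G RAM is free, the youngest job is killed if free RAM falls
# below 0.5G, and a killed job is retried up to 10 times.
dir=$(mktemp -d)
printf 'hello' > "$dir/sample"
find "$dir" -type f -print0 |
  parallel -0 --will-cite --memfree 1G --retries 10 --delay 0 wc -c {}
```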

Ole Tange
  • This is great! Thank you for all your work on GNU Parallel. – sjy Feb 17 '19 at 01:00
  • One question. The doc says: "Additionally, GNU parallel will kill off the youngest job if the memory free falls below 50% of the size." What is the size here: the argument for --memfree, or the memory size available? Also, I have a 500GB machine. If I set --memfree to 100GB, then my job basically runs only with only one thread even if the memory available is always way way higher than 100GB. – Ilia Minkin Aug 20 '19 at 08:59
0
sysctl -w vm.overcommit_memory=2

If you want to avoid cgroups, this disables overcommit system-wide.
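For reference, strict accounting is usually made persistent through sysctl configuration; a sketch (the file name and ratio value are illustrative — in mode 2 the kernel refuses allocations once the commit limit of swap + overcommit_ratio% of RAM is reached):

```
# /etc/sysctl.d/90-overcommit.conf  (illustrative)
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
```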

user1133275