
My software runs a command that looks something like:

find | xargs do a potentially memory hungry job

The problem is that sometimes the potentially memory-hungry job gets too hungry, the system becomes unresponsive, and I have to reboot it. My understanding is that this happens because of memory overcommitment. What I would like is this: if a job spawned by xargs wants more memory than is available, it dies (I am OK with that) and that is it. I guess I could get this behaviour by turning off overcommitment system-wide, but that is not an option. Is it possible to turn it off for a single process?

A possible solution I was thinking of was to set

ulimit -v <RAM size in KiB>

But something tells me it is not a good idea.
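For illustration, the idea above can be sketched by setting the limit in a subshell so that only the spawned job is affected (assumptions: `wc -c` stands in for the real memory-hungry command, and the 4 GiB cap is arbitrary; note that `ulimit -v` limits virtual address space, not resident RAM, which is part of why it is fragile):

```shell
# Each job runs under its own ~4 GiB virtual-memory cap (ulimit -v
# takes KiB); a job that tries to exceed the cap has its allocations
# refused and typically dies, leaving the rest of the pipeline alone.
# 'wc -c' is a placeholder for the real memory-hungry command.
find . -maxdepth 1 -type f -print0 |
  xargs -0 -n1 sh -c 'ulimit -v 4194304; exec wc -c "$1"' sh
```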

  • [This answer](https://unix.stackexchange.com/questions/173762/gnu-parallel-limit-memory-usage) suggests using `ulimit -v` and `ulimit -m` and explains why it is hard to find a more robust solution. – sjy Feb 15 '19 at 22:29
  • @sjy, thank you for pointing out, that is exactly my use case. Sad that there is no better solution :( – Ilia Minkin Feb 16 '19 at 04:45
  • What operating system is being used? – JdeBP Feb 16 '19 at 08:31
  • Rather than deleting my previous comment, I should point out that the link is outdated and there is now a solution, as explained in Ole Tange's answer. – sjy Feb 17 '19 at 01:03
  • @JdeBP the target systems are some Linux distros, the software will be publicly available for download by users – Ilia Minkin Feb 17 '19 at 18:12

2 Answers

2

I think what you are looking for is --memfree in GNU Parallel:

find ... | parallel --memfree 1G dostuff

This will only start dostuff if there is at least 1G of RAM free, and it will keep starting more jobs until either less than 1G of RAM is free or one job is running per CPU thread. If free RAM drops below 0.5G (50% of the 1G limit), the youngest job will be killed. So in metacode:

limit = 1G
while true:
  if freemem > limit:
    if count(running_jobs) < cpu.threads():
      another_job.start()
  if freemem < 0.5 * limit:
    youngest_job.kill()

Combined with --retries 10, GNU Parallel will retry each killed job up to 10 times.

If dostuff takes a while to gobble up the memory, use --delay 30s to wait 30s before spawning the next job.
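Putting the flags together, a throwaway demo might look like this (a sketch: `wc -c` again stands in for dostuff, `--delay 0` replaces the 30s wait so the demo finishes immediately, and `--will-cite` merely silences GNU Parallel's citation notice):

```shell
# One job per file from a scratch directory: jobs start only while at
# least 1G RAM is free, the youngest job is killed if free RAM falls
# below 0.5G, and a killed job is retried up to 10 times.
dir=$(mktemp -d)
printf 'hello' > "$dir/sample"
find "$dir" -type f -print0 |
  parallel -0 --will-cite --memfree 1G --retries 10 --delay 0 wc -c {}
```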

Ole Tange
  • This is great! Thank you for all your work on GNU Parallel. – sjy Feb 17 '19 at 01:00
  • One question. The doc says: "Additionally, GNU parallel will kill off the youngest job if the memory free falls below 50% of the size." What is the size here: the argument for --memfree, or the memory size available? Also, I have a 500GB machine. If I set --memfree to 100GB, then my job basically runs only with only one thread even if the memory available is always way way higher than 100GB. – Ilia Minkin Aug 20 '19 at 08:59
0
sysctl -w vm.overcommit_memory=2

If you want to avoid cgroups, this disables overcommit system-wide.
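For reference, strict accounting is usually made persistent through sysctl configuration; a sketch (the file name and ratio value are illustrative — in mode 2 the kernel refuses allocations once the commit limit of swap + overcommit_ratio% of RAM is reached):

```
# /etc/sysctl.d/90-overcommit.conf  (illustrative)
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
```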

user1133275