
This:

$ seq 100000 | xargs -P0 -n1 -I {} bash -c 'echo {};sleep {}'
...
5514
bash: fork: retry: No child processes

started complaining around 5500 when the system had 11666 processes running. Now, 11666 was really surprising to me given:

$ ulimit -u
313370
$ cat /proc/sys/kernel/pid_max
313370
$ grep hard.*nproc /etc/security/limits.conf
*                hard    nproc           313370
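As a side note, the process counts mentioned above can be cross-checked while the test runs. These are standard commands, not part of the original test:

```shell
# Count all processes on the system by counting numeric /proc entries:
ls /proc | grep -c '^[0-9]'

# Count processes belonging to the current user (procps ps):
ps --no-headers -u "$USER" | wc -l
```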

Why can I only run 11600 processes?

Edit:

Testing as another user, I get to 6100 (i.e. 12200 procs), so together the two users reach around 24000 procs. The limit is therefore not system-wide.

$ uname -a
Linux aspire 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ grep -i tasksmax /etc/systemd/*
/etc/systemd/logind.conf:#UserTasksMax=12288
/etc/systemd/system.conf:#DefaultTasksMax=

So the 12288 could be the culprit. I changed that to 1000 and did:

sudo systemctl daemon-reexec
sudo systemctl restart systemd-logind
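To check whether the new value has actually been picked up, the slice unit that logind creates for each user can be queried (this assumes the standard `user-<UID>.slice` naming that systemd uses for login sessions):

```shell
# Show the TasksMax value systemd currently enforces for my slice:
systemctl show -p TasksMax "user-$(id -u).slice"
```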

If I now log in as a user I have not logged in as before, the new limit works. But if I log in as a user that has recently been logged in, the limit that was active at that user's first login is enforced. So the limit is cached somewhere.

Using the above I tested up to 30000 procs and this works, but only for users that have not logged in before.

So what is caching the limit from /etc/systemd/logind.conf? And how can I flush that cache?

The new limit is well above 60000 procs (and could possibly be 313370, as I would expect).
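If the cached value does live in the per-user slice unit that systemd keeps around after first login, two plausible ways around it (untested here) are changing the property on the running slice, or destroying the slice by terminating all of that user's sessions:

```shell
# Override the cached limit at runtime on the existing slice:
sudo systemctl set-property "user-$(id -u).slice" TasksMax=30000

# Or tear the slice down so it is re-created with the new logind value
# on next login; note this kills all of that user's processes:
sudo loginctl terminate-user "$USER"
```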

Ole Tange
  • Are you running out of memory? Is this an openvz container? – jordanm Apr 16 '18 at 00:45
  • There is not enough information in this question for it to be answerable. For starters, the version of Linux (yes, _the kernel_) is important. So too is which system and service managers are being used. Different kernels and different system/service managers set limits differently. It is even important which specific version of the system/service manager one is using. Hint: https://unix.stackexchange.com/questions/253903/ – JdeBP Apr 16 '18 at 00:50
  • Even though some background information is missing, as far as I understand the given summary you may benefit from [Do changes in limits.conf require ...](https://unix.stackexchange.com/questions/108603/do-changes-in-etc-security-limits-conf-require-a-reboot) and from working with `limits.conf` and `prlimit` as mentioned in that thread. – U880D Apr 16 '18 at 12:33
  • @U880D It is clear that the limit does not come from `limits.conf`, but from `/etc/systemd/logind.conf`. The question is: how do I flush the cache of this, so that the limits given in that file are respected if I `ssh` to localhost as myself? – Ole Tange Apr 16 '18 at 13:05

1 Answer


The system in question runs systemd, which is one thing that uses cgroups to divide system resources among groups of processes.

It is probable that the sysctl kernel.sched_autogroup_enabled = 1 is set. That would be a second thing dividing system resources using cgroups.

It is possible that once a cgroup (or a set of cgroups) for a particular user has been initialized, it stays untouched until reboot.

I have no way to determine whether the cause is systemd or autogrouping, or whether the limit is on the number of processes or on memory inside a cgroup, nor do I have the time to dig through the source code. I wanted to comment instead of answering, but I don't have enough reputation.
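Both suspects can be inspected directly. The cgroup path below assumes a cgroup-v1 layout with a mounted pids controller, as on this 4.4 kernel:

```shell
# Is autogrouping enabled? (1 = yes)
cat /proc/sys/kernel/sched_autogroup_enabled

# What pids limit did systemd put on my user slice?
cat "/sys/fs/cgroup/pids/user.slice/user-$(id -u).slice/pids.max"
```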

Jake F