I'm learning about process groups, which are new to me. (I'm trying to follow this answer, among others: Why is SIGINT not propagated to child process when sent to its parent process?) I'm trying to kill a process group but can't, because its ID seems to keep running away.

$ sleep 1000 &
[1] 6468
$ ps ax -O tpgid | grep sleep
 6468  6511 S pts/4    00:00:00 sleep 1000
 6512  6511 S pts/4    00:00:00 grep --color=auto sleep
$ kill -9 -6511
bash: kill: (-6511) - No such process
$ ps ax -O tpgid | grep sleep
 6468  6515 S pts/4    00:00:00 sleep 1000
 6516  6515 S pts/4    00:00:00 grep --color=auto sleep
$ ps ax -O tpgid | grep sleep
 6468  6517 S pts/4    00:00:00 sleep 1000
 6518  6517 S pts/4    00:00:00 grep --color=auto sleep

Why is this so, and how can I catch and kill it? What am I misunderstanding or doing wrong?

GNU bash, version 4.3.42(1)-release (x86_64-pc-linux-gnu)

1 Answer

That's because you're not printing the process group ID (PGID), you're printing the "controlling tty process group ID", tpgid. As explained in man ps:

   tpgid       TPGID     ID of the foreground process group on the tty
                         (terminal) that the process is connected to, or
                         -1 if the process is not connected to a tty.

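So, what you're seeing is the ID of the terminal's foreground process group. In your case that group is the pipeline started by ps, and since ps is the first command in the pipeline, the group ID equals the PID of ps: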

$ sleep 1000 &
[1] 6745
$ ps ax -O tpgid | grep -E 'sleep|ps a'
 6745  7136 S pts/1    00:00:00 sleep 1000
 7136  7136 R pts/1    00:00:00 ps ax -O tpgid
 7137  7136 S pts/1    00:00:00 grep --color -E sleep|ps a

As you can see above, the tpgid value printed is the PID of the ps process. It changes every time you run ps, which is why the ID seemed to be running away. What you're looking for is pgid, not tpgid:

   pgid        PGID      process group ID or, equivalently, the process ID
                         of the process group leader.  (alias pgrp).


$ ps ax -O pgid | grep -E 'sleep|ps a'
 8414  8414 S pts/1    00:00:00 sleep 1000
 8656  8656 R pts/1    00:00:00 ps ax -O pgid
 8657  8656 S pts/1    00:00:00 grep --color -E sleep|ps a
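
To compare the two columns directly, both can be requested in a single ps call. The following is a minimal sketch: show_ids is a hypothetical helper name introduced here, and it assumes a ps that accepts the pid, pgid and tpgid format specifiers (procps on Linux does, as the man page excerpts above show).

```shell
# show_ids is a hypothetical helper, not part of ps.
# Print PID, PGID and TPGID for one process (default: the current shell).
show_ids() {
    ps -o pid,pgid,tpgid,args -p "${1:-$$}"
}

show_ids    # run it twice in an interactive shell and compare the columns
```

In an interactive shell, PGID should be the same on every run, while TPGID differs each time because every ps invocation is a new foreground job.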

Of course, since sleep was started on its own as a background job, it is the only member of its process group and therefore the group leader, so its PGID is the same as its PID. (Groups with several members appear when, for example, a script calls other commands.) Nevertheless, you can kill it through the group ID if you like:

$ kill -9 -8414
$ ps ax -O pgid | grep -E 'sleep|ps a'
10065 10065 R pts/1    00:00:00 ps ax -O pgid
10066 10065 S pts/1    00:00:00 grep --color -E sleep|ps a
[1]+  Killed                  sleep 1000

A more informative example would be to run a script like this:

#!/bin/bash

sleep 1000 &
sleep 1000 &
sleep 1000 &

sleep 1000

If I save that as foo.sh and run it, the various sleep commands will all have the same PGID:

$ foo.sh &
[1] 13555
$ ps ax -O pgid | grep -P '[s]leep|[f]oo.sh'
13555 13555 S pts/1    00:00:00 /bin/bash /home/terdon/scripts/foo.sh
13556 13555 S pts/1    00:00:00 sleep 1000
13557 13555 S pts/1    00:00:00 sleep 1000
13558 13555 S pts/1    00:00:00 sleep 1000
13559 13555 S pts/1    00:00:00 sleep 1000
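
The members of a group can also be listed without grepping ps output. This is a small sketch, assuming pgrep from procps is available (its -g option selects by process group ID); members_of and own_pgid are hypothetical names used only for illustration.

```shell
# members_of is a hypothetical helper: list the PIDs in a process group.
members_of() {
    pgrep -g "$1"
}

# Example: members of the current shell's own process group.
# "pgid=" suppresses the ps header; tr strips the column padding.
own_pgid=$(ps -o pgid= -p "$$" | tr -d ' ')
members_of "$own_pgid"
```

For the foo.sh run shown above, members_of 13555 would be expected to list the five PIDs 13555 through 13559.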

So, each child process is in the process group of the parent, foo.sh. If we now kill the process group, all processes will exit:

$ kill -9 -13555
$ ps ax -O pgid | grep -P '[s]leep|[f]oo.sh'
[1]+  Killed                  foo.sh
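
The lookup and the kill can be combined into one step. This is a sketch under stated assumptions, not a canonical tool: kill_group is a hypothetical helper name, it sends SIGTERM by default rather than -9 so that processes get a chance to clean up, and it assumes a ps that supports -o pgid= (header suppressed).

```shell
# kill_group is a hypothetical helper: signal every process in PID's group.
# Usage: kill_group PID [SIGNAL]   (SIGNAL defaults to TERM)
kill_group() {
    # "pgid=" suppresses the header; tr strips the padding ps adds.
    kg_pgid=$(ps -o pgid= -p "$1" 2>/dev/null | tr -d ' ')
    if [ -z "$kg_pgid" ]; then
        echo "kill_group: no such process: $1" >&2
        return 1
    fi
    # A negative ID addresses a whole process group; "--" stops kill from
    # reading the leading minus sign as an option.
    kill -s "${2:-TERM}" -- "-$kg_pgid"
}
```

For the example above, kill_group 13556 KILL would have the same effect as kill -9 -13555, since 13556 belongs to group 13555. Avoid calling it on a process that shares your own shell's group, because the shell would be signalled too.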
terdon
  • Why is/gets `sleep` (6745) connected to `ps ax -O tpgid` (7136) in the second snippet? –  Aug 14 '16 at 14:27
  • @tomas it isn't. As I explained (or tried to) in the answer, the `tpgid` is *not* the parent group ID. It is simply the ID of the process group that is currently running in the foreground. Since the process that was in the foreground when the `ps` was running is, of course, the `ps` command itself (PID 7136), that's what is shown as `tpgid`. You might want to ping me (`@terdon`) in [/dev/chat](http://chat.stackexchange.com/rooms/26/dev-chat) to discuss this if you're still confused. – terdon Aug 14 '16 at 14:30
  • The definition in the first snippet suggests it is, doesn't it? "tpgid - ID of the foreground process group on the tty (terminal) that the process is connected to ...". (Other people might benefit in the future if you explain this here.) –  Aug 14 '16 at 14:38
  • @tomas no, the definition clearly states that it is the ID of the *foreground* process group, not of the group the process itself belongs to. I don't really see what else I can add to that. – terdon Aug 14 '16 at 14:39
  • I get it now. Thanks & sorry for my dumbness. –  Aug 14 '16 at 14:45
  • @tomas heh, if misunderstanding a man page meant someone is dumb, we would all be completely stupid. Everyone has failed to understand a man page at some point or another :) – terdon Aug 14 '16 at 14:46