Why are these sshd-pts processes stuck in state D (uninterruptible sleep)? They seem to be driving up the system load averages. How can I remove them without restarting my server?
(Edited to add ppid and etime.)
[root@manager ~]# ps -eo pid,ppid,user,state,etime,command,wchan |grep sshd |awk -F " " '{if($4=="D")print}'
3024 7162 root D 31-00:45:56 sshd: root@pts/10 tty_ldisc_hangup
3799 23740 root D 62-03:49:15 sshd: root@pts/7 tty_ldisc_ref_wait
4883 23740 root D 29-02:12:59 sshd: root@pts/11 tty_ldisc_ref_wait
7162 23740 root D 34-21:39:42 sshd: root@pts/10 tty_ldisc_ref_wait
8011 15566 root D 62-21:06:45 sshd: root@pts/4 tty_ldisc_hangup
9297 29509 root D 71-21:44:30 sshd: root@pts/5 tty_ldisc_hangup
13927 32658 root D 48-15:41:05 sshd: root@pts/8 tty_ldisc_hangup
14488 1 root D 62-17:42:02 sshd: root@pts/6 tty_ldisc_ref_wait
15007 23740 root D 47-23:40:33 sshd: root@pts/9 tty_ldisc_ref_wait
15566 1 root D 68-22:23:34 sshd: root@pts/4 tty_ldisc_ref_wait
18017 1 root D 82-11:50:11 sshd: root@pts/3 tty_ldisc_ref_wait
22081 4883 root D 24-21:08:20 sshd: root@pts/11 tty_ldisc_hangup
25157 15007 root D 41-11:34:06 sshd: root@pts/9 tty_ldisc_hangup
28168 18017 root D 82-11:49:11 sshd: root@pts/3 tty_ldisc_hangup
29509 1 root D 71-21:47:22 sshd: root@pts/5 tty_ldisc_ref_wait
29718 3799 root D 61-02:02:54 sshd: root@pts/7 tty_ldisc_hangup
31394 14488 root D 62-13:38:32 sshd: root@pts/6 tty_ldisc_hangup
32658 23740 root D 58-21:47:54 sshd: root@pts/8 tty_ldisc_ref_wait
Stack info:
[root@manager ~]# cat /proc/3024/stack
[<ffffffff81396c99>] tty_ldisc_hangup+0xc9/0x220
[<ffffffff8138e4fc>] __tty_hangup+0x30c/0x410
[<ffffffff8138e911>] tty_vhangup_self+0x21/0x50
[<ffffffff8117b043>] sys_vhangup+0x23/0x30
[<ffffffff816d8639>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
[root@manager ~]# cat /proc/3799/stack
[<ffffffff813961f0>] tty_ldisc_ref_wait+0x20/0x50
[<ffffffff8138e8b8>] tty_poll+0x58/0x90
[<ffffffff8118f92e>] do_select+0x36e/0x680
[<ffffffff8118fe1b>] core_sys_select+0x1db/0x300
[<ffffffff8118fffa>] SyS_select+0xba/0x110
[<ffffffff816d8639>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
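To collect the same stack information for every D-state process at once, rather than one PID at a time, something like the following could be used. This is only a sketch: dump_dstate_stacks is a hypothetical helper name, the optional directory argument exists only so the function can be pointed at a test tree, and reading /proc/PID/stack normally requires root.

```shell
# Hypothetical helper: print the kernel stack of every task in state D.
# Usage: dump_dstate_stacks            (scans /proc)
#        dump_dstate_stacks /some/dir  (scans a /proc-like test tree)
dump_dstate_stacks() {
    root=${1:-/proc}
    for dir in "$root"/[0-9]*; do
        [ -r "$dir/status" ] || continue
        # The State: line in /proc/PID/status looks like "State:  D (disk sleep)".
        state=$(awk '$1 == "State:" { print $2; exit }' "$dir/status" 2>/dev/null)
        [ "$state" = "D" ] || continue
        echo "== ${dir##*/} =="
        cat "$dir/stack" 2>/dev/null
    done
}
```

Run as root with no arguments, it prints one "== PID ==" header per D-state process followed by that process's kernel stack.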
(More info:)
After checking the ppid, I found that the processes stuck in tty_ldisc_hangup are children of the processes stuck in tty_ldisc_ref_wait.
E.g. process 3024 is the child of process 7162. It seems that 7162 is waiting for 3024 to hang up the tty.
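The parent/child pairing can also be checked mechanically instead of by eye. A minimal sketch, assuming procps ps and a POSIX awk; pair_dstate is a hypothetical helper name, and it expects rows of "pid ppid state wchan" on stdin:

```shell
# Hypothetical helper: read "pid ppid state wchan" rows on stdin and print
# every D-state process whose parent is also in state D.
pair_dstate() {
    awk '
        { ppid[$1] = $2; state[$1] = $3; wchan[$1] = $4; order[++n] = $1 }
        END {
            for (i = 1; i <= n; i++) {
                p = order[i]; q = ppid[p]
                if (state[p] == "D" && state[q] == "D")
                    printf "%s (%s) <- parent %s (%s)\n", p, wchan[p], q, wchan[q]
            }
        }
    '
}

# Usage on a live system; each "-o name=" suppresses that column header:
#   ps -e -o pid= -o ppid= -o state= -o wchan= | pair_dstate
```

Each output line names a stuck child, its wait channel, and the stuck parent with its wait channel, so pairs like 3024 and 7162 show up on a single line.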
Following slm's suggestion, I ran lsof on these processes. Below is the truncated output for 3024 and 7162.
(3024)
sshd 3024 root 0u CHR 1,3 0t0 1028 /dev/null
sshd 3024 root 1u CHR 1,3 0t0 1028 /dev/null
sshd 3024 root 2u CHR 1,3 0t0 1028 /dev/null
sshd 3024 root 3u sock 0,6 0t0 1037695934 protocol: TCP
sshd 3024 root 5r FIFO 0,8 0t0 1037701548 pipe
sshd 3024 root 6w FIFO 0,16 0t0 1037701541 /run/systemd/sessions/705007.ref
sshd 3024 root 7w FIFO 0,8 0t0 1037701548 pipe
sshd 3024 root 9u CHR 136,10 0t0 13 /dev/pts/10
sshd 3024 root 10r FIFO 0,8 0t0 2320263517 pipe
sshd 3024 root 11w FIFO 0,8 0t0 2320263517 pipe
(7162)
sshd 7162 root 0u CHR 1,3 0t0 1028 /dev/null
sshd 7162 root 1u CHR 1,3 0t0 1028 /dev/null
sshd 7162 root 2u CHR 1,3 0t0 1028 /dev/null
sshd 7162 root 3u sock 0,6 0t0 1037695934 protocol: TCP
sshd 7162 root 4u unix 0xffff88069a94e580 0t0 1037701545 socket
sshd 7162 root 5r FIFO 0,8 0t0 1037701548 pipe
sshd 7162 root 6w FIFO 0,16 0t0 1037701541 /run/systemd/sessions/705007.ref
sshd 7162 root 7w FIFO 0,8 0t0 1037701548 pipe
sshd 7162 root 8u CHR 5,2 0t0 1644 /dev/ptmx
sshd 7162 root 12u CHR 5,2 0t0 1644 /dev/ptmx
sshd 7162 root 13u CHR 5,2 0t0 1644 /dev/ptmx
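To see which open files the two processes have in common, the DEVICE and NODE columns of the two listings can be matched up. A rough sketch, assuming the default lsof column layout shown above (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME) with no empty columns; shared_files is a hypothetical helper name:

```shell
# Hypothetical helper: read default-format lsof output on stdin and print
# every file (keyed by DEVICE and NODE) that more than one PID has open.
shared_files() {
    awk '
        $2 ~ /^[0-9]+$/ {                      # skip the header row
            key = $6 " " $8                    # DEVICE + NODE
            name = $9
            for (i = 10; i <= NF; i++) name = name " " $i
            if (!((key, $2) in seen)) {        # count each PID once per file
                seen[key, $2] = 1
                pids[key] = pids[key] " " $2
                count[key]++
            }
            label[key] = $5 " " name           # TYPE + NAME
        }
        END { for (k in count) if (count[k] > 1) print label[k] ":" pids[k] }
    ' | sort
}

# Usage: lsof -p 3024,7162 | shared_files
```

Fed the two listings above, this reports /dev/null, the TCP socket, the pipe with node 1037701548, and the systemd session FIFO as shared, while /dev/pts/10 appears only under 3024 and /dev/ptmx only under 7162.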
Checking session info:
[root@manager ~]# loginctl session-status 705007
705007 - root (0)
Since: Thur 2018-06-14 10:10:31 UTC; 1 months 4 days ago
Leader: 7162 (sshd)
Remote: 10.161.16.14
Service: sshd; type tty; class user
State: active
Unit: session-705007.scope
├─3024 sshd: root@pts/10
└─7162 sshd: root@pts/10
And the remote IP is actually my own host's IP.
[root@tgdc-manager-machine ~]# hostname -I
10.161.16.14