I'm trying to debug why our integration tests are taking so long. They appear to hang partway through.
I logged on to our test server and saw the following behaviour:
root@colossus:~# strace -p 18310
Process 18310 attached - interrupt to quit
futex(0x7f9915c609d0, FUTEX_WAIT, 18313, NULL^C <unfinished ...>
Process 18310 detached
root@colossus:~# strace -p 18313
Process 18313 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>^C <unfinished ...>
Process 18313 detached
root@colossus:~# ps -ef | grep 18313
root 19089 19034 0 09:46 pts/0 00:00:00 grep --color=auto 18313
root@colossus:~# ps -p 18313
PID TTY TIME CMD
My interpretation of this output is that 18310 is blocked in a futex wait on 18313, which I assume is its child, and that 18313 is itself resuming an interrupted system call.
Here's where it gets weird: although strace can attach to 18313, ps does not list it among the current processes.
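One check I could still run is sketched below. It rests on an assumption I have not confirmed for this case: that 18313 might be a thread (task) ID inside 18310 rather than a separate process, since strace -p accepts thread IDs while plain ps lists only processes. The helper name tgid_of is made up for illustration, and the script assumes Linux with /proc mounted.

```shell
#!/bin/sh
# Sketch: report which thread group (process) a given task ID belongs to.
# Assumes Linux with /proc mounted; tgid_of is a hypothetical helper name.
tgid_of() {
    # /proc/<id>/status carries a "Tgid:" line naming the owning process.
    awk '/^Tgid:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

id="${1:-$$}"            # task ID to inspect; defaults to this shell's own PID
tgid="$(tgid_of "$id")"

if [ -z "$tgid" ]; then
    echo "$id: no such task"
elif [ "$tgid" = "$id" ]; then
    echo "$id is a process (thread group leader)"
else
    echo "$id is a thread of process $tgid"
fi
```

Saved under a name of my choosing and run with 18313 as its argument, this would say whether that ID is a process in its own right or a thread of another process. Running ps -L -p 18310 should give a similar view, because -L makes ps list the threads of a process.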
Can someone help me understand what is going on here, please?