I'm confused. One of our DBAs is reporting an issue with LDAP connection errors. I figured I'd start by tracing the process with truss to see exactly what it's connecting to, but what I'm seeing makes no sense to me.

This is the full extract from the truss output regarding file descriptor 35:

# grep 35 /tmp/11834.2.truss | grep -v write.33  
/3:     read(35, " 0", 1)                               = 1  
/3:     read(35, "\f", 1)                               = 1  
/3:     read(35, "020101 `0702010304\080\0", 12)        = 12  
/9:     write(35, " 084\0\0\010020101 a84\0".., 22)     = 22  
/3:     read(35, " 0", 1)                               = 1  
/3:     read(35, "81", 1)                               = 1  
/3:     read(35, "9E", 1)                               = 1  
/3:     read(35, "020102 c819804 : c n = a".., 158)     = 158  
/9:     write(35, " 084\0\001 8020102 d84\0".., 340)    = 340  
/3:     read(35, " 0", 1)                               = 1  
/3:     read(35, "05", 1)                               = 1  
/3:     read(35, "020103 B\0", 5)                       = 5  
/3:     close(35)                                       = 0  
/6:     read(35, " 0", 1)                               = 1  
/6:     read(35, "\f", 1)                               = 1  
/6:     read(35, "020101 `0702010304\080\0", 12)        = 12  
/8:     write(35, " 084\0\0\010020101 a84\0".., 22)     = 22  
/6:     read(35, " 0", 1)                               = 1  
/6:     read(35, "81", 1)                               = 1  
/6:     read(35, "98", 1)                               = 1  
/6:     read(35, "020102 c819204 4 c n = M".., 152)     = 152  
/9:     write(35, " 084\0\001 @020102 d84\0".., 348)    = 348  
/6:     read(35, " 0", 1)                               = 1  
/6:     read(35, "05", 1)                               = 1  
/6:     read(35, "020103 B\0", 5)                       = 5  
/6:     close(35)                                       = 0  
/6:     read(35, 0x7FFFEFB4FFB4B, 1)                    Err#131 ECONNRESET  
/6:     close(35)                                       = 0  
/6:     read(35, " 0", 1)                               = 1  
/6:     read(35, "\f", 1)                               = 1  
/6:     read(35, "020101 `0702010304\080\0", 12)        = 12  
/8:     write(35, " 084\0\0\010020101 a84\0".., 22)     = 22  
/6:     read(35, " 0", 1)                               = 1  
/6:     read(35, "81", 1)                               = 1  
/6:     read(35, "A3", 1)                               = 1  
/6:     read(35, "020102 c819D04 ? c n = a".., 163)     = 163  
/8:     write(35, " 084\0\001 B020102 d84\0".., 350)    = 350  
/6:     read(35, " 0", 1)                               = 1  
/6:     read(35, "05", 1)                               = 1  
/6:     read(35, "020103 B\0", 5)                       = 5  
/6:     close(35)                                       = 0  

If I run pfiles on the process during this time, FD35 is never seen. And from what I can tell in the truss output, it's never opened, yet it's read from, written to, and closed twice during this trace, and it continues to be used afterwards. I'd like to know what it's talking to in order to run a network trace...

Has anyone seen anything like this behaviour before and could help explain it? Must admit, the company policy of "if it ain't broke, don't patch it" might be coming into play here...

Any info much appreciated.

StuWhitby

2 Answers

A process does not need to open a fd in order to be able to use it.

It works if the parent process lets the child inherit the open file descriptor.

BTW: I would trust truss and check /proc/<pid>/fd/ for a list of open file descriptors.
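
To illustrate the inheritance point, here is a minimal sketch (not taken from the system in the question) in which a child shell reads from descriptor 9 without ever opening it. The temporary file, the descriptor number 9, and the variable names are assumptions chosen purely for the demo. Tracing only the child would show reads on fd 9 with no matching open, which is the same pattern as fd 35 in the truss output.

```shell
#!/bin/sh
# Demo: a child process uses a descriptor that was set up before it started.
tmp=$(mktemp)
printf 'hello from the parent\n' > "$tmp"

# The calling shell attaches fd 9 to the file when it launches the child.
# The child program itself never opens the file; it only reads the
# descriptor it was handed.
child_output=$(sh -c 'read -r line <&9; printf "%s\n" "$line"' 9< "$tmp")

echo "child read: $child_output"
rm -f "$tmp"
```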

schily
  • Well, the parent process in this case is zsched, which can't be traced to identify what that file descriptor would be. The file descriptor does appear transiently in /proc/<pid>/fd, but is there a way to tell from there exactly what this FD is? It's barely open for any time at all before it's gone, though it's open regularly enough that I can probably keep trying until it gets a hit. – StuWhitby Jun 25 '18 at 15:05
  • The officially documented way to find the related file is to check `/proc/<pid>/path/`. So if you are able to find a timeslot when the fd is open, you should be able to succeed. – schily Jun 25 '18 at 16:48
  • Are you talking about the `snoop(1)` program? – schily Jun 27 '18 at 14:54

Schily's comments are spot-on for what was going on. I'm going to expand slightly though.

The file descriptor was being inherited from zsched. Running "date; pfiles <pid>" inside a while true loop showed the file descriptor, though truss can't trace the same process while that loop is running. However, it did show me which client was connecting at the point the error occurred.
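
A polling loop along those lines could be sketched as follows. This is not the exact loop I ran: `watch_fd` is a hypothetical helper name, the optional fourth argument exists only so the listing command can be swapped out, and the match assumes that `pfiles` starts each descriptor entry with the right-aligned number followed by a colon (for example `  35: S_IFSOCK ...`). It also assumes a POSIX-style shell such as ksh or bash.

```shell
# Poll a process until the given descriptor appears, then print a timestamp
# and the full listing so the peer address can be read from it.
watch_fd() {
    pid=$1                 # process to inspect
    fd=$2                  # descriptor number to wait for
    tries=$3               # give up after this many polls
    lister=${4:-pfiles}    # listing command (assumption: pfiles-style output)
    i=0
    while [ "$i" -lt "$tries" ]; do
        out=$("$lister" "$pid" 2>/dev/null)
        # Assumed entry format: optional leading spaces, the fd number, a colon.
        if printf '%s\n' "$out" | grep "^ *$fd: " >/dev/null; then
            date
            printf '%s\n' "$out"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}
```

Called as `watch_fd <pid> 35 100000`, it returns as soon as one poll catches the descriptor open, and returns non-zero if it never does within the given number of tries.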

It's not possible to snoop the network adapter within the local zone. From the global zone, I was able to snoop the traffic going to the specific port on that network adapter. This allowed me to capture the failing connections and identify the cause using Wireshark.

StuWhitby
  • Another thing you can do is find out where the file descriptor is coming from by using `truss` to get *all* system calls and find ones that return the file descriptor value - in this case `35`. `grep '= 35' /truss/output/file` would probably return `accept()` calls or something similar. – Andrew Henle Jun 28 '18 at 09:34
  • It doesn't show the accept within a local zone. That's where I had the problem with truss because I couldn't see the file handle being created because it was being silently inherited. – StuWhitby Jun 29 '18 at 12:46
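
For cases where the creating call is visible to truss, the search suggested in the comment above can be tightened a little. The sketch below is an illustration only: `fd_sources` is a hypothetical helper name, and it assumes truss prints the return value as `= N` at the end of each line. Because `read()` and `write()` calls that happen to transfer exactly that many bytes end the same way, they are filtered out.

```shell
# List system calls in a truss log whose return value equals the descriptor.
# $1 = descriptor number, $2 = truss output file
fd_sources() {
    # Match "= N" at end of line, allowing trailing whitespace, then drop
    # read()/write() calls whose byte count coincides with the number.
    grep "= $1[[:space:]]*\$" "$2" | grep -v -e 'read(' -e 'write('
}
```

For example, `fd_sources 35 /truss/output/file` prints nothing when the descriptor was inherited rather than created by the traced process, which matches what was seen in this thread.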