5

I have CentOS 6.7 running a Java application via a wrapper program. First I ran this:

`lsof -p 15200 | wc -l`

and I got the result immediately: 200.

Next I ran `lsof -p 15232 | wc -l`, but it keeps taking too long and never produces any result. What other method can I use to get the total number of open files? I need to know because my system keeps hanging after a certain time, and I may need to increase the open-file limit.

Jeff Schaller
user8012596
  • You seem to be focusing on the number of open files, assuming that this is the cause of your system freezing. That appears unlikely to me, and also makes this question a case of an [XY problem](https://meta.stackexchange.com/q/66377/157730). We can probably answer this, but I think you are more likely to get *useful* answers if you focus on why your system is freezing. Note that now that this question has been answered, you shouldn't change it too much, as such a change would invalidate the answer; instead, you may want to ask a new question and include relevant details about diagnosing the freezing. – user May 19 '17 at 20:43
  • I am not assuming; I am trying to narrow down my problem and check whether this is a related issue, so I would like to find out if the number of open files keeps growing without shrinking. Actually I am getting too many CLOSE_WAIT sockets in my socket application; I notice this when the system starts to freeze – user8012596 May 19 '17 at 20:48
  • That process may indeed be related to your freeze. Do you know what it is? – Julie Pelletier May 19 '17 at 20:57
  • Sometimes, I find that IP address to host name resolution is what causes `lsof` to slow down. Try adding a `-n` option. (you may also want to add `-P` and `-M`). – Stéphane Chazelas May 19 '17 at 21:01
  • 1
    @JuliePelletier yes is my socket application and suddenly I see that close wait count keep increasing it takes a while almost after a week then it behaves this way then I restart my application is back to normal. – user8012596 May 19 '17 at 21:09
  • Then it seems to be a bug in the application which probably goes into an infinite loop (possibly under certain conditions). – Julie Pelletier May 20 '17 at 04:46
  • @JuliePelletier actually it is a Java socket application; I have a finally section and I have ensured that the write buffer is closed properly. – user8012596 May 20 '17 at 05:13
  • Ok, if you say your application is bug-free then it must be the operating system that doesn't like it I guess. – Julie Pelletier May 20 '17 at 05:25
  • @JuliePelletier I am a bit confused: when I look at `/proc/$PID/limits` it shows me 4096. Is 4096 just for this process or for the overall system? Should I try increasing it to see if that eases the problem? – user8012596 May 20 '17 at 05:36
  • It won't ease it. Based on the impression you give me, it goes through an endless loop, and the more slack you give it, the worse the result will be. The real solution is to troubleshoot your application. If it were me, I'd put log traces so that I could see what was happening when the machine froze (after a restart or any way you can stop the process). Reducing its priority might also help troubleshoot it. – Julie Pelletier May 20 '17 at 06:13
  • @JuliePelletier actually the machine does not freeze; it is the application that freezes. I notice no updates happening to my MySQL, the MySQL CPU usage is very high, and the number of CLOSE_WAIT sockets is quite high at that moment. The number of established connections is also high. When established connections are below 100 it works fine; when they reach around 200-plus is when I notice this starting. – user8012596 May 20 '17 at 06:23
  • @JuliePelletier the moment this happens, if I restart, everything is back to normal. – user8012596 May 20 '17 at 06:23
  • OK, and again for the last time, you need to identify the problem happening in your application and this is typically done by putting traces in a log file. – Julie Pelletier May 20 '17 at 16:17
  • Yes, found it. I actually have two nested try/catch blocks; the inner one is for the SQL. I notice that when there is an exception it reaches the inner finally section but never the outer finally section, which is what actually closes the socket. – user8012596 May 20 '17 at 16:50
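Following Stéphane Chazelas's comment above, a slow `lsof` run is often caused by name-resolution lookups rather than by the descriptor count itself, so a variant that disables them is worth trying first. This is a sketch; `$$` (the current shell's PID) stands in for the Java process's PID:

```shell
# -n: skip DNS resolution of IP addresses
# -P: skip conversion of port numbers to service names
# -M: disable portmap registration reporting
# Substitute the real PID (e.g. 15232) for $$.
lsof -n -P -M -p $$ | wc -l
```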

1 Answer

7

You can get the number of files opened by a process identified by a PID, for instance 15232, with:

ls /proc/15232/fd | wc -l

(Avoid `ls -l` here: its `total` header line inflates the count by one.)

From the Debian mailing lists:

I am trying to figure out the meaning of the `/proc/$PID/fd/*` files.

These are symbolic links that point to the files opened by the process whose PID is `$PID`. "fd" stands for "file descriptor", an integer that identifies an open input or output resource in UNIX-like systems.

This is also where the `lsof` command draws the information to give you the open files of a process.

This is a feature of the Linux kernel, and is distribution-agnostic.
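Tying the `/proc` approach back to the 4096 limit mentioned in the comments, here is a small sketch that compares a process's current descriptor count against its soft limit. It uses the shell's own PID (`$$`) as a placeholder, assuming you would substitute the Java process's PID (e.g. 15232) in practice:

```shell
# Use this shell's own PID as a placeholder; substitute the target PID.
pid=$$

# One entry per open file descriptor; plain ls avoids the "total" header
# line that `ls -l` would add to the count.
count=$(ls -1 "/proc/$pid/fd" | wc -l)

# Per-process soft limit: the "Max open files" row of /proc/PID/limits
# (4096 in the question).
limit=$(awk '/Max open files/ {print $4}' "/proc/$pid/limits")

echo "$count of $limit file descriptors in use"
```

Note that this limit is per process, not system-wide; the system-wide ceiling is in `/proc/sys/fs/file-max`.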

Rui F Ribeiro
  • Mine is CentOS 6.7, so will this work? – user8012596 May 19 '17 at 20:46
  • Yes, this is a Linux kernel feature. – Rui F Ribeiro May 19 '17 at 20:46
  • Yes, it's working. Can you explain why my original command `lsof -p 15232 | wc -l` is not working for this particular PID but works for other PIDs? Also, how do I find out the actual maximum number of files that can be open, and what limit am I hitting now? – user8012596 May 19 '17 at 20:54
  • @user8012596: don't post new questions in comments. Search online and if you can't find it, ask a new question. Do note however that reaching the limit should not freeze a computer but instead cause some programs to crash. – Julie Pelletier May 19 '17 at 20:55
  • @rui it is actually not freezing my CentOS, just causing my db and app to almost freeze. – user8012596 May 19 '17 at 20:58
  • But I want to know the difference between `lsof -p 15232 | wc -l` and `ls -l /proc/15232/fd | wc -l`. Aren't they both doing the same task? – user8012596 May 19 '17 at 20:59
  • I/O bound / I/O exhaustion. Read the original @Michael Kjorling recommendation. – Rui F Ribeiro May 19 '17 at 21:06
  • @rui you mean my issue is related to being I/O bound, causing I/O exhaustion? – user8012596 May 19 '17 at 21:11
  • Probably. Consumer-grade IDE/SATA disks can take only so much heavy concurrent access before everything slows down. – Rui F Ribeiro May 19 '17 at 21:30
  • @rui I am using SATA with RAID 10 in fact, so I don't think that should be a problem, right? – user8012596 May 20 '17 at 03:10