Since a recent major upgrade to my distribution (PLD Linux), I have been having trouble with a whole slew of programs. As best I can tell, anything that touches OpenGL or PulseAudio segfaults. I'm using the proprietary nvidia drivers and a 3.2.x kernel. Xorg itself runs fine and I am able to run most programs; however, things like mplayer segfault, and no program produces any sound.

Once I figured out that it might be related to OpenGL, I started using glxgears as a test. Running it by itself segfaults instantly. Then I discovered that running it under strace works fine. The same is true for mplayer: running it on a test mp3 file segfaults instantly, while `strace mplayer` plays just fine (although PulseAudio still dies and mplayer falls back to a dummy output device).
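
The plain-versus-strace comparison described above can be scripted as a minimal sketch, assuming a POSIX shell with strace on the PATH; `run_both` is a hypothetical helper name. An exit status of 139 (128 + 11) means the process was killed by SIGSEGV.

```shell
#!/bin/sh
# Run the same command twice, plainly and under strace, and report how
# each run ended. Status 139 (128 + 11) means the process died of SIGSEGV.
run_both() {
    status=0
    "$@" || status=$?
    echo "plain: exit $status"
    status=0
    # -o /dev/null discards the syscall trace; only the exit status matters.
    strace -o /dev/null "$@" || status=$?
    echo "under strace: exit $status"
}

# Example: run_both glxgears
```

strace exits with the traced program's status, so a `plain: exit 139` next to `under strace: exit 0` reproduces the symptom in one step.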

How could running something under strace keep it from segfaulting and how would I continue to debug the situation?

Caleb

4 Answers

I have observed that Nvidia's libGL.so attempts to detect whether the current process is being traced by opening /proc/self/status and looking for "TracerPid:". It takes different code paths depending on whether the value of TracerPid is non-zero (i.e., whether the current process is being traced).
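
To see the value such a check would read, here is a small sketch assuming a Linux /proc filesystem; `tracer_pid` is a hypothetical helper name, not part of any tool. It prints the TracerPid field of a process's status file.

```shell
#!/bin/sh
# Print the TracerPid field from /proc/<pid>/status (Linux only).
# 0 means no tracer such as strace or gdb is attached; otherwise it is
# the tracer's PID. Defaults to the current shell's PID.
tracer_pid() {
    awk '/^TracerPid:/ { print $2 }' "/proc/${1:-$$}/status"
}

tracer_pid
```

Run directly, this should print 0. By contrast, `strace -o /dev/null awk '/^TracerPid:/ { print $2 }' /proc/self/status` should print a non-zero PID (strace's), because there /proc/self is the traced awk process. That is the same distinction a library can draw about its own host process.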

Install sysdig and capture a trace of the offending process twice, once under strace and once without it. For example:

$ sysdig -w glxgears.scap proc.name=glxgears &
$ glxgears &
$ kill -TERM `pidof glxgears`
$ kill -TERM `pidof sysdig`
$ sysdig -w glxgears-strace.scap proc.name=glxgears &
$ strace glxgears &
$ kill -TERM `pidof glxgears`
$ kill -TERM `pidof sysdig`

Compare the textual output of the two different traces to observe the change in execution flow between the straced and non-straced runs of glxgears.
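
One way to do that comparison, sketched under the assumption that sysdig's `-r` (read a capture file) and `-p` (output format) options and the filter fields used here behave as documented: reduce each capture to the files glxgears opened, leaving out timestamps and PIDs so the two runs line up, then diff them. `trace_to_text` and `compare_runs` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: turn two sysdig captures into comparable text and diff them.
# Assumes sysdig is installed and the captures were written with -w.

# Print one line per completed open/openat: syscall name and file path.
# Timestamps, PIDs and fd numbers are omitted so two runs line up.
trace_to_text() {
    sysdig -r "$1" -p '%evt.type %fd.name' \
        'evt.type in (open, openat) and evt.dir=<'
}

# Diff the files opened by a plain run against those of a straced run.
# Returns diff's status: 0 if identical, 1 if they differ.
compare_runs() {
    plain="$(mktemp)"
    traced="$(mktemp)"
    trace_to_text "$1" > "$plain"
    trace_to_text "$2" > "$traced"
    status=0
    diff "$plain" "$traced" || status=$?
    rm -f "$plain" "$traced"
    return "$status"
}

# Example:
#   compare_runs glxgears.scap glxgears-strace.scap
```

Lines present in only one run point to where the two code paths diverge. Restricting the output to file opens is a deliberate simplification; widening the `-p` format to other fields would show more, at the cost of a noisier diff.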

strace "fixes" your OpenGL issue because libGL behaves differently depending on whether the process is being traced or debugged.

Tom O

I would imagine that another package, most likely a Mesa package, replaced the nVidia libGL.so with its own version. To fix the issue, reinstall the proprietary nVidia driver; this will restore the nVidia-provided libGL.so.
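
To test this theory before reinstalling, a sketch along these lines could show which package owns each libGL. It assumes an RPM-based system (as PLD is) with `ldconfig -p` available; `libgl_owners` is a hypothetical helper name.

```shell
#!/bin/sh
# For every libGL in the dynamic linker cache, resolve symlinks and ask
# RPM which package owns the real file. Assumes an RPM-based system.
libgl_owners() {
    ldconfig -p | awk '/libGL\.so/ { print $NF }' | sort -u |
    while read -r lib; do
        real="$(readlink -f "$lib")"
        pkg="$(rpm -qf "$real" 2>/dev/null)" || pkg="no RPM owner found"
        printf '%s -> %s: %s\n' "$lib" "$real" "$pkg"
    done
}

libgl_owners
```

If the library glxgears actually loads (check with `ldd "$(command -v glxgears)"`) belongs to a Mesa package rather than the nvidia one, that would support this explanation.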

ciotog

You said you tried nv, nouveau and vesa. What happened in each case?
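
One quick way to answer that, sketched under the assumption that the X server logs to the default /var/log/Xorg.0.log (this varies by setup): pull out the error `(EE)` and warning `(WW)` lines after each attempt. `xorg_problems` is a hypothetical helper name. The server keeps the previous run's log as Xorg.0.log.old.

```shell
#!/bin/sh
# Show error (EE) and warning (WW) lines from an X server log, where a
# failing nv, nouveau or vesa driver usually explains itself.
# The default path is an assumption; some setups log elsewhere.
xorg_problems() {
    log="${1:-/var/log/Xorg.0.log}"
    # Drop the "Markers:" legend line, which itself contains (WW) and (EE).
    grep -E '\((EE|WW)\)' "$log" | grep -v 'Markers:'
}

# Example: xorg_problems /var/log/Xorg.0.log.old
```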

Also, try booting your machine off a USB stick with another distro and see if the problem persists. If it doesn't, then maybe the driver versions from that distro can be used on your machine. It could also shed some light on the specifics of the problem you are having (it seems to be a timing bug).
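
To make such a comparison concrete, a sketch like this could record the versions on both the installed system and the live USB system. It assumes the proprietary kernel module is named `nvidia`; `driver_report` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the versions worth comparing between the installed system and a
# live USB distro that works. modinfo -F prints one field of a module.
driver_report() {
    echo "kernel: $(uname -r)"
    echo "nvidia module on disk: $(modinfo -F version nvidia 2>/dev/null || echo none)"
    if [ -r /proc/driver/nvidia/version ]; then
        # This file exists only while the proprietary nvidia module is loaded.
        echo "nvidia module loaded: $(head -n 1 /proc/driver/nvidia/version)"
    else
        echo "nvidia module loaded: no"
    fi
}

driver_report
```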

Are modern machines still capable of slowing down the PCI bus? Is it a desktop PC or a notebook?

Just as a side note, you may spare yourself a lot of future pain by avoiding ATI and NVidia altogether, if that is possible performance-wise. Their margins are so low that even a 1% drop in user base may push them into cleaning up their act.

rbanffy

Get rid of the proprietary nvidia drivers and use the open source ones. You yourself identified the proprietary nvidia drivers as the culprit.

aseq
  • Great concept, but you know that is easier said than done. I've run the nouveau drivers, but they have issues with dual monitors and power management. – Caleb Mar 10 '12 at 01:06
  • There are other drivers besides nouveau. I'd say it's an easy choice, since right now you don't have a properly working system. – aseq Mar 10 '12 at 01:33
  • Seriously? If you have an answer for me, please edit your post to actually answer my question or provide a specific alternate solution. If you know of some magic I don't, you'll need to be more detailed in order to be useful. I'm well aware that there are other drivers out there, but you are probably aware that they don't do everything the proprietary ones do. I've used `nv`, and even the `vesa` stuff drives it, but `nouveau` is by far the most featured and best performing. Ironically, this system _is_ working for everything it most needs to do, which is not the case with the other drivers. – Caleb Mar 10 '12 at 22:13
  • Suggesting a workaround is an answer too, and I would say it is an alternate solution. That you don't like that solution is not so relevant. – aseq Mar 11 '12 at 02:34