I have SmartOS machines running a custom application (a Circonus monitoring agent) as an SMF service. On some of these machines the agent fails at startup and gets stuck in a restart loop, which eventually causes the machine to panic. Every other SMF service I have worked with goes into "maintenance" mode after restarting a few times, but this particular service never seems to. I don't see any way to tweak this behavior in the SMF manifest, and I'm not finding much information about it in the Oracle docs. Does anyone know whether this is a configurable setting and, if so, where I can find it?

The SMF manifest defines the following restart method:

<exec_method name='restart' type='method' exec=':kill -HUP' timeout_seconds='10'/>
jesse_b
  • Is the agent restarting itself via some sort of watchdog process, or is SMF restarting it? It might be useful to add a notification (e.g. `svccfg setnotify -g maintenance mailto:[email protected]`) to get more information. Does the SMF service define a 'restarter'? – Jeff Schaller Jun 03 '20 at 20:40
  • @JeffSchaller: I've added the restart method defined in the manifest to the question. The start method points to a wrapper script that runs a binary. The wrapper script definitely isn't restarting it but the binary may be. I don't think setting notifications on the service entering maintenance mode will help at this time because it never actually enters maintenance mode. – jesse_b Jun 05 '20 at 13:37
  • Does the resulting (started) binary fork off the real process and act as a watchdog for it, so that the restarts produce new PIDs? Or does this "restart loop" show up as a single PID that has gotten stuck in an internal loop? – Jeff Schaller Jun 05 '20 at 14:16

0 Answers