
Nagios itself has a way to check its config file for validity, to ensure that it will at least load the config without errors:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Is it possible to do the same thing for the NRPE daemon? The manual page for NRPE suggests it doesn't support that.

I intend to update the NRPE config with Ansible's lineinfile module, so I want to validate the result to be sure that, at the very least, I don't break monitoring completely.

Jeff Schaller
Nikita Kipriyanov

1 Answer


I agree with your findings; the online NRPE PDF does not mention any way to validate or check the configuration file. Additionally, the process_arguments() function in the nrpe.c code itself does not indicate any hidden options to do the same.

Given the following notice on the NRPE home page:

Notice: As of NRPE version 4.0.1, this project is deprecated. It will not receive any more bugfixes or features, except to resolve security issues.

... I wouldn't expect such an option to appear, either.

I see that they packaged a travis-test-1 script whose goal appears to be to see if NRPE is functional. The script creates a config file with a sample command, installs the supporting script for the command, adds the nagios user, starts NRPE in daemon mode, then executes check_nrpe against localhost to run that sample command. This may make more changes in your environment than you prefer.

Taking inspiration from that idea, you could work around the issue in a couple of ways. At a high level:

  1. perform a check after changing the config file to see if NRPE is still running; you could assume that it was your change that caused the daemon to stop running and revert the config change if so. If the daemon stayed running, then you should be all clear!
  2. stop NRPE; make the changes to the config file; then temporarily start NRPE and see if it stays running. If so, restart NRPE as normal; if not, revert the change and restart NRPE.

For option #1, you could use a simple external port check or a known-good check_nrpe location; you could even call check_nrpe from the remote host, with check_nrpe -H 127.0.0.1 -c known-good-command to see if the NRPE daemon is still running. You could also query the process table for the nrpe process (if it's started as a long-running process in your environment).
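
As a rough sketch of option #1, the two local checks could be wrapped in small shell functions. The check_nrpe path below is an assumption (a typical source-install location), known-good-command is a placeholder for a command already defined in your nrpe.cfg that normally returns OK, and the function names are made up for illustration:

```shell
#!/bin/sh
# Sketch only: the check_nrpe path is an assumed default; override it via
# the CHECK_NRPE environment variable to match your installation.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

# Succeeds if a process named exactly "nrpe" is in the process table.
nrpe_process_running() {
    pgrep -x nrpe >/dev/null 2>&1
}

# Succeeds if check_nrpe exits 0 for the given command on localhost.
# $1 is a placeholder for a command that normally returns OK.
nrpe_answers() {
    "$CHECK_NRPE" -H 127.0.0.1 -c "$1" >/dev/null 2>&1
}

# Both checks together: the daemon exists and it answers a request.
nrpe_healthy() {
    nrpe_process_running && nrpe_answers "$1"
}
```

After changing the config and restarting NRPE, something like `nrpe_healthy known-good-command || echo "revert the config change"` would flag a broken daemon. Note that this treats any non-zero check_nrpe exit as a failure, so the chosen command should be one that is expected to return OK.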

For option #2, you could use the GNU timeout utility to temporarily start NRPE. For example:

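# Run NRPE in the foreground for at most 3 seconds
timeout 3s /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -f
# timeout exits with 124 only if it had to stop the still-running command
if [ $? -eq 124 ]
then
  echo "all good"
  exit 0
else
  echo "not good, NRPE did not start successfully"
  exit 1
fi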

The snippet above attempts to start NRPE in foreground mode (-f), assuming the default locations of everything. If NRPE is still running when the time limit expires (3 seconds here), timeout terminates it and itself exits with status 124; in that case, we can assume our change was successful. If timeout exits with anything other than 124, NRPE stopped on its own, most likely because it failed to start, and we can assume that our config change broke it.
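
Extending that idea, a minimal sketch of the whole option #2 workflow might test a candidate config file before it replaces the live one. The paths are the same assumed defaults as above, the function names and the .bak suffix are made up for illustration, and the regular NRPE service is assumed to be stopped while the test runs (otherwise the test instance would probably fail to bind its port and give a false negative):

```shell
#!/bin/sh
# Sketch only: assumed default paths; override via environment variables.
NRPE_BIN="${NRPE_BIN:-/usr/local/nagios/bin/nrpe}"
NRPE_CFG="${NRPE_CFG:-/usr/local/nagios/etc/nrpe.cfg}"

# Start NRPE in the foreground with config file $1 for a limited time
# ($2, default 3s). Succeeds only if timeout had to stop it (status 124),
# i.e. NRPE was still running when the limit expired.
nrpe_config_starts() {
    status=0
    timeout "${2:-3s}" "$NRPE_BIN" -c "$1" -f >/dev/null 2>&1 || status=$?
    [ "$status" -eq 124 ]
}

# Copy candidate file $1 over the live config only if NRPE starts with it;
# the previous config is kept as "$NRPE_CFG.bak" (hypothetical name).
apply_if_valid() {
    if nrpe_config_starts "$1" "${2:-3s}"; then
        cp -p "$NRPE_CFG" "$NRPE_CFG.bak" && cp "$1" "$NRPE_CFG"
    else
        echo "NRPE did not stay up with $1; keeping current config" >&2
        return 1
    fi
}
```

You would edit a copy of nrpe.cfg, stop NRPE, run `apply_if_valid /path/to/copy`, and then start NRPE as normal either way; on failure the live config is left untouched, so there is nothing to revert.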

Jeff Schaller
  • Thank you for this highly detailed and deep answer! I'll investigate the suggested options, but that will take some time, especially to marry these ideas with Ansible's validity test. – Nikita Kipriyanov Sep 27 '22 at 05:02