I agree with your findings; the online NRPE PDF does not mention any way to validate or check the configuration file. Additionally, the process_arguments() function in the nrpe.c code itself does not indicate any hidden options to do the same.
Given the following notice on the NRPE home page:
Notice: As of NRPE version 4.0.1, this project is deprecated. It will not receive any more bugfixes or features, except to resolve security issues.
... I wouldn't expect such an option to appear, either.
I see that they packaged a travis-test-1 script whose goal appears to be to see if NRPE is functional. The script creates a config file with a sample command, installs the supporting script for the command, adds the nagios user, starts NRPE in daemon mode, then executes check_nrpe against localhost to run that sample command. This may make more changes in your environment than you prefer.
Taking inspiration from that idea, you could work around the issue in a couple of ways. At a high level:
- perform a check after changing the config file and restarting (or reloading) NRPE to see if it is still running. If it has stopped, you can assume that your change caused it and revert the config change; if the daemon stays running, then you should be all clear!
- stop NRPE; make the changes to the config file; then temporarily start NRPE and see if it stays running. If so, restart NRPE as normal; if not, revert the change and restart NRPE.
For option #1, you could use a simple external port check, or a check_nrpe call that runs a known-good command; you could even run check_nrpe on the remote (NRPE) host itself, with check_nrpe -H 127.0.0.1 -c known-good-command, to see if the NRPE daemon is still answering. You could also query the process table for the nrpe process (if it runs as a long-lived daemon in your environment).
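The restart-and-verify loop for option #1 can be wrapped in a small shell function. This is a minimal sketch, not something NRPE ships: the function names, the default `check_nrpe` path, the `known-good-command` name, and the `RESTART_CMD` default (`systemctl restart nrpe`; the unit is named differently on some distributions) are all assumptions to adapt. It keeps a backup of the live config, restarts NRPE, and restores the backup if the known-good command no longer answers.

```shell
#!/bin/sh
# Sketch of option #1: swap in a new nrpe.cfg, restart NRPE, make sure it
# still answers, and roll back if it does not.
#
# Hypothetical settings; override them to match your host:
#   CHECK_NRPE      path to the check_nrpe plugin
#   KNOWN_GOOD_CMD  an NRPE command you know works (placeholder name)
#   RESTART_CMD     how NRPE is restarted here (unit name varies by distro)
#   SETTLE_SECONDS  how long to wait after a restart before checking
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}
KNOWN_GOOD_CMD=${KNOWN_GOOD_CMD:-known-good-command}
RESTART_CMD=${RESTART_CMD:-"systemctl restart nrpe"}
SETTLE_SECONDS=${SETTLE_SECONDS:-2}

# Succeed only if NRPE answers a known-good command over the network.
# (Use "pgrep -x nrpe" instead if you only care that the process exists.)
nrpe_healthy() {
    "$CHECK_NRPE" -H 127.0.0.1 -c "$KNOWN_GOOD_CMD" >/dev/null 2>&1
}

# Usage: apply_nrpe_config NEW_CFG LIVE_CFG
# Replace LIVE_CFG with NEW_CFG, restart NRPE, and revert if it is unhealthy.
apply_nrpe_config() {
    new_cfg=$1
    live_cfg=$2
    cp -p "$live_cfg" "$live_cfg.bak" || return 1   # keep a known-good copy
    cp "$new_cfg" "$live_cfg" || return 1
    sh -c "$RESTART_CMD"
    sleep "$SETTLE_SECONDS"                          # give NRPE time to start or die
    if nrpe_healthy; then
        echo "NRPE is healthy with the new config"
        return 0
    fi
    echo "NRPE is not responding; restoring $live_cfg.bak" >&2
    cp -p "$live_cfg.bak" "$live_cfg"
    sh -c "$RESTART_CMD"
    return 1
}
```

You would call it as `apply_nrpe_config /tmp/nrpe.cfg.new /usr/local/nagios/etc/nrpe.cfg`. The check goes through `check_nrpe` rather than only the process table because a daemon that is running but cannot run commands would still pass a `pgrep` check.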
For option #2, you could use the GNU timeout utility to temporarily start NRPE. For example:
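```shell
# Start NRPE in the foreground (-f) and let timeout stop it after 3 seconds
timeout 3s /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -f
# timeout exits with 124 only if it had to stop NRPE, i.e. NRPE stayed up
if [ $? -eq 124 ]
then
    echo all good
    exit 0
else
    echo not good, NRPE did not start successfully
    exit 1
fi
```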
The snippet above attempts to start NRPE (assuming the default locations of everything) in foreground mode. If NRPE stays running for the given period of time (3 seconds here), timeout stops it and itself exits with a return code of 124; in that case, we can assume our change was successful. If timeout exits with anything other than 124, NRPE exited (or could not be run) before the time limit, so there was probably a problem starting NRPE, and we can assume that our config change broke it.
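Option #2 can likewise be packaged so that a failed trial puts the previous file back automatically. Again, this is a hedged sketch: it assumes GNU `timeout`, the default paths used above, an NRPE build that accepts `-f`, and that you have already stopped the real daemon (otherwise the trial instance would fail to bind the same port and look like a bad config) and saved a copy of nrpe.cfg before editing. `survives` and `trial_nrpe_config` are hypothetical names.

```shell
#!/bin/sh
# Sketch of option #2: with the real NRPE already stopped and the config
# edited, run NRPE in the foreground for a few seconds; if it exits early,
# put the previous config back.
#
# Hypothetical settings; override them to match your host.
NRPE_BIN=${NRPE_BIN:-/usr/local/nagios/bin/nrpe}
NRPE_CFG=${NRPE_CFG:-/usr/local/nagios/etc/nrpe.cfg}
TRIAL_SECONDS=${TRIAL_SECONDS:-3}

# Return 0 if the command is still running when the time limit expires.
# GNU timeout exits with status 124 only when it had to stop the command.
survives() {
    timeout "${TRIAL_SECONDS}s" "$@" >/dev/null 2>&1
    [ $? -eq 124 ]
}

# Usage: trial_nrpe_config BACKUP_CFG
# BACKUP_CFG is the copy of nrpe.cfg saved before editing.
trial_nrpe_config() {
    backup_cfg=$1
    if survives "$NRPE_BIN" -c "$NRPE_CFG" -f; then
        echo "NRPE stayed up for ${TRIAL_SECONDS}s; keeping the new config"
        return 0
    fi
    echo "NRPE exited early; restoring $backup_cfg" >&2
    cp -p "$backup_cfg" "$NRPE_CFG"
    return 1
}
```

A full run might be: stop NRPE (for example with `systemctl stop nrpe`, an assumed unit name), copy nrpe.cfg to nrpe.cfg.bak, make your edits, call `trial_nrpe_config /usr/local/nagios/etc/nrpe.cfg.bak`, and then start NRPE again whatever the result, since a failed trial has already restored the old file.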