I have the following service configured:

[Unit]
Description=SCollector
After=NetworkManager.service

[Service]  
Type=simple
ExecStart=/bin/sh -c "/opt/scollector/scollector /opt/scollector/collectors || (echo '' | /usr/bin/mail -s 'scollector died' [email protected] && exit -1)"
Restart=on-failure

[Install]
WantedBy=multi-user.target    

For some reason, that mail command never sends any mail when the scollector process exits with a non-zero status. The same command works fine when run on the command line, /bin/sh call and all. I've captured the STDOUT and STDERR of mail, and it reports no errors. There is nothing in maillog.

What gives? Why won't it send mail?

alienth

2 Answers

/usr/bin/mail performs a double fork to daemonize sendmail for sending the email. That sendmail process is reparented to init, so normally it wouldn't be affected by anything that happens to the original parent. Under systemd, however, the reparented grandchild is still in the same cgroup as the original service. When systemd tears the service down, it kills every process in that cgroup, including the reparented sendmail process.

The mail command itself ran fine, but sendmail was getting killed by systemd before it had a chance to do its thing.

You can get around this by setting KillMode to process in the [Service] section of the unit (the default is control-group). That causes systemd to kill only the main process of the service and leave the other processes in the cgroup alone.
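A minimal sketch of that change, assuming the unit from the question. KillMode= is a [Service] setting; the other lines are copied from the question, and ExecStart= stays as it was.

```ini
[Service]
Type=simple
# ExecStart= is unchanged from the unit in the question.
Restart=on-failure
# On stop, kill only the service's main process instead of the whole cgroup,
# so the detached sendmail process is left to finish.
KillMode=process
```

After editing the unit file, run systemctl daemon-reload so that systemd picks up the change.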

Interestingly, I stumbled upon this through strace. A plain strace revealed nothing, but the mail suddenly started working under strace -f. strace -f kept the main process around until all of the children and orphaned grandchildren had finished.

alienth

The questioner has identified the problem; but xyr solution is a bodge, and xyr description of the mechanics is incorrect.

The mail command does not perform a double fork. It forks just once, and the sendmail shim process is its immediate child that is not reparented to anything. It simply chooses whether to waitpid() for that child or not, before it exits.

The same is true of the sendmail shim itself. It does not double fork. On some MTSes it doesn't even fork at all. On others it forks just once and chooses whether to wait or not depending on some configurable "delivery mode" option.

The correct way to get around the problem is twofold:

  1. Set mailx's documented and standardized sendwait option. That specifically addresses the problems of asynchronous enqueueing, by making mailx wait for the sendmail shim child process to finish. (Sadly, even though this option has been around since at least 1986 and is documented for mailx in the SVID, bsd-mailx does not have it. heirloom-mailx has it, though.)
  2. Set whatever MTS is in use to use a synchronous queueing/delivery mode if it isn't using one already.
    • If using netqmail, do nothing. netqmail's sendmail shim is always queued and synchronous, directly chain loading through qmail-inject to qmail-queue without forking at all.
    • If using Postfix, do nothing. Postfix's sendmail shim is always queued and synchronous, forking once and waiting for postdrop to finish before exiting itself.
    • If using exim, use its -odf command line option.
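As a rough sketch of the first step applied to the unit from the question: the [Service] section below passes sendwait on the mail command line with -S, as one of the comments on this answer suggests. It assumes that /usr/bin/mail is a mailx that supports -S and sendwait (heirloom-mailx does, bsd-mailx does not). The recipient admin@example.com is a hypothetical placeholder, and exit 1 replaces the original exit -1 only as a portable non-zero status.

```ini
[Service]
Type=simple
# -S sendwait makes mailx wait for its sendmail child before exiting,
# so the message is handed over before the service goes down.
# admin@example.com is a placeholder recipient, not the original address.
ExecStart=/bin/sh -c "/opt/scollector/scollector /opt/scollector/collectors || (echo '' | /usr/bin/mail -S sendwait -s 'scollector died' admin@example.com && exit 1)"
Restart=on-failure
```

Whether this is enough on its own still depends on the second step: the MTS's sendmail shim has to be synchronous as well.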

JdeBP
  • Note: to set the mailx internal variable, add 'set sendwait' to /etc/mail.rc (at least that's where it is on CentOS 7) – siliconrockstar Jun 03 '16 at 01:28
  • I was able to add `-S sendwait` when calling mailx since I didn't want to change the system configuration, but needed to make my script work when called from a systemd service unit. – JFlo May 26 '19 at 15:12