I am trying to write a shell script that keeps testing my server and emails me when it goes down.

The problem is that when I log out of the SSH session, even though I started the script in the background with ./stest01.sh &, it immediately falls into the else branch and keeps mailing me continuously until I log in again and kill it.

#!/bin/bash
while true; do
    date > sdown.txt ;
    cp /dev/null pingop.txt ;
    ping -i 1 -c 1 -W 1 myserver.net > pingop.txt &
    sleep 1 ;
    if
        grep "64 bytes" pingop.txt ;
    then
        :
    else
        mutt -s "Server Down!" [email protected] < sdown.txt ;
        sleep 10 ;
    fi
done
Baraujo85
  • I am not a bash expert, but what does the colon `:` do? It would make sense to me if it were a semicolon `;`... – Ned64 Aug 11 '19 at 12:21
  • @Ned64 The `:` does nothing. This is what it is designed to do. Here, instead of inverting the test, they use it to do a no-op before `else`. – Kusalananda Aug 11 '19 at 12:22
  • @Kusalananda OK, thanks. Thought it might be a typo that could explain the problem. – Ned64 Aug 11 '19 at 12:24
  • I'm also confused why one would try to leave a shell script running after logout. Wouldn't cron or systemd timers be a better choice for this? – Cliff Armstrong Aug 11 '19 at 13:26
  • Possible duplicate of [How can I run a command which will survive terminal close?](https://unix.stackexchange.com/questions/4004/how-can-i-run-a-command-which-will-survive-terminal-close) – Anthony Geoghegan Aug 12 '19 at 16:27
  • It has been working smoothly since I added `-q` to the `grep` command! – Baraujo85 Aug 12 '19 at 16:37
  • The main differences from that question are that this one involves `if` usage, `grep` options such as `-q` (which solved my problem), the explanation of colon versus semicolon, and some differences in how `grep` behaves on BSD and GNU systems! – Baraujo85 Aug 12 '19 at 16:45

1 Answer

When GNU grep tries to write the matching line, the write fails because the SSH connection, and with it the place the output was going, is gone. grep therefore exits with a non-zero status even though the pattern matched.

This means that the if statement always takes the else branch.

To illustrate this (this is not exactly what's happening in your case, but it shows what happens if GNU grep is unable to write its output):

$ echo 'hello' | grep hello >&- 2>&-
$ echo $?
2

Here we grep for the string that echo produces, but we close both output streams for grep so that it can't write anywhere. As you can see, the exit status of GNU grep is 2 rather than 0.

This is particular to GNU grep; grep on BSD systems does not behave the same way:

$ echo 'hello' | grep hello >&- 2>&-    # using BSD grep here
$ echo $?
0
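
As a complementary illustration, here is a small sketch that makes the write fail instead of closing the stream. It assumes a Linux system, where /dev/full rejects every write with "No space left on device"; the variable names are only for this example. With GNU grep the second status is expected to be non-zero, in line with the behaviour described above, while other grep implementations may differ.

```shell
#!/bin/sh
# Baseline: stdout is writable, so a match yields exit status 0.
echo 'hello' | grep hello >/dev/null
ok_status=$?

# /dev/full is Linux-specific (an assumption of this sketch): every write
# to it fails, standing in for an output stream that can no longer be used.
if [ -c /dev/full ]; then
    echo 'hello' | grep hello >/dev/full 2>/dev/null
    full_status=$?
else
    full_status=skipped
fi

echo "writable stdout: $ok_status, unwritable stdout: $full_status"
```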

To remedy this, make sure that the script does not generate output. You can do this with exec >/dev/null 2>&1. Also, we should be using grep with its -q option, since we're not at all interested in seeing its output. This would generally also speed up grep, as it does not need to read the whole file once a match is found, but in this case it makes very little difference since the file is so small.
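
A minimal sketch of how -q reports its result through the exit status alone. The scratch file stands in for pingop.txt, and the reply line is made up for the example (192.0.2.1 is a documentation address):

```shell
#!/bin/sh
# Scratch file standing in for pingop.txt (hypothetical content).
tmpfile=$(mktemp)
printf '64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=0.5 ms\n' >"$tmpfile"

# Pattern present: grep -q prints nothing and exits with 0.
grep -q "64 bytes" "$tmpfile"
match_status=$?

# Pattern absent: grep -q prints nothing and exits with 1.
grep -q "no such text" "$tmpfile"
miss_status=$?

rm -f "$tmpfile"
echo "match=$match_status miss=$miss_status"   # prints: match=0 miss=1
```

Because nothing is written, the result no longer depends on whether standard output is still usable.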

In short:

#!/bin/sh

# redirect all output not redirected elsewhere to /dev/null by default:
exec >/dev/null 2>&1

while true; do
    date >sdown.txt

    ping -c 1 -W 1 myserver.net >pingop.txt

    if ! grep -q "64 bytes" pingop.txt; then
        mutt -s "Server Down!" [email protected] <sdown.txt
        break
    fi

    sleep 10
done

You may also use a test on ping directly, removing the need for one of the intermediate files (and also getting rid of the other intermediate file that really only ever contains a datestamp):

#!/bin/sh

exec >/dev/null 2>&1

while true; do
    if ! ping -q -c 1 -W 1 myserver.net; then
        date | mutt -s "Server Down!" [email protected]
        break
    fi

    sleep 10
done

In both variations of the script above, I chose to exit the loop upon failure to reach the host, just to minimise the number of emails sent. If you expect the server to eventually come up again, you could instead replace the break with e.g. sleep 10m (the m suffix is a GNU sleep extension; sleep 600 is the portable spelling).

I've also slightly tweaked the options used with ping, as -i 1 does not make much sense with -c 1: -i sets the interval between packets, and only one packet is sent.
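
If the script should keep running and mail only once per outage, one possible approach is to remember the previous state and notify only on the transition from up to down. This is a sketch under assumptions, not part of the scripts above: probe, should_notify and monitor are hypothetical helper names, and the recipient is passed in as an argument.

```shell
#!/bin/sh
# Print "up" if the given command succeeds, "down" otherwise.
probe() {
    if "$@" >/dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}

# Succeed only when the state changed from up to down,
# i.e. at the moment a new outage begins.
should_notify() {
    [ "$1" = up ] && [ "$2" = down ]
}

# monitor HOST RECIPIENT: loop forever, mailing once per outage.
monitor() {
    host=$1
    recipient=$2
    state=up
    while true; do
        new=$(probe ping -q -c 1 -W 1 "$host")
        if should_notify "$state" "$new"; then
            date | mutt -s "Server Down!" "$recipient"
        fi
        state=$new
        sleep 10
    done
}
```

Calling monitor myserver.net followed by your address at the end of the script starts the loop. The block itself only defines functions, so nothing is pinged or mailed until then.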

Shorter (unless you want it to continue sending emails when the host is unreachable):

#!/bin/sh

exec >/dev/null 2>&1

while ping -q -c 1 -W 1 myserver.net; do
    sleep 10
done

date | mutt -s "Server Down!" [email protected]

As a cron job running every minute (would continue sending emails every minute if the server continues to be down):

* * * * * ping -q -c 1 -W 1 myserver.net >/dev/null 2>&1 || ( date | mail -s "Server down" [email protected] )
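
To avoid a mail every minute from cron, one option is to keep a flag file between runs and report only the first failure of each outage. The following is a sketch under assumptions: first_failure is a hypothetical helper and the flag path is up to you.

```shell
#!/bin/sh
# first_failure FLAG COMMAND [ARGS...]
# Returns 0 only when COMMAND fails and no earlier failure was recorded
# in FLAG; the flag is cleared again as soon as COMMAND succeeds.
first_failure() {
    flag=$1
    shift
    if "$@" >/dev/null 2>&1; then
        rm -f "$flag"        # host reachable again: forget the old outage
        return 1
    fi
    if [ -e "$flag" ]; then
        return 1             # this outage was already reported
    fi
    : >"$flag"               # remember that this outage has been reported
    return 0
}
```

A script run from cron could then end with a line such as first_failure /tmp/myserver.down ping -q -c 1 -W 1 myserver.net && date | mail -s "Server down" followed by the recipient address, where /tmp/myserver.down is an example path.
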
Kusalananda
  • Using `>&-` will close the fd (as in, file descriptor 1 is closed), while closing the SSH connection will have a different effect (a file descriptor will be still around, but not connected to anything on the other side.) I think the point still stands, which is that GNU grep exits non-zero if it tries to write output and that fails. Yeah, best solution is just checking exit status of ping directly. – filbranden Aug 11 '19 at 14:22
  • It might be safer to just redirect everything to/from /dev/null for the entire script by adding `exec </dev/null >/dev/null 2>&1` near the beginning. That way if e.g. `ping` decides to write something to stderr it won't cause a problem. – Gordon Davisson Aug 11 '19 at 19:35
  • @GordonDavisson I don't really see a reason to pull stdin from `/dev/null` here, but I sorted out the output. Thanks for the suggestion. – Kusalananda Aug 12 '19 at 15:25