
Objective: check if a host is up-and-running on the network,
with a method that can be efficiently parallelized over many hosts.

Emphasis on a QUICK turnaround of the check.

Reason to emphasise speed: multiple checks (in the hundreds) need to be performed in rapid succession and return results reasonably quickly.


Current method

The current script uses a simple ping command. This choice is not mandatory; any suitable tool with equal or better reliability and speed may be used as a replacement.

Current script

Something along these lines:

ping -c 1 -W 100 -q "$NETWORK_HOST" &> /dev/null

Obvious deficiency of this approach: responses are needed with a sub-second turnaround, but in my experience this command can wait up to a second.

Alternative considered

Forget about the speed of sequential execution: run many ping commands in parallel using GNU parallel and collate the results at the end. I have experimented with this, but it seems even worse in practice.

Intuition about a better solution

It seems that ping may work just fine as a "polling health check"; all it needs is to be tweaked to wait only a very short time and to time out if there is no response.

Assumption: the network is considered reliable and FAST; the hosts are not necessarily either.
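
A minimal sketch of that intuition, not a definitive implementation: one short-timeout ping per host, with every probe started as a background job so that the total time is roughly that of the slowest single probe. The function names `probe_one` and `check_all` are hypothetical, and the `-W` value is an assumption that must be adapted per platform (iputils ping on Ubuntu takes seconds, OS X ping takes milliseconds). Whether this beats the GNU parallel experiment would need to be measured.

```shell
#!/usr/bin/env bash
# probe_one HOST: hypothetical helper, exits 0 if HOST answers one echo request.
# -W is the reply timeout: seconds on Linux (iputils), milliseconds on OS X.
probe_one() {
    ping -c 1 -W 1 -n -q "$1" >/dev/null 2>&1
}

# check_all [PROBE]: read host names (one per line) on stdin and run PROBE
# (default: probe_one) on all of them concurrently.
check_all() {
    local probe="${1:-probe_one}"
    local host
    while read -r host; do
        [ -n "$host" ] || continue      # skip empty lines
        # Each probe runs in its own background subshell.
        ( if "$probe" "$host"; then echo "up $host"; else echo "down $host"; fi ) &
    done
    wait    # block until every background probe has finished
}
```

Assuming a file with one host per line (here called `list_of_hosts` for illustration), `check_all < list_of_hosts` prints one `up` or `down` line per host, in completion order rather than input order.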


Question

How would you solve this? Which tool(s) would you use? Is this the right approach? Could you provide a code snippet?


  • Environment: OS X + Ubuntu hosts
  • Scripting shell: Bash
  • Can install additional software if needed.
  • Can compile/install new code for an application not in the repository and use that.
Gilles 'SO- stop being evil'
Robottinosino
  • Why do you set ttl to 1? – Serge Sep 27 '12 at 03:51
  • @Serge: sorry, fixed. BTW, on OSX -t is "Specify a timeout, in seconds, before ping exits regardless of how many packets have been received" – Robottinosino Sep 27 '12 at 04:00
  • Aha, I see. Does the OSX version have an equivalent to iputils ping's '-n' switch (do not resolve names)? – Serge Sep 27 '12 at 04:03
  • @Serge: OSX man ping: https://developer.apple.com/library/mac#documentation/Darwin/Reference/ManPages/man8/ping.8.html – Robottinosino Sep 27 '12 at 04:12
  • Well, it does support '-n'. I believe that adding -n will decrease delays, as it tells ping not to resolve the host name for display. Also, if you ping the host by IP rather than by its name, ping will work faster. I am pinging my router right now with no visible delays using the command 'ping -c 1 -W 1 -n $host', and I can't call my network fast: it is a WiFi net. – Serge Sep 27 '12 at 04:19
  • time bash -c 'for ((i=0; i < 1024;i++)); do ping -c 1 -W 1 -n -q 192.168.8.254 1>>/dev/null; done' showed these times: real 0m4.463s user 0m0.562s sys 0m2.139s – Serge Sep 27 '12 at 04:22
  • Very similar: [Faster way than ping for checking if computer online](http://unix.stackexchange.com/questions/7580/faster-way-than-ping-for-checking-if-computer-online) – ire_and_curses Sep 27 '12 at 06:32
  • Have a look at `nmap`. – Ole Tange Sep 27 '12 at 07:42
  • Are the hosts on the local network? If you don't need to traverse a router, do the checks asynchronously using `arp`: ping the broadcast address, then check the ARP table to see what has responded. – bahamat Sep 27 '12 at 21:03

2 Answers


Provided that ping is good enough for you, fping is an alternative that works in parallel right away.

This is a simplified version of what I use. It works with a list of hosts (one per line) passed in via a pipe:

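probe_hosts() {
    # Reads host names (one per line) on stdin; fping takes its targets
    # from stdin when none are given on the command line.
    local report host state
    fping 2>/dev/null | while read -r report
    do
        host=${report%% is *}     # text before " is "
        state=${report##* is }    # text after " is "
        if [ "$state" = "alive" ]
        then
            echo "$host"                    # live hosts go to stdout
        else
            echo "unreachable: $host" >&2   # everything else goes to stderr
        fi
    done
}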

# this is how you use it:
cat list_of_hosts \
    | probe_hosts \
    | do_something_with_live_hosts
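
As a hedged variation on the function above, fping's report lines can also be tallied instead of filtered. The sketch below assumes the default `HOST is alive` / `HOST is unreachable` line format; the name `summarize_fping` is hypothetical. The flags in the usage comment (`-r` for the retry count, `-t` for the initial timeout in milliseconds) are fping options, but the values shown are illustrative assumptions, not tuned settings.

```shell
# summarize_fping: read fping report lines ("HOST is STATE") on stdin
# and print how many hosts were alive and how many were not.
summarize_fping() {
    local host _is state
    local up=0 down=0
    while read -r host _is state; do
        [ -n "$host" ] || continue          # skip empty lines
        case $state in
            alive) up=$((up + 1)) ;;
            *)     down=$((down + 1)) ;;    # "unreachable" or anything else
        esac
    done
    echo "up=$up down=$down"
}

# Usage (requires fping; no retries, 200 ms timeout are example values):
#   fping -r 0 -t 200 < list_of_hosts 2>/dev/null | summarize_fping
```

Lowering the retry count and timeout trades reliability for speed, which matches the question's assumption of a fast, reliable network.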
Alois Mahdal

You can give nmap a range:

$ nmap -sn 138.0.0.0/24
$ nmap -sn 138.0.0.0-255

The -sn flag tells nmap to do host discovery only and return, skipping the port scan that nmap usually performs.
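
To feed the result into a script, one option is nmap's grepable output (`-oG -` writes it to stdout). The sketch below is an illustration under the assumption that responding hosts appear as lines of the form `Host: ADDRESS (NAME)	Status: Up`; the function name `up_hosts` is hypothetical.

```shell
# up_hosts: read nmap grepable output on stdin and print the address
# (second field) of every host reported with "Status: Up".
up_hosts() {
    awk '/Status: Up/ { print $2 }'
}

# Usage (requires nmap; -n skips DNS resolution):
#   nmap -sn -n -oG - 138.0.0.0/24 | up_hosts
```

The comment lines nmap writes at the start and end of the grepable output begin with `#` and do not contain `Status: Up`, so they are ignored.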

Edit: After reading the comments I see that bahamat mentions arp. In fact, arp with no arguments runs faster than nmap for me, and finds everything connected to my LAN:

$ time arp
real    0m0.411s
dotancohen
  • `arp` just shows the kernel ARP cache. So it'll only show a machine if your machine has communicated with it recently. – derobert Dec 04 '14 at 20:02
  • @derobert: Thanks. I have noticed some inconsistent results with `arp`, now I know why! – dotancohen Dec 04 '14 at 21:53