
I have written a script that connects to remote hosts over ssh, executes commands, saves the output to files, and examines the outputs. But it always exits silently at the line (( success++ )) while iterating over the first item in the array workers. If I replace (( success++ )) with echo "process $worker", it works fine and prints all hosts. I cannot figure out what's wrong.

#!/bin/bash

set -x
set -e
workers=('host-1' 'host-2' 'host-3')

output_dir=$(mktemp -d)

for worker in ${workers[@]}; do
  ssh $worker '
    echo abc
    echo OK
  ' > "$output_dir/$worker" &
done

echo "waiting..."
sleep 3
wait

success=0
regexp='OK$'
for worker in ${workers[@]}; do
  output=`cat "$output_dir/$worker"`
  if [[ "$output" =~ $regexp ]]; then
    (( success++ ))
  fi
done

echo "Total ${#workers[@]}; success: $success; failure: $((${#workers[@]} - success))"
gzc
  • Rather than reading the whole file into a variable, why not use `if grep -q "$regexp" "$output_dir/$worker"; then`? Or even `grep -c "$regexp" "$output_dir"/*` to get a count of the number of OKs. Also consider `success=$(( success + 1 ))`. – Kusalananda Jun 21 '17 at 07:36
  • @Kusalananda That's good advice. – gzc Jun 21 '17 at 09:41
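A minimal sketch of the comment's suggestion, assuming fake per-host output files stand in for the ssh step (no real hosts are contacted; the host names are the question's placeholders). It tests each file with grep -q and counts with a plain assignment, so nothing in the loop can trip set -e. Note that grep applies OK$ per line, so it matches OK at the end of any line, whereas the original [[ =~ ]] test only matched at the very end of the whole output.

```bash
#!/bin/bash
set -e

# Stand-in for the ssh step (assumption): fake output files for three hosts.
workers=('host-1' 'host-2' 'host-3')
output_dir=$(mktemp -d)
printf 'abc\nOK\n'   > "$output_dir/host-1"
printf 'abc\nOK\n'   > "$output_dir/host-2"
printf 'abc\nFAIL\n' > "$output_dir/host-3"

success=0
regexp='OK$'
for worker in "${workers[@]}"; do
  # grep -q exits 0 on a match; a non-match inside an if condition does not trigger set -e.
  if grep -q "$regexp" "$output_dir/$worker"; then
    success=$(( success + 1 ))   # plain assignment: exit status 0 even when success was 0
  fi
done

# Prints: Total 3; success: 2; failure: 1
echo "Total ${#workers[@]}; success: $success; failure: $(( ${#workers[@]} - success ))"
rm -r "$output_dir"
```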

2 Answers


A simple example should explain why:

$ ((success++))
$ echo $?
1

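The reason is that (( ... )) returns exit status 1 whenever the arithmetic expression evaluates to zero. success++ is a post-increment: it evaluates to the old value of success (0, or unset, which also counts as 0) and only then increments the variable. Combined with set -e in your script, that exit status of 1 terminates it. I don't know what to say - Bash has gotchas enough for the whole world.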
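A small sketch (plain bash, no set -e) that prints the exit status of (( ... )) for a few values and contrasts post-increment with pre-increment when starting from 0:

```bash
#!/bin/bash
# (( expr )) returns 0 (success) when expr is non-zero and 1 when it is zero.
(( 0 ));  echo "(( 0 ))  -> $?"   # 1
(( 1 ));  echo "(( 1 ))  -> $?"   # 0
(( -5 )); echo "(( -5 )) -> $?"   # 0: any non-zero value counts as true

# Post-increment evaluates to the OLD value, so from 0 the status is 1...
n=0
(( n++ )); post_status=$?
echo "n++ from 0: status $post_status, n is now $n"   # status 1, n is now 1

# ...while pre-increment evaluates to the NEW value, so from 0 the status is 0.
m=0
(( ++m )); pre_status=$?
echo "++m from 0: status $pre_status, m is now $m"    # status 0, m is now 1
```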

l0b0
  • Thanks! BTW, this behavior is very confusing, even ... evil. It consumes people's lives. – gzc Jun 21 '17 at 09:53
  • That rule was created with [the expr utility](http://scosysv.polarhome.com/service/man/?qf=expr&af=0&sf=0&of=UNIXv7&tf=2)(Linked Unix V7 manual): `… exit codes: 0 if the expression is neither null nor 0, 1 if the expression is null or 0…` So, perhaps, Unix is the one to blame. –  Jun 21 '17 at 10:24
  • `if (( .. )) ; ...` wouldn't work if `(( .. ))` didn't return sensible return values. Of course one might say that it should only fail if there is an explicit comparison (like `(( i++ < n ))`), but the implicit comparison against zero makes stuff like `while (( i-- )) ` work in the same way as in C and other programming languages. – ilkkachu Jun 21 '17 at 10:55
  • @ilkkachu I understand the rationale, but this behaviour (in any language) is *not sane.* First, there are already ways to explicitly compare numbers, just use those. Second, the C language syntax is well known for lots of idiosyncrasies, and isn't exactly considered the gold standard. Third, the choice of zero is entirely arbitrary - why not every non-positive number instead? – l0b0 Jun 21 '17 at 11:47
  • @l0b0, telling boolean results apart from plain numbers would require typing, which the predecessors of C didn't really have, if I've understood my history lessons. (Neither has the shell's arithmetic.) From there it's just hysterical raisins. Though apparently, `(( ... ))` wasn't subject to `set -e` before Bash 4.1, and that's really what causes the problem here, not the return values of `(( ... ))` per se. – ilkkachu Jun 21 '17 at 12:05
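A short sketch of the convention the comments describe: the C-style countdown while (( i-- )) stops when the tested (old) value reaches 0, and expr uses the same "zero result means exit status 1" rule. The numbers are arbitrary examples.

```bash
#!/bin/bash
# (( i-- )) tests the old value of i, so the body runs exactly 3 times.
i=3
runs=0
while (( i-- )); do
  runs=$(( runs + 1 ))
done
echo "loop ran $runs times; i ended at $i"   # loop ran 3 times; i ended at -1

# expr follows the same convention: a result of 0 gives exit status 1.
expr 2 - 2 > /dev/null; expr_zero=$?
expr 2 - 1 > /dev/null; expr_one=$?
echo "expr 2 - 2 -> $expr_zero, expr 2 - 1 -> $expr_one"   # 1 and 0
```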

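It is a consequence of having set -e. Under set -e, any command that returns a non-zero exit status (here, 1) makes the shell exit, unless that command is tested by if, while or until, is not the last command in an && or || list, or is negated with !.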

This script works fine:

#!/bin/bash
(( success++ ))
echo "Still going 1 $success"

This one doesn't:

#!/bin/bash
set -e
(( success++ ))
echo "Still going 1 $success"
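set -e does not fire in every context. The sketch below, assuming bash 4.1 or later (where (( ... )) is subject to set -e), runs each variant in its own bash -c so an abort only ends that child shell, and records whether it reached its final echo:

```bash
#!/bin/bash
# Each snippet runs in a separate bash process, so set -e can abort it
# without stopping this script.

# Plain (( )) evaluating to 0: set -e aborts before the echo.
plain=$(bash -c 'set -e; n=0; (( n++ )); echo reached')

# The same command used as an if condition is exempt from set -e.
in_if=$(bash -c 'set -e; n=0; if (( n++ )); then :; fi; echo reached')

# So is a command on the left of ||, or one negated with !.
in_or=$(bash -c 'set -e; n=0; (( n++ )) || true; echo reached')
negated=$(bash -c 'set -e; n=0; ! (( n++ )); echo reached')

# Prints: plain=<aborted> in_if=reached in_or=reached negated=reached
printf 'plain=%s in_if=%s in_or=%s negated=%s\n' \
  "${plain:-<aborted>}" "$in_if" "$in_or" "$negated"
```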

Solutions

The simplest is to remove the set -e line.

If that is not an option, use this:

(( ++success ))
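Pre-increment evaluates to the new value, so it only returns status 1 when the new value is 0. A quick check, again using separate bash -c shells with set -e (the starting value -1 is just an illustrative edge case):

```bash
#!/bin/bash
# From 0, ++success evaluates to 1 (non-zero), so set -e does not abort.
from_zero=$(bash -c 'set -e; success=0; (( ++success )); echo "ok $success"')

# From -1, ++success evaluates to 0, so set -e aborts before the echo.
from_minus_one=$(bash -c 'set -e; success=-1; (( ++success )); echo "ok $success"')

echo "from 0: ${from_zero:-<aborted>}; from -1: ${from_minus_one:-<aborted>}"
```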

Other alternatives:

#!/bin/bash

set -e
success=0
success=$(( success+1 ))
echo "still going 1 $success"

success=0
(( success=success+1 ))
echo "still going 2 $success"

success=0
(( success+=1 ))
echo "still going 3 $success"

success=0
(( ++success ))
echo "still going 4 $success"

success=0
(( success++ ))
echo "still going 5 $success"

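Only option 5 returns exit status 1: success++ evaluates to the old value, 0, so with set -e the script stops there and "still going 5" is never printed.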
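To see the exit status of each of the five forms without the script stopping, a sketch like the following drops set -e and resets success to 0 before each one. eval is used here only so the forms can be listed as strings in one array.

```bash
#!/bin/bash
# The five increment forms from the answer, as strings.
forms=(
  'success=$(( success+1 ))'
  '(( success=success+1 ))'
  '(( success+=1 ))'
  '(( ++success ))'
  '(( success++ ))'
)

statuses=()
for form in "${forms[@]}"; do
  success=0
  eval "$form"          # run the form; $? is the form's exit status
  status=$?
  statuses+=("$status")
  printf '%-26s exit status %s, success=%s\n' "$form" "$status" "$success"
done
```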

Other, slightly more complex solutions work for any value of the variable a.
The first one uses the POSIX colon (:) builtin, so it is valid POSIX shell syntax.

: $(( a+=1 ))      ; echo "6 $a $?"   ## Valid POSIX
(( a++ )) || true  ; echo "7 $a $?"
(( a++ )) || :     ; echo "8 $a $?"
(( a++ , 1 ))      ; echo "9 $a $?"
(( a++ | 1 ))      ; echo "10 $a $?"
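A sketch that exercises these forms under set -e from several starting values, including 0 and -1 (the values that make a++ and ++a return status 1). If any form failed, the script would abort before setting completed. The starting values are arbitrary examples.

```bash
#!/bin/bash
set -e
for start in -1 0 5; do
  a=$start; : $(( a+=1 ))       # the command that runs is ':', which always succeeds
  a=$start; (( a++ )) || true   # a zero result is absorbed by '|| true'
  a=$start; (( a++ )) || :      # same, using ':' instead of 'true'
  a=$start; (( a++ , 1 ))       # comma operator: the whole expression evaluates to 1
  a=$start; (( a++ | 1 ))       # bitwise OR with 1 is never 0
  echo "start=$start: all forms survived set -e, a=$a"
done
completed=yes
```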
  • Those work when you're counting up from zero, but in the general case you'd need to squash the error exit with something like `(( something... )) || true` – ilkkachu Jun 21 '17 at 10:58
  • Added a couple more solutions, @ilkkachu, including one that is valid POSIX. –  Jun 21 '17 at 20:01
  • @Arrow, good point on the `: $(( .. ))` alternative. It doesn't actually even need the `||` to catch the error, since there the command that runs is `:` and it always succeeds. – ilkkachu Jun 21 '17 at 23:15