Given the pipeline

a | b | c

how might I alter b so that it aborts the pipeline if b generates an error or matches a particular pattern in the input stream?

Derek Mahar
  • Simply have `b` terminate. `a` will be killed by a `SIGPIPE` signal when trying to write to the left pipe, and `c` will get an EOF when trying to read from the right pipe. In `bash` (but _not_ in the shell in general), you can get the exit status of `b` from the `PIPESTATUS` array. – Dec 09 '19 at 03:00
  • How do I make `b` terminate? – Derek Mahar Dec 09 '19 at 03:13
  • Call `exit` from within it. Or let it commit ritual suicide: `b(){ sed /die/q && kill "$BASHPID"; }; printf '%s\n' pass die oops | b | cat; echo "${PIPESTATUS[@]}"` ;-) – Dec 09 '19 at 03:30
  • @mosvy, I confirmed that `b()` aborts the pipeline using the ritual suicide operation `kill "$BASHPID"` or with `exit 1`. – Derek Mahar Dec 09 '19 at 04:25
  • @mosvy, do you want to promote your comment to an answer? – Derek Mahar Dec 09 '19 at 04:26
  • @mosvy `b` terminating would not terminate the pipeline in the (very degenerate and unlikely) case where there is no actual I/O between the processes in the pipeline. – Kusalananda Dec 09 '19 at 07:03
  • @Kusalananda in which case you can turn job control on and kill the process group that all the processes in the pipeline are part of – Dec 09 '19 at 07:05
  • @mosvy, how can you determine the process group of the pipeline processes? – Derek Mahar Dec 09 '19 at 07:22
  • There's more than one way to do it. `cat | cat | pkill -g0 | cat | cat` will kill all 4 cats before they get killed by `SIGPIPE` when trying to write to a pipe with no reader, or exit with status 0 because of EOF. `ps -ho pgrp "$BASHPID"` will tell you the process group `$BASHPID` is in. You can also get the same info directly from `/proc/<pid>/stat{,us}`. – Dec 09 '19 at 15:41
  • In a script (with no job control by default), you can also use a subshell to group processes for the purpose of killing them: `bash` will always use separate processes for `(...)` subshells, and `pkill` and `pgrep` are able to find processes by their parent. – Dec 09 '19 at 15:42
  • @mosvy, can you think of a way that an intermediate node in the pipeline might buffer the output and send it to final sink node `c` only if the input stream from `a` does not contain "die"? I tried using an intermediate "sponge" that https://unix.stackexchange.com/questions/337055/a-program-that-could-buffer-stdin-or-file describes following `sed /die/{q 1} || pkill -g0`, but the pipeline is subject to race conditions where sometimes it terminates and discards the input stream while other times `c` receives some input. – Derek Mahar Dec 09 '19 at 22:47
  • @mosvy, `printf '%s\n' pass die oops | { file=$(mktemp); trap "rm $file" EXIT; sed '/die/{q 1}' > $file && cat $file || exit 2; } | cat; echo "${PIPESTATUS[@]}";` is an extension of your solution where node `b` in the pipeline discards the entire input stream if it encounters string "die". – Derek Mahar Dec 10 '19 at 17:11
  • Even shorter version that uses a buffer variable instead of a temporary file: `printf '%s\n' pass die oops | { input=$(sed '/die/{q 1}') && echo "$input" || exit 2; } | cat; echo "${PIPESTATUS[@]}";` – Derek Mahar Dec 10 '19 at 17:24

1 Answer

@mosvy's very helpful suggestion is mostly correct, but it has the problem that `b()` always aborts the pipeline, whether or not `sed /die/q` encounters "die":

Input stream contains "die"

$ b(){ sed /die/q && kill "$BASHPID"; }; printf '%s\n' pass die oops | b | cat; echo "${PIPESTATUS[@]}"
pass
die
0 143 0

Input stream does not contain "die"

$ b(){ sed /die/q && kill "$BASHPID"; }; printf '%s\n' pass oops | b | cat; echo "${PIPESTATUS[@]}"
pass
oops
0 143 0

In @mosvy's version, `b()` always aborts the pipeline because `sed /die/q` returns exit code 0 (success) both when it encounters "die" and when it reaches the end of the input stream, so `&&` always goes on to invoke `kill "$BASHPID"`.
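The exit status of `sed /die/q` can be checked in isolation. The following is a minimal sketch, not part of @mosvy's comment; the variable names `status_with_die` and `status_without_die` are made up for illustration.

```shell
# Capture sed's exit status when the input contains "die" ...
status_with_die=$(printf '%s\n' pass die oops | sed /die/q >/dev/null; echo "$?")

# ... and when it does not.
status_without_die=$(printf '%s\n' pass oops | sed /die/q >/dev/null; echo "$?")

# Plain "q" gives the same status in both cases, so "&&" cannot tell them apart.
echo "with die: $status_with_die, without die: $status_without_die"
```

Both values should be 0, which is why a test based on `&&` fires on every run.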

In the following version, I correct @mosvy's answer so that b() aborts the pipeline only when it encounters "die" in the input stream:

Input stream contains "die"

b() {
  sed '/die/{q 2}' || kill "$BASHPID"
}

# Send "die" to b.
printf '%s\n' pass die oops | b | cat

echo "${PIPESTATUS[@]}"

Output:

pass
die
0 143 0

Input stream does not contain "die"

b() {
  sed '/die/{q 2}' || kill "$BASHPID"
}

# Do not send "die" to b.
printf '%s\n' pass oops | b | cat

echo "${PIPESTATUS[@]}"

Output:

pass
oops
0 0 0

Note that in this version of `b()`, if `sed` encounters "die", it runs the command `q 2`, which makes `sed` print the matching line and terminate immediately with exit code 2 (failure). Because that status is non-zero, `||` then invokes `kill "$BASHPID"`, which terminates `b()`'s process in the pipeline and thereby aborts the pipeline. If "die" never appears, `sed` exits with code 0 at the end of the input and `kill` is not run. (This version requires GNU sed, which extends the `q` command so that it accepts an exit code.)
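Where GNU sed is not available, one possible substitute is `awk`, whose `exit` statement takes a status in any POSIX implementation. This is a different tool from the one used above, and the helper name `b_awk` is hypothetical; treat it as a sketch of the same idea, not a drop-in guarantee.

```shell
# Hypothetical helper: copy every line to stdout, and quit with status 2
# right after printing the first line that matches "die".
b_awk() {
  awk '{ print } /die/ { exit 2 }'
}

status=0
# Record the non-zero status instead of letting it abort a "set -e" script.
printf '%s\n' pass die oops | b_awk || status=$?
echo "b_awk exit status: $status"
```

Inside a pipeline it could take the place of the `sed` call, for example `b() { b_awk || kill "$BASHPID"; }`.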

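As @mosvy mentions, instead of committing "ritual suicide", `b()` may simply `exit` with a status of its choosing. Because each stage of the pipeline runs in its own subshell, `exit` ends only `b()`'s process, not the calling shell: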

b() {
  sed '/die/{q 2}' || exit 3
}

# Send "die" to b.
printf '%s\n' pass die oops | b | cat

echo "${PIPESTATUS[@]}"

Output:

pass
die
0 3 0
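A script will usually want to act on `b()`'s status instead of just printing it. The sketch below assumes `bash` and GNU sed, as above; the array name `statuses` is hypothetical. It copies `PIPESTATUS` immediately, because the next command that runs overwrites it.

```shell
b() {
  # GNU sed only: "q 2" quits with exit status 2 on the first matching line.
  sed '/die/{q 2}' || exit 3
}

printf '%s\n' pass die oops | b | cat

# Save the per-stage statuses before any other command replaces them.
statuses=("${PIPESTATUS[@]}")

# Index 1 is the second stage of the pipeline, i.e. b.
if [ "${statuses[1]}" -ne 0 ]; then
  echo "b aborted the pipeline with status ${statuses[1]}" >&2
fi
```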
Derek Mahar