
I have a big script which takes a file as input and does various stuff with it. Here is a test version:

echo "cat: $1"
cat $1
echo "grep: $1"
grep hello $1
echo "sed: $1"
sed 's/hello/world/g' $1

I want my script to work with process substitution, but only the first command (cat) works, while the rest don't. I think this is because it is a pipe.

$ myscript.sh <(echo hello)

should print:

cat: /dev/fd/63
hello
grep: /dev/fd/63
hello
sed: /dev/fd/63
world

Is this possible?

dogbane

4 Answers


The <(…) construct creates a pipe. The pipe is passed via a file name like /dev/fd/63, but this is a special kind of file: opening it really means duplicating file descriptor 63. (See the end of this answer for more explanations.)

Reading from a pipe is a destructive operation: once you've caught a byte, you can't throw it back. So your script needs to save the output from the pipe. You can use a temporary file (preferable if the input is large) or a variable (preferable if the input is small). With a temporary file:

tmp=$(mktemp)
cat <"$1" >"$tmp"
cat <"$tmp"
grep hello <"$tmp"
sed 's/hello/world/g' <"$tmp"
rm -f "$tmp"

(You can combine the two calls to cat as tee <"$1" -- "$tmp".) With a variable:

tmp=$(cat <"$1")
printf "%s\n" "$tmp"
printf "%s\n" "$tmp" | grep hello
printf "%s\n" "$tmp" | sed 's/hello/world/g'

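Note that command substitution $(…) strips all trailing newlines from the command's output. To avoid that, append an extra character to the output and strip it afterwards; the variable then already ends with the original newlines, so print it with "%s" rather than "%s\n".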

tmp=$(cat <"$1"; echo a); tmp=${tmp%a}
printf "%s" "$tmp"
printf "%s" "$tmp" | grep hello
printf "%s" "$tmp" | sed 's/hello/world/g'
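
To make the newline stripping concrete, here is a small sketch. The variable names plain and kept are only illustrative; the sentinel character "a" is the one used above.

```shell
# Three trailing newlines go in, none come out: $(…) strips them all.
plain=$(printf 'hello\n\n\n')

# Sentinel trick: the output now ends in "a" plus echo's newline. Only that
# final newline is stripped, so the three original newlines survive.
kept=$(printf 'hello\n\n\n'; echo a)
kept=${kept%a}    # drop the sentinel again

echo "plain is ${#plain} characters long"   # 5: just "hello"
echo "kept is ${#kept} characters long"     # 8: "hello" plus three newlines
```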

By the way, don't forget the double quotes around variable substitutions.
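
Putting the temporary-file approach together, the whole test script from the question could look roughly like the sketch below. The function name process_input is hypothetical, and the trap-based cleanup is an addition that is not in the snippets above. The body runs in a subshell so that the EXIT trap fires when the function finishes.

```shell
# process_input FILE: hypothetical wrapper around the three commands from
# the question. FILE may be a regular file or a pipe such as /dev/fd/63.
process_input() (
    tmp=$(mktemp) || exit 1
    # Remove the temporary file when this subshell exits, even on error.
    trap 'rm -f "$tmp"' EXIT
    # Read the (possibly one-shot) input exactly once.
    cat <"$1" >"$tmp" || exit 1

    echo "cat: $1"
    cat <"$tmp"
    echo "grep: $1"
    grep hello <"$tmp"
    echo "sed: $1"
    sed 's/hello/world/g' <"$tmp"
)

# Usage in bash:
#   process_input <(echo hello)
```

Called with a process substitution, this should print all three sections the question asks for, because only the first cat touches the pipe.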

Gilles 'SO- stop being evil'
  • storing unbounded data in a variable? – Stéphane Gimenez Aug 10 '11 at 22:51
  • @StéphaneGimenez I don't understand your comment. – Gilles 'SO- stop being evil' Aug 10 '11 at 23:47
  • sorry, I read the 2nd/3rd solutions but missed the condition "preferable if the input is small" which was actually written, but too far above. – Stéphane Gimenez Aug 11 '11 at 00:27
  • Thanks @Gilles. Any reason why you use <"$tmp" in your commands instead of just "$tmp" e.g. `grep hello "$tmp"`? Can I `cp "$1" "$tmp"` to create the tmp file instead of `cat`? – dogbane Aug 11 '11 at 07:53
  • @dogbane Mostly it's a matter of style. `<"$tmp"` makes it visually obvious that you're reading from the file, it's less clear with `cat "$tmp"` (which reads, whereas `tee "$tmp"` writes, and `cp "$a" "$b"` reads from `$a` and writes to `$b`). For `grep` there's a difference: `grep hello "$tmp"` shows the name of the temporary file (which is useless). – Gilles 'SO- stop being evil' Aug 11 '11 at 08:50
  • @Gilles `grep hello "$tmp"` would not show the name of the temp file. It would print out matching lines. – dogbane Aug 11 '11 at 09:56

When you use a file, you can read its data as many times as you like. When you use a pipe (which is what process substitution actually creates), you can only read the data once. So after cat has consumed everything, the grep and sed commands receive empty input.
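
A quick way to see the difference is to read the same name twice and count the bytes each time. The sketch below assumes a Linux-style /dev/fd or /dev/stdin; read_twice is a hypothetical helper name.

```shell
# read_twice FILE: read FILE two times in a row and report how many bytes
# each attempt returned. (tr strips the padding some wc versions add.)
read_twice() {
    first=$(wc -c <"$1" | tr -d ' ')
    second=$(wc -c <"$1" | tr -d ' ')
    echo "first read: $first bytes, second read: $second bytes"
}

# Usage in bash:
#   read_twice somefile          # a regular file gives the same count twice
#   read_twice <(echo hello)     # a pipe gives nothing on the second read
```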

("How to understand pipes" might be good reading.)

To do what you want with process substitution, you could write something like:

cat "$1" | tee >(echo "cat: $1"; cat) | tee >(echo "grep: $1"; grep hello) | (echo "sed: $1"; sed 's/hello/world/g')

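But in this case, the second cat, grep and sed would run in parallel and their output would be interleaved. Each >(…) also inherits tee's standard output, which here is the pipe to the next stage, so that output is fed further down the pipeline as well. This might be more useful: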

cat "$1" | tee >(cat > cat.txt) | tee >(grep hello > grep.txt) | sed 's/hello/world/g' > sed.txt
jfg956

The usual way to do this is to make the $1 parameter optional: define FILE=${1-/dev/stdin} and use FILE wherever the input is needed. However, reading from a pipe several times reads it sequentially: each reader only gets whatever is left, and the data is not duplicated.

The easiest solution to this issue would be to use some temporary file.

if [ -z "$1" ] ; then FILE=$(mktemp); cat >"$FILE"; else FILE=$1; fi

If you wish to explicitly pass some filename (possibly /dev/fd/x), the same temporary file trick can be used:

FILE=$(mktemp); cat "$1" >"$FILE"
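
The two cases can be folded into one helper. This is only a sketch: spool_input is a hypothetical name, and it assumes /dev/stdin exists on the system.

```shell
# spool_input [FILE]: copy FILE, or standard input when no argument is
# given, into a fresh temporary file and print that file's name.
spool_input() {
    src=${1-/dev/stdin}
    spooled=$(mktemp) || return 1
    # Read the source exactly once; clean up if the copy fails.
    cat <"$src" >"$spooled" || { rm -f "$spooled"; return 1; }
    printf '%s\n' "$spooled"
}

# Usage inside a script:
#   FILE=$(spool_input "$@") || exit 1
#   grep hello <"$FILE"
#   sed 's/hello/world/g' <"$FILE"
#   rm -f "$FILE"
```

The caller owns the temporary file and has to remove it when done.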

You could also use tee in a more elaborate way to duplicate the input from the stdin file descriptor to several other file descriptors, but this last method would be quite cumbersome.

Stéphane Gimenez

A file obtained by a process substitution may not be seekable, depending on the underlying implementation (it is typically a pipe), so you cannot read it more than once.
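
If a script wants to copy its input only when necessary, a rough heuristic is to check whether the argument is a regular file. This is a sketch under that assumption, not a reliable seekability test; needs_copy is a hypothetical name.

```shell
# needs_copy FILE: succeed (exit status 0) when FILE is not a regular file,
# i.e. when it may be a pipe that can only be read once.
needs_copy() {
    [ ! -f "$1" ]
}

# Usage inside a script:
#   if needs_copy "$1"; then
#       tmp=$(mktemp) && cat <"$1" >"$tmp" && set -- "$tmp"
#   fi
```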

enzotib