
I am trying to split the stdout of a command into two "branches" using tee so that each can be processed separately. At the end I need to merge the results of both "branches" using paste. I came up with the following code for the producer:

mkfifo a.fifo b.fifo
python -c 'print(("0\t"+"1"*100+"\n")*10000)' > sample.txt
cat sample.txt | tee >(cut -f 1 > a.fifo) >(cut -f 2 > b.fifo) | awk '{printf "\r%lu", NR}'
# outputs ~200 lines instantly
# and then ~200 more once I read from pipes

and then in a separate terminal I start the consumer:

paste a.fifo b.fifo | awk '{printf "\r%lu", NR}'
# outputs ~200 once producer is stopped with ctrl-C

The problem is that it hangs. This behaviour seems to depend on the input length:

  1. If the input lines are shorter (e.g. if the second column contains 30 characters instead of 100), it works fine.
  2. If a.fifo and b.fifo are fed with the same input (or input of similar length), it also seems to work fine.

The problem seems to arise when I feed short chunks into, say, a.fifo and long ones into b.fifo. This behaviour does not depend on the order in which I specify the pipes in paste.

I am not very familiar with Linux and its piping logic, but it looks like a deadlock. Can this be implemented reliably? If so, how? Or are there other ways that avoid tee and paste?

HollyJolf
  • On my system, the producer script complains with `paste: /dev/fd/64: No such file or directory` due to using too many process substitutions on the second line. Removing one of the `{,}` bypasses this. – Kusalananda May 07 '20 at 06:30
  • You also never flush the output buffer in `awk`. The output is flushed by printing a newline, or by calling `fflush()` in `awk`. Your example is a bit contrived so it's difficult to say if this is your actual issue though. – Kusalananda May 07 '20 at 06:34
  • @Kusalananda, I have edited the example so there are no `{,}` anymore. I am using `awk` to see how many lines it actually writes/reads. The behaviour does not change if `> /dev/null` or `| wc -l` is used instead of `| awk`. – HollyJolf May 07 '20 at 16:41
  • Related: [tee + cat: use an output several times and then concatenate results](//unix.stackexchange.com/q/66853) – Stéphane Chazelas May 07 '20 at 16:43

1 Answer


I have not understood the problem in detail, but it evidently comes down to the difference in line length between the two branches, with the longer one filling some buffer first.
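A back-of-the-envelope check is at least consistent with the buffer explanation. The sketch below assumes a pipe capacity of 64 KiB (the Linux default) and a 4 KiB output buffer in cut when it is not writing to a terminal. Both numbers are assumptions about the system, not measurements, and the variable names are made up for illustration.

```shell
# Rough arithmetic only; both sizes below are assumed, not measured.
pipe_capacity=65536   # assumed default Linux pipe/FIFO capacity in bytes
stdio_buffer=4096     # assumed size of cut's output buffer (non-terminal output)
short_line=2          # "0" plus newline, as written towards a.fifo
long_line=101         # 100 ones plus newline, as written towards b.fifo

# Lines cut -f 1 must consume before it writes anything to a.fifo.
lines_until_a_flush=$(( stdio_buffer / short_line ))
# Lines that fit into b.fifo while nobody is reading it.
lines_until_b_full=$(( pipe_capacity / long_line ))

echo "$lines_until_a_flush $lines_until_b_full"   # prints "2048 648"
```

Under these assumptions b.fifo is full after roughly 648 lines, long before a.fifo receives its first block at 2048 lines, so paste can be left waiting on a.fifo while the branch behind b.fifo can no longer make progress. With a 30-character second column the same calculation gives 65536 / 31 = 2114 lines, which is more than 2048 and would match the observation in the question that shorter lines work.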

This can be "solved" by enlarging the buffer:

paste a.fifo <(buffer <b.fifo) | awk '{printf "\r%lu", NR}'
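If `buffer` is not installed, a regular file can play the same role, because unlike a pipe it does not block the writer when the reader falls behind. The following is only a sketch: it assumes the whole stream fits on disk, the temporary files (created with mktemp) and `merged.txt` are hypothetical names, and the awk command merely stands in for the real producer.

```shell
#!/bin/sh
# Sketch: spool the stream to a regular file instead of FIFOs, then split and merge.
set -eu

spool=$(mktemp)   # hypothetical spool file replacing the tee stage
col1=$(mktemp)    # hypothetical file for the first branch
col2=$(mktemp)    # hypothetical file for the second branch
trap 'rm -f "$spool" "$col1" "$col2"' EXIT

# Stand-in producer: 10000 lines of "0<TAB>" followed by 100 ones.
awk 'BEGIN { s = sprintf("%100s", ""); gsub(/ /, "1", s);
             for (i = 0; i < 10000; i++) print "0\t" s }' > "$spool"

# Each branch reads the spool file independently, so neither can stall the other.
cut -f 1 "$spool" > "$col1"
cut -f 2 "$spool" > "$col2"

# Merge the two branches line by line.
paste "$col1" "$col2" > merged.txt

# Sanity check: merging the untouched columns should reproduce the input.
if cmp -s "$spool" merged.txt; then
    echo "merged output matches the input"
fi
```

The price is that nothing is merged until the producer has finished, so this does not help when the stream is endless or the results are needed incrementally.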

Interesting fact: adding a buffer to the generating command lets its awk finish, but the consuming command still blocks (close to the end in my case):

$ cat sample.txt | tee >(cut -f 1 > a.fifo) >(cut -f 2 | buffer > b.fifo) | awk '{printf "\r%lu", NR}; END { print; print NR; }'
10001


$ paste a.fifo b.fifo | awk '{printf "\r%lu", NR}'
8152

Doesn't make sense IMHO. I would not be surprised if there is a bug involved.

Hauke Laging