Why doesn't the 2nd command wait for the output of the 1st (piping)?

Question

I'm currently reading M. Bach's "THE DESIGN OF THE UNIX® OPERATING SYSTEM".

I read about the main shell loop. Look at the if (/* piping */) block. If I understood correctly, piping allows treating the 1st command's output as the 2nd command's input. If so, why isn't there any code that makes the 2nd command wait for the 1st to terminate? Without such code, piping seems nonsensical: the 2nd command can start executing without its input being ready.

But... what's the problem with the second command starting before the input is ready? It will just block when it tries to read input — muru, Nov 15 '21 at 21:31
pipes would be pointless if the first command had to finish before the second command even starts ... what if there's a terabyte of data you want to pipe ... where's the data going to be held? — Bravo, Nov 15 '21 at 21:34

score 6 · Accepted Answer · answered Nov 15 '21 at 21:33

the 2nd command can start executing without its input being ready.

It does. There's nothing wrong with that.

In a pipeline producer | consumer, the two sides run concurrently¹. The consumer does not wait for the producer to finish. It doesn't even care if the producer has started. All the consumer needs is a place to read input from. This place exists as soon as the pipe has been created by the pipe call.
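As a rough illustration of the point above, here is a minimal sketch that starts the consumer before the producer. It uses Python's `subprocess` module rather than the fork/exec code from the book, and the function name `run_consumer_first`, the `delay` parameter, and the choice of `tr` and `echo` as commands are assumptions made for this example only.

```python
import os
import subprocess
import time


def run_consumer_first(delay=0.2):
    """Run the equivalent of `echo hello | tr a-z A-Z`, consumer first."""
    # The pipe exists before either command has been started.
    read_end, write_end = os.pipe()

    # Start the consumer; its stdin is the read end of the pipe.
    consumer = subprocess.Popen(
        ["tr", "a-z", "A-Z"], stdin=read_end, stdout=subprocess.PIPE
    )
    os.close(read_end)  # the parent does not read from the pipe itself

    # The producer does not exist yet. The consumer is most likely
    # already blocked in read(), waiting for data or end of file.
    time.sleep(delay)

    # Only now start the producer; its stdout is the write end.
    producer = subprocess.Popen(["echo", "hello"], stdout=write_end)
    os.close(write_end)  # otherwise the consumer would never see EOF

    output, _ = consumer.communicate()
    producer.wait()
    return output
```

Calling `run_consumer_first()` should return `b"HELLO\n"` regardless of the delay, since the consumer simply waits on the pipe until the producer shows up.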

Reading from a pipe is a blocking operation. If no data has been written to the pipe yet, the reader blocks. The reader will be unblocked when data is written to the pipe. More generally, the reader blocks if no data is available on the pipe. Reading data from the pipe consumes it. Therefore it doesn't matter whether the producer has started writing by the time the consumer starts reading. The consumer will just wait until the producer writes some data.
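The blocking behaviour can be observed directly. The sketch below is only an illustration: it uses a reader thread instead of a separate process to keep the example short (a `read` on an empty pipe blocks the same way in both cases), and the names `measure_blocked_read` and `delay` are made up for this example.

```python
import os
import threading
import time


def measure_blocked_read(delay=0.5):
    """Return (data, seconds the reader spent blocked in read())."""
    read_end, write_end = os.pipe()
    result = {}

    def reader():
        start = time.monotonic()
        # The pipe is empty, so this call blocks until data is written.
        result["data"] = os.read(read_end, 1024)
        result["waited"] = time.monotonic() - start

    thread = threading.Thread(target=reader)
    thread.start()

    time.sleep(delay)                 # nothing has been written so far
    os.write(write_end, b"ready\n")   # this write unblocks the reader
    thread.join()

    os.close(read_end)
    os.close(write_end)
    return result["data"], result["waited"]
```

With the default delay, the reader should report that it waited roughly half a second before receiving `b"ready\n"`, which is the time it took the writer to get around to writing.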

The consumer receives data as soon as it becomes available.² It typically reads and processes the data in chunks. Most consumers do not need to have all the data before they can start processing it. If the consumer does need to have all the data available, it'll store it in memory or in a temporary file and wait for the end of the input.
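To see a consumer handling data piece by piece, here is a small sketch under stated assumptions: the producer is a throwaway Python one-off that prints three lines 0.2 seconds apart and flushes after each one (so its own buffering does not hide the effect), and `PRODUCER` and `consume_incrementally` are names invented for this example.

```python
import subprocess
import sys
import time

# Hypothetical producer: emits one line every 0.2 s and flushes each time.
PRODUCER = (
    "import time\n"
    "for i in range(3):\n"
    "    print('line', i, flush=True)\n"
    "    time.sleep(0.2)\n"
)


def consume_incrementally():
    """Return a list of (line, seconds since start) as lines arrive."""
    producer = subprocess.Popen(
        [sys.executable, "-c", PRODUCER], stdout=subprocess.PIPE
    )
    start = time.monotonic()
    arrivals = []
    # Each iteration returns as soon as one full line is available,
    # long before the producer has finished.
    for raw_line in producer.stdout:
        arrivals.append((raw_line.decode().strip(), time.monotonic() - start))
    producer.wait()
    return arrivals
```

The timestamps in the returned list should be spread out over the producer's run time rather than bunched together at the end, which shows the consumer working on early lines while later ones have not been produced yet.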

Since the producer and the consumer are separate processes, they are executed concurrently. The fact that one of them may be running does not prevent the other from running. If both the producer and the consumer want CPU time, the kernel will share the CPU between them (and between any other process that wants CPU time). So even while the consumer is initializing, or while it's processing some data, the producer can also run and produce more data.

¹ You can say they run in parallel. That's not technically correct, but close enough.
² In practice, the producer may buffer data internally. But as soon as the producer actually writes data to the pipe, the consumer can read it.
