
I use command1 | command2 | command3 a lot in Linux, but mostly on input of finite length.

When I tried this with cat | sed '' | sed '', which hopefully simulates an infinite stream, nothing was printed until I terminated the input with Ctrl-D. I can work around the problem with cat | sed -e '' -e '', but I would like to know why the first one doesn't work. cat | cat | cat works just fine. Does it have something to do with sed, and if so, what is the problem?

I tried to think about this, and the only difference I could find is that with cat, hitting the Enter key does something special that does not happen with the first sed '' above. Is that it?

Can anyone let me know how to make pipes work seamlessly with infinite streams?

Nishant
  • As an experiment to clear up what's happening with buffering, type (or copy-paste) a *bunch* of stuff into `cat`'s input (4096 bytes should be enough). Eventually you'll get a bunch of output all at once. Then you won't get any more until you add even more input. – hobbs May 18 '16 at 19:36

3 Answers


A pipe connects the output of the left command to the input of the right command. This has nothing to do with the length of the stream. However, each command in the pipeline still has its own buffering rules. If a command's buffer is never flushed (because it has not filled up and the input has not ended), its output never reaches the next command, so you won't see it in the final output.

user1794469
  • Then how come `cat | sed ''` works? I mean, shouldn't it wait for the buffering limit before anything is sent to stdout? Or does `cat` send its buffer content to sed immediately? – Nishant May 18 '16 at 14:29
  • Buffering in a program can change based on where the output is going. This can be observed by running a command from the terminal, where stdout is most likely your screen, and comparing that to redirecting to a file. The former is usually line buffered, the latter is usually buffered on some number of bytes, 4096 for example. – user1794469 May 18 '16 at 14:38

That's basically a duplicate of my answer on SO. However, since nobody mentioned the stdbuf command here, I felt like I should add that here as well.

===============

Basically, a process that reads from a pipe can consume the data byte by byte as soon as it is available in the pipe. However, as long as the programs use the stdio functions of libc, such as fread, fwrite, or printf, libc buffers their input and output, and for output the buffering mode depends on whether the program is writing to a terminal or not.

By default, if a program writes to a terminal, libc buffers the output line-wise; if the output goes anywhere else, such as a pipe or a file, it gets buffered block-wise.
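A rough way to see which case applies is to ask the shell whether standard output is a terminal, which is similar in spirit to the check libc makes. This is only a sketch: the function name stdout_mode is made up for illustration, and it reports the likely default rather than what any particular program really does.

```shell
# Hypothetical helper: report the buffering mode a stdio-based program
# would most likely pick for stdout in the current context.
stdout_mode() {
    if [ -t 1 ]; then
        # File descriptor 1 is a terminal.
        echo "terminal: line buffering is the likely default"
    else
        # File descriptor 1 is a pipe, a file, or something else.
        echo "pipe or file: block buffering is the likely default"
    fi
}

stdout_mode          # result depends on where this shell's stdout points
stdout_mode | cat    # stdout is a pipe here, so the second branch is taken
```

Run interactively, the first call should report a terminal and the second a pipe, which is the same switch that makes the first sed in the question hold back its output.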

On Linux, having glibc, you can influence that behaviour using the stdbuf command, like this:

stdbuf -oL cat | stdbuf -oL sed '' | sed ''

I'm using a line-based output buffer for the cat command and for the first sed command. The last sed writes to the terminal, so its output is already line buffered. The input side needs no change: stdbuf does not accept line buffering (L) for standard input.

hek2mgl
  • Just to add, please notice the obvious fact there is no guarantee any arbitrary program is using the libc or any libc at all, therefore you cannot *really* rely on this behavior. – dbanet May 18 '16 at 22:41
  • Yes, that's true. Programs are free to fill/flush buffers as they wish. `tee`, for example, is such a program. – hek2mgl May 18 '16 at 22:47

You could use the -u option of sed to minimize buffering:

cat | sed -u '' | sed ''
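
To check the effect without typing into cat, something like the following sketch may help. It assumes GNU sed (for -u) and a POSIX-style shell; the function name probe and the 2-second delay are arbitrary choices for illustration.

```shell
# probe CMD...: feed one line into CMD, hold the input open for 2 seconds,
# and print how long the line took to come out the other end.
probe() {
    start=$(date +%s)
    { echo hello; sleep 2; } | "$@" | {
        read -r line
        echo "$line after $(( $(date +%s) - start ))s"
    }
}

plain=$(probe sed '')       # sed writes to a pipe, so stdio block-buffers it
unbuf=$(probe sed -u '')    # -u asks GNU sed to flush after every line
echo "sed '':    $plain"
echo "sed -u '': $unbuf"
```

With plain sed the line should only show up once the input is closed, roughly 2 seconds later, while with -u it should arrive almost immediately.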
adonis