
When I only used pipes in bash, I didn't think much about them. But when I read some C code examples using the system call pipe() together with fork(), I wondered how to understand pipes, including both anonymous pipes and named pipes.

It is often said that "everything in Linux/Unix is a file". Is a pipe actually a file, so that one process it connects writes to the pipe file and the other reads from it? If yes, where is the pipe file for an anonymous pipe created? In /tmp, /dev, or ...?

However, from examples of named pipes, I also learned that using pipes has a space and time performance advantage over explicitly using temporary files, probably because no files are involved in the implementation of pipes. Also, pipes do not seem to store data the way files do. So I doubt that a pipe is actually a file.

Gilles 'SO- stop being evil'
Tim

4 Answers


About your performance question: pipes are more efficient than files because no disk IO is needed. So cmd1 | cmd2 is more efficient than cmd1 > tmpfile; cmd2 < tmpfile. (This might not be true if tmpfile lives on a RAM disk or another memory-backed device, such as a named pipe; and if tmpfile is a named pipe, cmd1 must be run in the background, because its writes block once the pipe is full.) If you need to keep the output of cmd1 and still send it to cmd2, use cmd1 | tee tmpfile | cmd2, which lets cmd1 and cmd2 run in parallel and spares cmd2 from reading the data back from disk.

Named pipes are useful if many processes read/write to the same pipe. They can also be useful when a program is not designed to use stdin/stdout for its IO and needs to use *files*. I put *files* in italics because named pipes are not exactly files from a storage point of view: they reside in memory and have a fixed buffer size, even if they have a filesystem entry (for reference purposes). Other things in UNIX have filesystem entries without being files: just think of /dev/null or other entries in /dev or /proc.

As pipes (named and unnamed) have a fixed buffer size, read/write operations on them can block: a writer sleeps while the buffer is full, and a reader sleeps while it is empty. Also, when does a reader receive EOF from a memory buffer? The rules for this behavior are well defined and can be found in the man pages (e.g. pipe(7) on Linux).

One thing you cannot do with pipes (named and unnamed) is seek back in the data. As they are implemented using a memory buffer, this is understandable.

About "everything in Linux/Unix is a file", I do not agree. Named pipes have filesystem entries, but are not exactly files. Unnamed pipes do not have filesystem entries (except maybe in /proc). However, most IO operations on UNIX are done using read/write functions that need a file descriptor, including on unnamed pipes (and sockets). I do not think that we can say that "everything in Linux/Unix is a file", but we can surely say that "most IO in Linux/Unix is done using a file descriptor".

jfg956
  • Thanks! Are the two commands connected by a pipe running in parallel, or does the second start only after the first finishes? – Tim Aug 05 '11 at 17:15
  • Yes, the 2 commands run in parallel. If they did not, and the 1st one output more than the buffer can hold, it would block. You can try it by running `cmd1 > fifo` and `cmd2 < fifo` in 2 different shells, after creating the named pipe with `mkfifo fifo`. – jfg956 Aug 05 '11 at 17:31
  • Another test you can do is to kill `cmd2` while `cmd1` is still running: `cmd1` will probably stop, reporting a broken pipe message. – jfg956 Aug 05 '11 at 17:33
  • Thanks! What do you mean by "would be blocked"? If this happens, does it mean the data in the stream after the block will be lost? – Tim Aug 05 '11 at 17:51
  • Data is not lost. If the pipe buffer is full, `cmd1`'s write to the pipe will only return once `cmd2` has read data from the pipe. In the same way, `cmd2`'s read from the pipe will block while the buffer is empty, until `cmd1` writes to the pipe. – jfg956 Aug 05 '11 at 18:00
  • @jfgagne: *About "everything in Linux/Unix is a file", I do not agree.* - I see this as a different understanding of the word "file". In the quote, "file" did not mean some data in a filesystem. IMHO it meant that the most important part of the API (mainly `read()`, `write()`, `dup()`, `close()`, `select()`, `ioctl()` etc.) is the same for regular files and file-like objects (e.g. pipes, terminals, serial ports, block devices etc.) – pabouk - Ukraine stay strong Sep 12 '13 at 12:28
  • @pabouk: I think we agree that most IO "objects" are manipulated using "file descriptors" and "streams". One exception, though, is UDP sockets. – jfg956 Sep 12 '13 at 16:57
  • Is there a way to change the fixed size buffer of pipes? Where can I find out what the fixed sized buffer is? Also on another note, it seems that fuser can't find the processes trying to write to a named pipe? I just tried it. Is that because the file descriptor does not actually exist on the _named pipe file_? – CMCDragonkai May 01 '15 at 03:14
  • Maybe also include `rm tmpfile` after `cmd1 >tempfile; cmd2 – tripleee Feb 28 '17 at 19:12

Two of the basic fundamentals of UNIX philosophy are:

  1. To make small programs that do one thing well.
  2. To expect the output of every program to become the input to another, as
     yet unknown, program.

The use of pipes lets you leverage these two design fundamentals to create extremely powerful chains of commands that achieve your desired result.

Most command-line programs that operate on files can also accept input on standard input (typed at the keyboard) and write output to standard output (printed on the screen).

Some commands are designed to operate only within a pipe and cannot operate on files directly.

For example, the tr command:

  ls -C | tr 'a-z' 'A-Z'

The general form is:

  cmd1 | cmd2
  • Sends STDOUT of cmd1 to STDIN of cmd2 instead of the screen.

  • STDERR is not forwarded across pipes.

In short, the pipe character (|) connects commands.

Any command that writes to STDOUT can be used on the left-hand side of a pipe.

  ls -l /etc | less

Any command that reads from STDIN can be used on the right-hand side of a pipe.

  echo "test print" | lpr

A traditional pipe is "unnamed" because it exists anonymously and persists only for as long as the process is running. A named pipe is system-persistent and exists beyond the life of the process and must be deleted once it is no longer being used. Processes generally attach to the named pipe (usually appearing as a file) to perform inter-process communication (IPC).

Source: http://en.wikipedia.org/wiki/Named_pipe

Vishwanath Dalvi

To supplement the other answers...

stdin and stdout are file descriptors, and they are read and written as if they were files. Therefore you can run echo hi | grep hi: the shell replaces echo's stdout with the write end of a pipe and grep's stdin with the read end of the same pipe.

ctrl-alt-delor
user606723

Everything is a file.

If we take the phrase too literally, we end up with the meaning “we only have files, and nothing else”. This is not the correct interpretation, so what is the correct one?

When we say “Everything is a file”, we are not saying that everything is stored on a disk. We are saying that everything looks like a file: it can be read and written.

In Unix, once a file or non-file is open, it can be treated like a file. However, not all files support all operations. E.g. some files (that are not really files) do not support seek: they must be read/written in sequence (this is true of pipes and sockets).

Everything has a filename (on some systems, e.g. Debian GNU/Linux and many other GNU/Linux distributions).

  • All open files get a filename. See /proc/self/fd/…
  • Network sockets can be opened with a filename: see /dev/tcp (handled by the shell, e.g. bash, rather than by a real file in /dev),
    e.g. cat </dev/tcp/towel.blinkenlights.nl/23
ctrl-alt-delor
  • That last part is only valid on systems with a `/proc` filesystem, and on systems (or shells) that provide a `/dev/tcp` file structure. – Kusalananda Sep 23 '18 at 12:23