
Let's assume you have a pipeline like the following:

$ a | b

If b stops processing stdin, after a while the pipe fills up and writes from a to its stdout will block (until either b starts processing again or dies).

If I wanted to avoid this, I could be tempted to use a bigger pipe (or, more simply, buffer(1)) like so:

$ a | buffer | b

This would simply buy me more time; eventually a would stop anyway.

What I would love to have (for a very specific scenario that I'm addressing) is to have a "leaky" pipe that, when full, would drop some data (ideally, line-by-line) from the buffer to let a continue processing (as you can probably imagine, the data that flows in the pipe is expendable, i.e. having the data processed by b is less important than having a able to run without blocking).

To sum it up, I would love to have something like a bounded, leaky buffer:

$ a | leakybuffer | b

I could probably implement it quite easily in any language, I was just wondering if there's something "ready to use" (or something like a bash one-liner) that I'm missing.

Note: in the examples I'm using regular pipes, but the question equally applies to named pipes


While I awarded the answer below, I also decided to implement the leakybuffer command myself, because that simple solution had some limitations: https://github.com/CAFxX/leakybuffer

ctrl-alt-delor
CAFxX
  • Do named pipes really fill up? I would have thought named pipes *are* the solution to this, but I couldn't say for sure. – Wildcard Aug 10 '16 at 02:24
  • Named pipes have (by default) the same capacity as unnamed pipes, AFAIK – CAFxX Aug 11 '16 at 06:16

1 Answer


The easiest way would be to pipe through a program that sets non-blocking output. Here is a simple Perl one-liner (which you can save as leakybuffer) that does so:

So your a | b becomes:

a | perl -MFcntl -e \
    'fcntl STDOUT,F_SETFL,O_NONBLOCK; while (<STDIN>) { print }' | b

What it does is read the input and write it to the output (same as cat(1)), but the output is non-blocking, meaning that if a write fails it returns an error and the data is lost, while the process continues with the next line of input as we conveniently ignore the error. The process is kind-of line-buffered, as you wanted, but see the caveat below.
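To see the mechanism the one-liner relies on, here is a small demonstration (in Python rather than Perl, but the fcntl/O_NONBLOCK calls are the same): once the kernel pipe buffer is full, a non-blocking write fails immediately with EAGAIN instead of blocking.

```python
import fcntl
import os

# Create a pipe with nobody reading from it and make the write end
# non-blocking, then write until the kernel buffer is full.
r, w = os.pipe()
flags = fcntl.fcntl(w, fcntl.F_GETFL)
fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)

written = 0
try:
    while True:
        written += os.write(w, b"x" * 4096)
except BlockingIOError:
    # This is the "error" the Perl one-liner ignores: the write fails
    # instantly instead of waiting for the reader to catch up.
    pass

print("pipe filled after", written, "bytes")  # typically 65536 on Linux
```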

You can test it with, for example:

seq 1 500000 | perl -w -MFcntl -e \
    'fcntl STDOUT,F_SETFL,O_NONBLOCK; while (<STDIN>) { print }' | \
    while read a; do echo $a; done > output

You will get an output file with lost lines (the exact output depends on the speed of your shell, etc.), like this:

12768
12769
12770
12771
12772
12773
127775610
75611
75612
75613

You can see where the shell lost lines after 12773, but also an anomaly: Perl didn't have enough buffer space for 12774\n but did for 1277, so it wrote just that, and so the next number, 75610, does not start at the beginning of a line, making it a little ugly.

That could be improved upon by having Perl detect when a write did not succeed completely, and then later try to flush the remainder of the line while ignoring new lines coming in; but that would complicate the script much more, so it is left as an exercise for the interested reader :)
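For what it's worth, that improvement could be sketched like this (in Python rather than Perl; a hypothetical version, following the idea in the answer, not a drop-in for the one-liner): after a short write, the unwritten tail of the line is kept and flushed first, and whole incoming lines are dropped in the meantime, so partial lines are never interleaved.

```python
import fcntl
import os


def leaky_lines(inp, out_fd):
    """Non-blocking line copy that never emits interleaved partial
    lines: a short write leaves a tail that is flushed before anything
    else, while whole new lines are dropped until it is gone."""
    flags = fcntl.fcntl(out_fd, fcntl.F_GETFL)
    fcntl.fcntl(out_fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    pending = b""
    for line in inp:
        buf = pending if pending else line  # finish the old tail first
        try:
            n = os.write(out_fd, buf)
        except BlockingIOError:
            continue  # nothing written: this line (or this retry) waits/drops
        pending = buf[n:]  # non-empty only after a short write
```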

Update (for binary files): If you are not processing newline-terminated lines (like log files or similar), you need to change the command slightly, or Perl will consume large amounts of memory (depending on how often newline characters appear in your input):

perl -w -MFcntl -e 'fcntl STDOUT,F_SETFL,O_NONBLOCK; while (read STDIN, $_, 4096) { print }' 

It will work correctly for binary files too (without consuming extra memory).

Update 2 (nicer text file output): avoiding output buffers (syswrite instead of print):

seq 1 500000 | perl -w -MFcntl -e \
    'fcntl STDOUT,F_SETFL,O_NONBLOCK; while (<STDIN>) { syswrite STDOUT,$_ }' | \
    while read a; do echo $a; done > output

seems to fix problems with "merged lines" for me:

12766
12767
12768
16384
16385
16386

(Note: one can verify on which lines the output was cut with this one-liner: perl -ne '$c++; next if $c==$_; print "$c $_"; $c=$_' output)

Matija Nalis
  • I love the oneliner: I'm no perl expert, if anybody could suggest the improvements above it would be awesome – CAFxX Aug 15 '16 at 03:24
  • This seems to work *to some extent*. But as I watch my command which is `perl -w -MFcntl -e 'fcntl STDOUT,F_SETFL,O_WRONLY|O_NONBLOCK; while (<STDIN>) { print }' | aplay -t raw -f dat --buffer-size=16000`, perl seems to continually allocate more memory until it's killed by the OOM manager. – Ponkadoodle Feb 12 '17 at 06:53
  • @Wallacoloo thanks for pointing that out, my case was streaming log files... See updated answer for slight change needed to support binary files. – Matija Nalis Feb 13 '17 at 11:05
  • See also GNU `dd`'s `dd oflag=nonblock status=none`. – Stéphane Chazelas Dec 08 '18 at 18:24
  • @StéphaneChazelas for me (*dd 8.26-3 from GNU coreutils on Debian Stretch*), that dies when the buffer is first filled; eg. `seq 1 500000 | dd oflag=nonblock status=none | while read a; do echo $a; done` prints only up to about `13000` and then no more output is ever sent. – Matija Nalis Dec 08 '18 at 23:11
  • You're right, and there seems to be no way around that. `conv=noerror` only works for read error, not write errors. – Stéphane Chazelas Dec 09 '18 at 16:52
  • Note that perl will buffer its output, so it's not really _line-based_. You can add a `$| = 1` so it makes one `write()` per line. But it won't help with short writes anyway. For your binary approach, you can also do `$/ = \4096` for `<STDIN>` to read 4096 bytes at a time, and not have to change the code. – Stéphane Chazelas Dec 09 '18 at 17:03
  • Sorry, my bad again, actually writes of less than PIPE_BUF bytes (4096 on Linux, required to be at least 512 by POSIX) are guaranteed to be atomic, so `$| = 1` and your `syswrite()` approach do prevent short writes indeed as long as lines are reasonably short. – Stéphane Chazelas Dec 10 '18 at 06:45