
buffer(1) seems to be oldish and has hard-coded limits that prevent it from caching large amounts of data.

$ buffer -m 1G
max_shmem 1 too low
   # it doesn't even understand gigabytes
$ buffer -m 1000M
Cannot handle that many blocks, aborting!
$ buffer -m 1000M -s 1m
blocksize 1048576 out of range

What can I use instead?

Vi.
    What problem are you trying to solve? Do you have a performance issue with `buffer`? – roaima Feb 10 '15 at 22:16
  • Actually I forgot that Mplayer has its own adjustable buffer, which can span gigabytes, and tried to use `buffer` instead. I want to have `buffer` around as a more generic solution for postponing processing of some data until enough has accumulated. – Vi. Feb 11 '15 at 11:20
  • If you are writing tar files, consider using star, which comes with this functionality built in. – FUZxxl Feb 18 '18 at 14:36

3 Answers


Nonstandard move: using socket buffers.

Example:

# echo 2000000000 > /proc/sys/net/core/wmem_max
$ socat -u system:'pv -c -N i /dev/zero',sndbuf=1000000000 - | pv -L 100k -c -N o > /dev/null
        i:  468MB 0:00:16 [ 129kB/s] [  <=>                        ]
        o: 1.56MB 0:00:16 [ 101kB/s] [       <=>                   ]
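Note that the first line needs root because the kernel silently clamps unprivileged `SO_SNDBUF` requests to `net.core.wmem_max`; you can inspect the current cap before deciding how large a `sndbuf` to ask for (Linux-specific path):

```shell
# setsockopt(SO_SNDBUF) is silently capped at net.core.wmem_max,
# so check the limit before requesting a large socket buffer.
cat /proc/sys/net/core/wmem_max
```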

I implemented two additional tools for this: buffered_pipeline and mapopentounixsocket

$ ./buffered_pipeline ! pv -i 10 -c -N 1 /dev/zero ! $((20*1000*1000)) ! pv -i 10 -L 100k -c -N 2 ! > /dev/zero
        1: 13.4MB 0:00:40 [ 103kB/s] [         <=>      ]
        2: 3.91MB 0:00:40 [ 100kB/s] [         <=>      ]
Vi.

An answer to "Utility to buffer an unbounded amount of data in a pipeline?" suggests using `pv -B $SIZE`. The man page indicates that it can handle large buffer sizes:

-B BYTES, --buffer-size BYTES

    Use a transfer buffer size of BYTES bytes.  A suffix of "K", "M", "G", or "T" can be added to denote kibibytes (*1024), mebibytes, and so on.  The default buffer size is the block size of the input file's filesystem multiplied by 32 (512 KiB max), or 400 KiB if the block size cannot be determined.

arcyqwerty
INPUT | { 
        mkdir  -p buf &&
        mount  -osize=1g -ttmpfs none buf || exit
        cat     >buf/...
        work_it <buf/...
        umount  buf
} | OUTPUT
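The same store-then-forward idea works without root if a plain temporary file stands in for the tmpfs mount; a minimal sketch, with `seq` and `wc` as stand-ins for `INPUT` and `work_it`:

```shell
# Rootless variant of the buffer-to-file step above: spool the input
# to a temp file, then let the consumer read it at its own pace.
buf=$(mktemp) || exit 1
trap 'rm -f "$buf"' EXIT
seq 1 1000 > "$buf"     # stand-in for: INPUT > buf/...
wc -l < "$buf"          # stand-in for: work_it < buf/...
```

The trade-off is that the data touches disk unless `$TMPDIR` already lives on tmpfs.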

For a ring-buffered loop possibly...

INPUT | { 
        mkdir  -p buf &&
        mount  -osize=1g -ttmpfs none buf &&
        while   dd bs=1 count=1 >buf/...  &&
                [ -s buf/... ]
        do      dd obs=64k   | 
                dd  bs=64k count=16383k >>buf/...
                work_it <buf/... 2>&3 
        done    3>&2 2>/dev/null          &&
        umount  buf
} | OUTPUT
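The loop above leans on dd's `bs`/`count` semantics to bound each pass; the bounding step in isolation, with illustrative sizes:

```shell
# dd stops after count*bs bytes (2 x 64 KiB = 131072 here), which is
# what keeps each pass of the ring-buffer loop above bounded.
dd if=/dev/zero bs=64k count=2 2>/dev/null | wc -c
```

Note that when reading from a pipe rather than /dev/zero, dd may get short reads, so `count` then bounds the number of reads, not an exact byte total.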
mikeserv
  • How is tmpfs protected from overflow? I expect `INPUT` to block and wait if buffer is full. – Vi. Dec 30 '15 at 22:04
  • @Vi. - That's what the pipe is for. `cat` will fill a file until it is full. Then it will stop. When it stops reading the pipe, `INPUT` blocks. – mikeserv Dec 30 '15 at 22:12
  • What is `work_it`? `INPUT` is producer, `OUTPUT` is consumer. All the rest should be just buffering. – Vi. Dec 30 '15 at 22:37
  • @Vi. the `buffer` command calls some other program, right? like `buffer my_command | OUTPUT`? I guess *`work_it`* is *`my_command`* - or it might just be `cat` and *`my_command`* will be fed 1g chunks of *`INPUT`* over the pipe and will block while the ring buffer collects it. whatever. i don't care. it's just an example of *how* it might be done. It can sometimes be nice, though, to have all of the advantages of a regular lseekable input file, as would be directly available to *`work_it`* if it were called repeatedly in the loop. however you like is ok w/ me. – mikeserv Dec 30 '15 at 22:40
  • No, `buffer` just reads input and writes it to output (like `cat`), but also remembers some data if output is slower than input. Compare `INPUT | cat | OUTPUT` (it would also do some buffering, but too little). – Vi. Dec 30 '15 at 22:45
  • @Vi. `cat` doesn't buffer like that - well... *some* `cat`'s might, but even those should support an `-u`nbuffered switch according to POSIX. but most programs do buffer output - like around 4k or so, sometimes as much as 64k. i would consider it weird for them to do any more - at least linux kernels set the default atomic pipe read/write size at 64k. – mikeserv Dec 30 '15 at 22:48
  • Yes, `cat` does buffer somewhat because of the pipe's buffer in the kernel. `INPUT | OUTPUT` is one pipe, `INPUT | cat | OUTPUT` is two pipes, hence a buffer twice as big. I want a big buffer (a gigabyte or more) that can, for example, hold a couple of seconds of uncompressed video. – Vi. Dec 30 '15 at 22:53
  • @Vi. - that's not how pipes work. the kernel will store only *up to* so much data in the pipe's buffer. but it releases that data as soon as it is read. `cat` reads input blocks as soon as they are available and writes them out in the same size it receives them. – mikeserv Dec 30 '15 at 22:56
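To put the "couple of seconds of uncompressed video" from the comments into numbers (the frame geometry is assumed, not stated in the thread), the required buffer dwarfs the roughly 64 KiB a kernel pipe holds:

```shell
# 2 seconds of 1920x1080, 3 bytes/pixel, 60 fps uncompressed video:
echo $((1920 * 1080 * 3 * 60 * 2))   # prints 746496000 (~712 MiB)
```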