
When I have a long-running Bash pipeline of commands, I often can't see any signs of life due to I/O buffering. I found online that buffering can be disabled using stdbuf. An example shown there is:

tail -f access.log | stdbuf -oL cut -d ' ' -f1 | uniq

However, it is unclear to me which commands in the pipeline need to be prefixed by the stdbuf command. I therefore add it to every command. For no buffering, I might do:

cd ~/tmp
stdbuf -i0 -o0 -e0 find /i \! -type d | \
stdbuf -i0 -o0 -e0 sed -u -n -e \
's=.*\<\(\([A-Z_a-z0-9.-]\+\)/\2/\).*=& \1=p' \
2>&1 | stdbuf -i0 -o0 -e0 tee find.out

This makes my code very noisy in a cognitive sense.

How do I decide which commands need prefixing with stdbuf?

user2153235

1 Answer


The reason you don't see output "live" is buffering of standard output specifically. Input buffering isn't an issue, and standard error is unbuffered by default anyway, so you can drop the -i0 and -e0 options. Disabling input buffering could actually be counterproductive, as it might slow the program down.
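To see the effect of output buffering directly, here is a minimal sketch that times how long the first line takes to get through a filter in the middle of a pipeline. The function name first_line_delay is made up for illustration, and the behaviour assumes GNU coreutils (cut and stdbuf) on a glibc system; other implementations may differ.

```shell
# first_line_delay FILTER [ARGS...]
# Hypothetical helper: prints the number of whole seconds that pass before
# the first line comes out of FILTER when FILTER writes to a pipe.
first_line_delay() (
    start=$(date +%s)
    # Producer: one line immediately, a second line two seconds later.
    { echo first; sleep 2; echo second; } |
        "$@" |
        {
            read -r line                      # wait for the first line
            echo $(( $(date +%s) - start ))   # seconds until it arrived
            cat >/dev/null                    # drain the remaining output
        }
)

# Plain cut holds its output until the producer has finished.
first_line_delay cut -c1-3

# With only -oL (no -i or -e options) the first line arrives right away.
first_line_delay stdbuf -oL cut -c1-3
```

Only the output option is needed for the second call to report a shorter delay, which matches the point above about dropping -i0 and -e0.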

Then, since the issue only arises for processes writing to something other than a terminal, the last command of a pipeline usually doesn't need stdbuf: its output goes to the terminal, where it is flushed at least at every line. (Unless you redirect its output to a file and are looking at that file through another program.)
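The terminal-or-not distinction can be checked from the shell with the standard `[ -t 1 ]` test, which is the same kind of check programs make when choosing a buffering mode. A small sketch (the function name stdout_kind is made up for illustration):

```shell
# Report where standard output is connected at the point of the call.
stdout_kind() {
    if [ -t 1 ]; then
        echo "terminal"
    else
        echo "not a terminal"
    fi
}

# Typed at an interactive prompt, this call sees the terminal,
# like the last command of a pipeline does.
stdout_kind

# Here stdout is a pipe, like every command except the last in a pipeline.
stdout_kind | cat
```

Any command in the position of the second call is a candidate for stdbuf or an equivalent option; a command in the position of the first call usually is not.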

Then, if a program has a dedicated option for disabling buffering, like grep --line-buffered in GNU grep, there's no reason to use stdbuf in addition to that. Likewise, sed -u disables buffering in GNU sed. Also, e.g. tail -f doesn't buffer its output, since viewing it live is kinda the point.
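As a sketch of relying on those built-in options instead of stdbuf, assuming GNU grep and GNU sed (the function name paths_for_get and the sample input are made up for illustration):

```shell
# Keep only GET requests and strip the method, flushing after every line.
# Both options are GNU extensions, so no stdbuf prefix is needed.
paths_for_get() {
    grep --line-buffered '^GET ' |
        sed -u 's/^GET //'
}

# Sample input standing in for a live log stream.
printf 'GET /a\nPOST /b\nGET /c\n' | paths_for_get
```

In a live setting this could be fed from something like tail -f access.log, with only the commands that lack such an option getting a stdbuf prefix.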

So, I would suppose this should do:

stdbuf -o0 find /i \! -type d | \
 sed -u -n -e 's=.*\<\(\([A-Z_a-z0-9.-]\+\)/\2/\).*=& \1=p' 2>&1 | \
 tee find.out

(Though here, I wonder if find would be slow enough for buffering to cause significant issues anyway.)
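To try the pipeline above without touching a slow drive, it can be run against a small throwaway tree. This is only a sketch: the directory and file names are made up, and it assumes GNU find, sed and stdbuf.

```shell
# Build a tiny tree in which one path repeats a directory name (foo/foo/).
demo=$(mktemp -d)
mkdir -p "$demo/foo/foo" "$demo/bar/baz"
touch "$demo/foo/foo/x.txt" "$demo/bar/baz/y.txt"

# Same shape as the pipeline above: find gets stdbuf -o0, sed uses its own
# -u option, and tee needs nothing because it is last in the pipeline.
stdbuf -o0 find "$demo" \! -type d |
    sed -u -n -e 's=.*\<\(\([A-Z_a-z0-9.-]\+\)/\2/\).*=& \1=p' |
    tee "$demo.out"
```

The copy written by tee lands next to the temporary directory rather than inside it, so find does not pick it up while it is running.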

ilkkachu
  • Those guidelines are easy to understand and follow. Thank you. Terminal-bound output is likely to be unbuffered (`tail`, `tee`), and `sed`/`grep` have their own buffer disablement. And disablement of input/error stream buffering is unnecessary. – user2153235 Jan 05 '23 at 23:56
  • About `find`'s speed, it is a problem, I suspect due to [network setup](https://unix.stackexchange.com/questions/730280/network-file-responds-very-slowly-to-cygwin-but-not-windows-explorer). I'm not in IT or system administration, so I don't know what the underlying cause is, but any interaction with the network drive is debilitating. It seems to be overhead on a per-file basis rather than the volume of each file. – user2153235 Jan 05 '23 at 23:59