6

Is there any difference between these three code blocks in bash?

Using IFS= :

#!/usr/bin/env bash
while IFS= read -r item; do
    echo "[$item]"
done </dev/stdin

Using IFS=$'\n':

#!/usr/bin/env bash
while IFS=$'\n' read -r item; do
    echo "[$item]"
done </dev/stdin

Using -d $'\n':

#!/usr/bin/env bash
while read -rd $'\n' item; do
    echo "[$item]"
done </dev/stdin

If there are differences between the two IFS values and the -d delimiter alternative, under which circumstances would those differences show up?

From my testing, they all appear the same:

echo $'one two\nthree\tfour' | test-stdin 
# outputs:
# [one two]
# [three    four]
balupton
  • Explained under "Word Splitting" in `man bash`. – choroba Nov 10 '21 at 09:57
  • 2
    Relevant URL from @choroba's comment: https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Word-Splitting If some code samples could be provided, that illustrate what the documentation is trying to communicate, that would be appreciated. For foolish ol' me, the documentation is too obtuse without illustration. – balupton Nov 10 '21 at 10:06

2 Answers

11

IFS= and IFS=$'\n' are identical when it comes to read (assuming the read delimiter is not changed from the default), since the only difference is whether a newline inside a line separates words, but a newline never appears inside a line.

read and read -d $'\n' are identical since $'\n' (newline) is the default delimiter.
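One case where the third loop from the question does behave differently is worth spelling out: it leaves IFS at its default, so leading and trailing whitespace on each line is trimmed, whereas the two IFS-setting loops keep it. A minimal sketch, where the function names are hypothetical wrappers around the three loops:

```bash
#!/usr/bin/env bash
# Hypothetical helper names; each wraps one of the three loops from the question.
with_empty_ifs()   { while IFS= read -r item; do echo "[$item]"; done; }
with_newline_ifs() { while IFS=$'\n' read -r item; do echo "[$item]"; done; }
with_newline_d()   { while read -rd $'\n' item; do echo "[$item]"; done; }

# Same shape as the question's test input, but with leading/trailing spaces.
input=$'  one two  \nthree\tfour'
for fn in with_empty_ifs with_newline_ifs with_newline_d; do
    echo "== $fn"
    printf '%s\n' "$input" | "$fn"
done
# First line printed by each variant:
# with_empty_ifs    -> [  one two  ]
# with_newline_ifs  -> [  one two  ]
# with_newline_d    -> [one two]      (default IFS trims the spaces)
```

With the question's original test input there is no leading or trailing whitespace, which is why all three looked the same there.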

IFS= and IFS=$'\n' make a difference for field splitting: IFS= completely turns off field splitting, whereas IFS=$'\n' splits on newlines.

IFS=$'\n'
echo $(echo a; echo b)
# prints "a b" on a single line since $'a\nb' is split at 
# the newline and therefore echo receives two arguments "a" and "b"
IFS=
echo $(echo a; echo b)
# prints "a" and "b" on separate lines, since $'a\nb' is passed
# as a single argument to echo
Jim L.
Gilles 'SO- stop being evil'
  • Note that `IFS=$'\n'` can cause trouble if the script runs under a shell that doesn't support the `$' '` ANSI-C quoting mode. It'll work fine under bash, zsh, ksh, etc, but if the script is run under dash or something like that, `IFS` will consist of the dollar sign, backslash, and "n" characters, which can cause really weird effects. – Gordon Davisson Nov 10 '21 at 19:18
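For scripts that might run under a shell without `$'...'` quoting, a newline can be obtained in other ways. The sketch below sticks to POSIX constructs; the variable and function names (`nl`, `nl2`, `count_lines`) are made up for illustration:

```bash
# Option 1: a literal newline between single quotes.
nl='
'
# Option 2: command substitution strips trailing newlines, so append a
# guard character and remove it afterwards.
nl2=$(printf '\nx')
nl2=${nl2%x}

# Hypothetical helper: count newline-separated fields in its first argument.
# The subshell keeps the IFS and set -f changes from leaking to the caller.
count_lines() {
    ( set -f; IFS=$nl; set -- $1; echo "$#" )
}

count_lines "$(printf 'a b\nc d')"
# 2
```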
3

Combining the excellent resources mentioned in @Gilles's answer, @choroba's comment, and the answers to another question, I've put together the following code examples to illustrate the differences:


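IFS (the Internal Field Separator) specifies the inline delimiters (multiple characters are accepted, order is irrelevant). It defaults to IFS=$' \t\n'. It decides how a line is split into fields when read is given multiple variable targets; with a single target there is no splitting, but leading and trailing IFS whitespace is still trimmed (see the comments below).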

read's -d argument specifies the line delimiter (only the first character is accepted). It defaults to -d $'\n'.

As such,

# IFS=, -d $'\t', with tab separated fields, across two lines
echo $'a\tb\tc\nz\tx\ty' | while IFS= read -rd $'\t' a b c; do echo "[$a] [$b] [$c]"; done
# [a] [] []
# [b] [] []
# [c
# z] [] []
# [x] [] []

# IFS=tab, with tab separated fields, across two lines
echo $'a\tb\tc\nz\tx\ty' | while IFS=$'\t' read -r a b c; do echo "[$a] [$b] [$c]"; done
# [a] [b] [c]
# [z] [x] [y]

# IFS=tab, with tab separated fields, across two lines, with only a single variable target
echo $'a\tb\tc\nz\tx\ty' | while IFS=$'\t' read -r a; do echo "[$a]"; done
# [a    b   c]
# [z    x   y]

# IFS=tab, with space and tab separated fields, across two lines
echo $'a b\tc\nz\tx y' | while IFS=$'\t' read -r a b c; do echo "[$a] [$b] [$c]"; done
# [a b] [c] []
# [z] [x y] []

# IFS=tab+space, with space and tab separated fields, across two lines
echo $'a b\tc\nz\tx y' | while IFS=$'\t ' read -r a b c; do echo "[$a] [$b] [$c]"; done
# [a] [b] [c]
# [z] [x] [y]

# IFS=newline, -d '', with space and tab separated fields, across two lines
echo $'a b\tc\nz\tx y' | while IFS=$'\n' read -rd '' a b c; do echo "[$a] [$b] [$c]"; done
# outputs nothing: read reaches end of input without seeing the NUL delimiter,
# so it returns a failure status and the loop body never runs

# IFS=newline, -d '', with space and tab separated fields, across two lines, with trailing null character
printf 'a b\tc\nz\tx y\0' | while IFS=$'\n' read -rd '' a b c; do echo "[$a] [$b] [$c]"; done
# outputs a single line, with two newline separated fields:
# [a b  c] [z   x y] []

# IFS=newline, -d $'\0' (which bash treats the same as -d ''), with space and tab separated fields, across two lines, with trailing null character
printf 'a b\tc\nz\tx y\0' | while IFS=$'\n' read -rd $'\0' a b c; do echo "[$a] [$b] [$c]"; done
# outputs a single line, with two newline separated fields:
# [a b  c] [z   x y] []

As such,

  • IFS splits "fields" across a "line", it is an "inline" splitter
  • -d splits "lines", it is a "line" splitter
  • customise IFS to customise what separates "fields"
  • customise -d to customise what separates "lines"
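The summary above can be sketched with both knobs customised at once. The records here are hypothetical: `;` terminates each record and `:` separates the two fields inside it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads ';'-terminated records of the form name:shell.
parse_records() {
    local name shell
    # -d ';' sets the record ("line") delimiter, IFS=: sets the field separator.
    while IFS=: read -rd ';' name shell; do
        echo "name=[$name] shell=[$shell]"
    done
}

printf 'alice:/bin/bash;bob:/bin/zsh;' | parse_records
# name=[alice] shell=[/bin/bash]
# name=[bob] shell=[/bin/zsh]
```

Note that the final record needs its own trailing `;`, for the same reason the NUL-delimited example above needed a trailing null character.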

One use case where -d is valuable, is reading each field individually, in a specific order:

echo $'a b\tc\nz\tx y' | {
    read -rd ' ' a
    echo "a=[$a]"
    read -rd $'\t' b
    echo "b=[$b]"
    read -rd $'\n' c
    echo "c=[$c]"
    read -rd $'\t' z
    echo "z=[$z]"
    read -rd $' ' x
    echo "x=[$x]"
    read -rd $'\n' y
    echo "y=[$y]"
}
# a=[a]
# b=[b]
# c=[c]
# z=[z]
# x=[x]
# y=[y]

As such,

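  • Setting IFS matters most when your read call accepts multiple variable targets, since it decides how the line is split into fields.
  • If your read call only accepts a single variable target, no field splitting happens, but IFS still controls trimming: with the default IFS, leading and trailing whitespace is removed. That is why IFS= read -r is the usual way to read a line unmodified (see the comments below).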
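As the comments below point out, IFS still has an effect on a single-variable read, because leading and trailing IFS whitespace is trimmed. A small sketch of that, with hypothetical function names:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: read one line from stdin into a single variable.
read_default() { local var; read -r var; echo "<$var>"; }
read_raw()     { local var; IFS= read -r var; echo "<$var>"; }

read_default <<<"  foo  "
# <foo>
read_raw <<<"  foo  "
# <  foo  >
```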

@Gilles's answer covers IFS outside the context of read.

Such a use case could be selecting a filename from a directory that contains two files, one with a space inside it, and one without:

cd "$(mktemp -d)" || exit 1
touch 'before-space after-space.txt'
touch 'no-space.txt'

# using arrays
# results in correct fields for selection
mapfile -t list < <(ls -1)
select node in "${list[@]}"; do
    echo "via mapfile, [$node]"
    break
done
echo
# outputs:
# 1) before-space after-space.txt
# 2) no-space.txt
# #? 1
# via mapfile, [before-space after-space.txt]

# using word splitting with default `IFS`
# results in mangled fields for selection
select node in $(ls -1); do
    echo "IFS=default [$node]"
    break
done
echo
# outputs:
# 1) before-space
# 2) after-space.txt
# 3) no-space.txt
# #? 1
# IFS=default [before-space]

# using word splitting with `IFS=$'\n'`
# results in the correct fields for selection
IFS=$'\n'
select node in $(ls -1); do
    echo "IFS=newline [$node]"
    break
done
echo
# outputs:
# 1) before-space after-space.txt
# 2) no-space.txt
# #? 1
# IFS=newline [before-space after-space.txt]

# using word splitting with `IFS=`
# results in a jumbled field for selection
IFS=
select node in $(ls -1); do
    echo "IFS= [$node]"
    break
done
echo
# outputs:
# 1) before-space after-space.txt
# no-space.txt
# #? 1
# IFS= [before-space after-space.txt
# no-space.txt]
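The examples above assign IFS globally, so every later word split in the script is affected. One way to avoid that, sketched here with a hypothetical function name, is to define the function body in parentheses so that it runs in a subshell:

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints each newline-separated field of its first argument.
# The body is wrapped in ( ) rather than { }, so it runs in a subshell and the
# IFS and set -f changes disappear when it returns.
list_lines() (
    IFS=$'\n'
    set -f    # also disable globbing while the unquoted expansion is split
    for field in $1; do
        echo "[$field]"
    done
)

list_lines $'before-space after-space.txt\nno-space.txt'
# [before-space after-space.txt]
# [no-space.txt]
```

After the call, IFS in the calling shell is unchanged. The trade-off is that a subshell cannot set variables for the caller, so when the result is needed as an array the mapfile approach shown earlier is the better fit.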
balupton
  • 1
    in that last loop, you're right that `read` will exit with a failure since it doesn't see the NUL delimiter. But you could add that to the input by using `printf 'a b\tc\nz\tx y\0'` instead of the `echo` (Can't use `$'\0'` since most shells can't handle the NUL in an expansion. But within `printf` it works.) Or, if you ignore the return value of `read`, by using e.g. `echo $'a b\tc\nz\tx y' | ( IFS=$'\n' read -rd '' a b c; echo "[$a] [$b] [$c]" )` instead of the `while` loop. Either gives `[a b c] [z x y] []`, showing how the input is split on the newlines. – ilkkachu Nov 10 '21 at 11:30
  • @ilkkachu thank you! I've incorporated your feedback into the examples. – balupton Nov 10 '21 at 12:04
  • 2
    AFAIU, `IFS` can affect leading/trailing chars (even if only reading into a single variable). e.g. `read <<<" foo " -r var && echo "<$var>"` prints `<foo>` – rowboat Nov 10 '21 at 12:10
  • 1
    @rowboat, yes, for whitespace separators (space, tab, newline). But non-whitespace separators don't get that, `IFS=: read <<<"::foo::" -r var && echo "<$var>"` prints `<::foo::>`. – ilkkachu Nov 10 '21 at 12:23
  • 1
    `read -rd $'\0'` is exactly the same as `read -rd ''`. The `\0` in `$''` does create a NUL byte, but the very next moment Bash takes that NUL as ending the string, since that's how C-style strings work. I think I've heard that `read -d ''` using the NUL as delimiter was something of an accident to begin with, the implementation just used the first byte, which in that case was the terminating NUL. You can see the same with e.g. `echo $'foo\0bar'` is the same as `echo 'foo'` since the NUL terminates the string. – ilkkachu Nov 10 '21 at 12:27
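Since `-d ''` is the form that actually selects the NUL delimiter, a typical NUL-delimited loop looks like the sketch below. The function name is hypothetical, and the input is produced with printf here; in practice it often comes from a tool such as `find ... -print0`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints each NUL-terminated item from stdin.
read_nul_items() {
    local item
    # IFS= keeps surrounding whitespace, -r keeps backslashes,
    # -d '' makes NUL the delimiter, so items may contain newlines.
    while IFS= read -rd '' item; do
        echo "[$item]"
    done
}

printf 'one two\0three\nfour\0' | read_nul_items
# [one two]
# [three
# four]
```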
  • The whitespace vs non-whitespace trimming is quite peculiar. – balupton Nov 10 '21 at 13:38
  • The whitespace-trimming effect is the reason that many people use `IFS= read -r ...` as the standard "basic" `read` command (with `-r` to avoid backslash weirdness). It's the "just give me what you read, don't mess with it" invocation. – Gordon Davisson Nov 10 '21 at 19:14
  • @ilkkachu Even a non-whitespace char as `IFS` can affect trimming, as shown in [Stéphane's answer](https://unix.stackexchange.com/a/209184/402371) – rowboat Nov 11 '21 at 01:44
  • @rowboat, oh, right... `IFS=: read <<< "foo:" -r var && echo "<$var>"` gives `<foo>`. sigh. – ilkkachu Nov 11 '21 at 06:29
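The two non-whitespace cases from this comment thread can be put side by side. This is a sketch of the behaviour described above in bash, with a hypothetical function name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads one line into a single variable with IFS set to ':'.
read_colon() { local var; IFS=: read -r var; echo "<$var>"; }

read_colon <<<"foo:"
# <foo>
read_colon <<<"::foo::"
# <::foo::>
```

So a single trailing `:` is dropped, while the other input comes through intact; using `IFS= read -r` sidesteps both cases.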