
While I was reading this answer, the author used this command to put the result of a heredoc to a variable:

read -r -d '' VAR <<'EOF'
abc'asdf"
$(dont-execute-this)
foo"bar"''
EOF

I'm a little confused about the -d option. From the help text for the read command:

-d delim
continue until the first character of DELIM is read, rather than newline

So if I pass an empty string to -d, it means read until the first empty string. What does it mean? The author commented under the answer that -d '' means using the NUL string as the delimiter. Is this true (empty string means NUL string)? Why not use something like -d '\0' or -d '\x0' etc.?

Kusalananda
Fajela Tajkiya
  • Relevant: [Are the null string and "" the same string?](https://unix.stackexchange.com/q/280430) – terdon May 07 '22 at 12:08
  • @terdon, they are, but the NUL _byte_ is different (also edited my answer since I only now realized, they said "NUL string" here.) – ilkkachu May 07 '22 at 14:31
  • Think about how strings are stored in memory. Hint: The string `abc` is _four_ bytes, `a` `b` `c` `\0`. An empty string is _one_ byte, `\0`. So this isn't really a special case at all in any shell that's written in C and using standard C strings. – Charles Duffy May 07 '22 at 22:17

2 Answers


Mostly, it means what it says, e.g.:

$ read -d . var; echo; echo "read: '$var'"
foo.
read: 'foo'

Reading ends immediately at the .; I didn't press Enter there.

But read -d '' is a bit of a special case. The online reference manual says:

-d delim
The first character of delim is used to terminate the input line, rather than newline. If delim is the empty string, read will terminate a line when it reads a NUL character.

\0 means the NUL byte in printf, so we have e.g.:

$ printf 'foo\0bar\0' | while read -d '' var; do echo "read: '$var'"; done
read: 'foo'
read: 'bar'

In your example, read -d '' is used to prevent the newline from being the delimiter, allowing it to read the multiline string in one go, instead of a line at a time.
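
As a minimal sketch of that behaviour (the variable name status and the two sample lines are only illustrative), the here-document below is read in one go. One detail worth knowing: because the input contains no NUL byte, read runs into end of input before finding its delimiter, so it returns a non-zero status even though VAR has been filled in. With the default IFS, the trailing newline is also stripped from the value.

```bash
# Read a whole here-document into VAR. With -d '', read stops only at a NUL byte.
# The here-document has no NUL, so read hits end of input and returns non-zero,
# but VAR is still assigned. "|| status=$?" records that status without aborting.
status=0
read -r -d '' VAR <<'EOF' || status=$?
line one
line two
EOF

printf 'exit status of read: %s\n' "$status"
printf 'VAR: [%s]\n' "$VAR"
```

In a script that runs under set -e, that non-zero status would need to be handled in some way, for example as shown above.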


I think some older versions of the documentation didn't explicitly mention -d ''. The behaviour may originally be an unintended coincidence from how Bash stores strings in the C way, with that trailing NUL byte. The string foo is stored as foo\0, and the empty string is stored as just \0. So, if the implementation isn't careful to guard against it and only picks the first byte in memory, it'll see \0, NUL, as the first byte of an empty string.

Re-reading the question more closely, you mentioned:

The author commented under the answer that -d '' means using the NUL string as delimiter.

That's not exactly right. The null string (in the POSIX parlance) means the empty string, a string that contains nothing, of length zero. That's not the same as the NUL byte, which is a single byte with binary value zero(*). If you used the empty string as a delimiter, you'd find it practically everywhere, at every possible position. I don't think that's possible in the shell, but e.g. in Perl it's possible to split a string like that, e.g.:

$ perl -le 'print join ":", split "", "foobar";'
f:o:o:b:a:r

read -d '' uses the NUL byte as the separator.

(*not the same as the character 0, of course.)
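
The difference between the empty string, the NUL byte and the character 0 can be checked by counting and dumping bytes. This is only a rough sketch; count_bytes and hex_dump are hypothetical helper names introduced here, built on wc, od and tr.

```bash
# Hypothetical helpers: count the bytes on stdin, or dump them as hex digits.
count_bytes() { wc -c | tr -d ' '; }
hex_dump() { od -An -tx1 | tr -d ' \n'; }

# The empty string produces no bytes at all.
printf 'empty string: %s byte(s)\n' "$(printf '' | count_bytes)"

# The NUL byte is one byte with the value zero.
printf 'NUL byte:     %s byte(s), hex %s\n' \
    "$(printf '\0' | count_bytes)" "$(printf '\0' | hex_dump)"

# The character 0 is one byte with the value 0x30.
printf 'character 0:  %s byte(s), hex %s\n' \
    "$(printf '0' | count_bytes)" "$(printf '0' | hex_dump)"
```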

Why not use something like -d '\0' or -d '\x0' etc.?

Well, that's a good question. As Stéphane commented, ksh93's read -d originally didn't support read -d '' like that, and changing it to support backslash escapes would have been incompatible with the original. But you can still use read -d $'\0' (and similarly $'\t' for the tab, etc.) if you like it better. Behind the scenes, though, that's the same as -d '', since Bash doesn't support the NUL byte in strings. Zsh does, but it seems to accept both -d '' and -d $'\0'.
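
A quick way to see that equivalence in Bash is sketched below. The function split_with is a hypothetical helper written for this example; it passes its argument to read -d and prints each item in brackets.

```bash
# Bash cannot store a NUL byte in a variable, so $'\0' ends up as an empty string.
delim=$'\0'
printf 'length of delim: %s\n' "${#delim}"    # prints 0

# Hypothetical helper: split NUL-separated input using "$1" as the -d argument.
split_with() {
    printf 'foo\0bar\0' | while IFS= read -r -d "$1" item; do
        printf '[%s]' "$item"
    done
    printf '\n'
}

split_with ''       # prints [foo][bar]
split_with $'\0'    # prints [foo][bar]
```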

ilkkachu
  • Note that `read -d` initially comes from ksh93 where it initially didn't support `read -d ''` for NUL delimited. Now supported by zsh, bash and mksh. – Stéphane Chazelas May 07 '22 at 14:18
  • Thanks for the answer. I'm on an Ubuntu 20.04 system and the help text doesn't mention what will happen when delim is an empty string. I'll go for the online doc next time. – Fajela Tajkiya May 07 '22 at 17:29
  • I found that your `printf` example only works when piped to `while`, like this: `printf "ab\0cd\0ef" | while read -d '' a; do echo "[$a]"; done`. When I pipe `printf` to `read` directly, it doesn't work. For example: `printf "ab\0cd\0ef" | read -d '' a; echo "[$a]";`. Is this expected? Seems like a Bash trick, which I'm not aware of. – Fajela Tajkiya May 07 '22 at 17:58
  • @FajelaTajkiya, that's because the parts of a pipeline run in subshells. With the loop, the whole loop is inside the pipeline; in `... | read; echo`, just the `read` is. See [Why is my variable local in one 'while read' loop, but not in another seemingly similar loop?](https://unix.stackexchange.com/q/9954/170373) – ilkkachu May 07 '22 at 19:15
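
The subshell effect described in the comment above can be sketched as follows, assuming Bash with its default settings (in particular, the lastpipe option is off). The variable names a and b are only illustrative.

```bash
# 1. read runs in a subshell created for the pipeline, so "a" is gone afterwards.
unset a
printf 'ab\0cd\0ef' | read -r -d '' a
echo "after plain pipeline: [${a-unset}]"

# 2. Grouping read and echo keeps both in the same subshell.
printf 'ab\0cd\0ef' | { read -r -d '' a; echo "inside group: [$a]"; }

# 3. Process substitution lets read run in the current shell, so "b" survives.
read -r -d '' b < <(printf 'ab\0cd\0ef')
echo "after process substitution: [$b]"
```

The third form is usually the most convenient when the variable is needed later in the script.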

Just to point out the persnickety nature of ASCII 0 (NUL) as a character in files: Expect (my favorite tool!) has to make special provisions for reading and matching nulls.

  • Welcome to the site. If you want to comment on another answer, please don't post this as an answer - the answer section is only intended for definitive solutions to the original problem. Once you have enough reputation, you will be able to comment on other people's posts. In the meantime, consider submitting an edit suggestion if you think that you can improve the other answer. – AdminBee Sep 27 '22 at 11:42