
I'm trying to run a script that takes a -t argument. This argument stands for text, and the value -- in theory -- is allowed to be multiline. On the command line, I assume a Here Document would work, but I don't like typing out long things on the command line. In addition, I want this file to persist so I can pass it again later.

I'm not sure how to do this; if I cat foo | xargs echo, it prints as one line. This fixes that: cat foo | xargs -d='' echo, but it makes me think there are things I don't understand that will change the whitespace or general structure of the document depending on its contents.

How do I pass a multiline file as an argument without having to worry about special chars or changing its format?

Stéphane Chazelas
Daniel Kaplan
  • There are "here documents" but I have never heard of "from documents". It looks like you just need `some-command -t "$(cat foo)"` – muru Aug 17 '21 at 03:25
  • @muru lol, sorry that's what I meant. To me that name is so arbitrary I can never remember it. – Daniel Kaplan Aug 17 '21 at 04:16

2 Answers

  1. Why not just pass it a filename? If the script doesn't already know how to deal with filenames, modify it so that it does (e.g. by treating all unknown arguments as filenames, and/or by adding a -f option for input filenames). Remember to add error-checking code and respond appropriately when a filename doesn't exist, or can't be read due to permissions, etc.

  2. Quote your argument. e.g. -t "$(cat foo)"
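A minimal sketch of the first suggestion, assuming a hypothetical script that takes its text either inline with `-t TEXT` or from a file with a new `-f FILE` option. The function name `get_text` and the `-f` option are illustrative and are not taken from the real script:

```shell
#!/bin/sh
# Hypothetical option handling: -t TEXT passes the text inline,
# -f FILE reads it from a file instead.
get_text() {
  OPTIND=1
  text=
  while getopts 't:f:' opt; do
    case $opt in
      t) text=$OPTARG ;;
      f)
        # Respond to a missing or unreadable file instead of carrying on.
        if [ ! -r "$OPTARG" ]; then
          printf 'cannot read %s\n' "$OPTARG" >&2
          return 1
        fi
        text=$(cat -- "$OPTARG") ;;   # trailing newlines are stripped here
      *) return 2 ;;
    esac
  done
  printf '%s\n' "$text"
}
```

With this in place, `get_text -f foo` and `get_text -t "$(cat foo)"` would print the same text.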

cas
  • 1. I'd like to, but the script is way over my head. I could modify it for my personal use, but I think I'd have to spend a day learning bash to contribute back to the repo. I'd rather do the latter some day ™ – Daniel Kaplan Aug 17 '21 at 04:24
  • BTW, how do I learn *why* `"$(cat foo)"` is safe for any file contents? – Daniel Kaplan Aug 17 '21 at 04:25
  • @DanielKaplan *Any* file contents? No. (1) `"$(cat foo)"` removes all newlines from the very end. (2) In Bash you cannot pass NUL bytes this way. But since your script expects *text*, you shouldn't pass NUL bytes (by [definition](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_403) a text file does not contain NUL). (3) `Argument list too long` is possible. // The right way to pass arbitrary content is not via the command line. The script should work with a file: stdin or a named file (the name specified in the command line, environment, config or hardcoded). – Kamil Maciorowski Aug 17 '21 at 04:48
  • @KamilMaciorowski Agreed. I created an issue for that. – Daniel Kaplan Aug 17 '21 at 05:49

xargs -d='' (where -d is an extension of the GNU implementation of xargs) is the same as xargs -d= as '' is just quoting syntax in the shell language. That tells xargs to use = as the delimiter.

So for instance echo foo=bar | xargs -d= cmd would call cmd with foo and bar<newline> as arguments.
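A small demonstration of that splitting, assuming GNU xargs is installed (`-d` is not available elsewhere). `printf '[%s]'` stands in for `cmd` and wraps each argument it receives in brackets:

```shell
# Requires GNU xargs for -d. The input is split on "=" only, so the
# newline that echo/printf appends stays part of the second argument.
out=$(printf 'foo=bar\n' | xargs -d= printf '[%s]')
printf '%s\n' "$out"
```

The closing bracket of the second argument ends up on a line of its own, which shows the newline travelling inside that argument.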

With xargs -d '' or xargs --delimiter= (which can be abbreviated to xargs --del= and even xargs --d= in current versions of xargs as there's currently no other long-option that starts with d), you'd get a syntax error.

You could use xargs -d '\0', that would be the same as the more portable (though still not standard) xargs -0, which uses the NUL character as delimiter. NUL characters are not meant to appear in text files and anyway can't be passed in an argument to a non-builtin¹ command as the arguments are passed as NUL-delimited strings to the execve() system call.

So:

xargs -I'<TEXT>' -0a file cmd -t '<TEXT>' other args

(-a being another GNU xargs extension), would pass the exact content of file as argument to cmd -t².

But if file contains a foobar line for instance, that would pass the whole line including the line delimiter ("foobar\n").
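To see the retained line delimiter, here is a sketch that again assumes GNU xargs (for `-a`) and uses `printf '[%s]'` in place of `cmd -t`; the temporary file is only for illustration:

```shell
# Requires GNU xargs (-a). The file ends with a newline, as text files do.
file=$(mktemp)
printf 'foobar\n' > "$file"

# The brackets make the boundaries of the single argument visible.
out=$(xargs -I'<TEXT>' -0a "$file" printf '[%s]' '<TEXT>')
printf '%s\n' "$out"
rm -f "$file"
```

The closing bracket is printed on the following line, because the argument is `foobar` plus the newline.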

Alternatively, you could do (in POSIX-like shells):

cmd -t "$(cat file)" other args

Command substitution strips all the trailing newline characters from the output of the inner command, so it would likely be preferable. If the output contains NUL characters, some shells, such as bash, remove them (use "$(tr -d '\0' < file)" instead to get that behaviour in any shell).
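The stripping is easy to observe by comparing lengths. The sketch below also shows a common idiom for keeping the trailing newlines when they matter (append a sentinel character, then remove it); that idiom is an addition here and is not part of the answer:

```shell
file=$(mktemp)
printf 'line one\nline two\n\n\n' > "$file"   # 20 bytes, 3 trailing newlines

stripped=$(cat "$file")           # trailing newlines removed
exact=$(cat "$file"; printf x)    # the sentinel "x" protects the newlines...
exact=${exact%x}                  # ...and is removed afterwards

printf 'stripped: %s characters\n' "${#stripped}"   # 17
printf 'exact: %s characters\n' "${#exact}"         # 20
rm -f "$file"
```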

Note that the double quotes around it are important. Without them, the expansion would be subject to split+glob (split only in zsh) resulting in several arguments if the file contained characters of $IFS (and newline is in the default value of $IFS) or wildcard characters.
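A quick way to check this is to count the arguments a command receives. `count_args` below is a hypothetical stand-in for `cmd`:

```shell
# Prints the number of arguments it was called with.
count_args() { echo "$#"; }

file=$(mktemp)
printf 'two words\nsecond line\n' > "$file"

quoted=$(count_args -t "$(cat "$file")")   # -t plus one argument
unquoted=$(count_args -t $(cat "$file"))   # -t plus one argument per word

printf 'quoted: %s, unquoted: %s\n' "$quoted" "$unquoted"   # quoted: 2, unquoted: 5
rm -f "$file"
```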

In ksh, zsh or bash, you can also use "$(<file)" instead of "$(cat file)", which optimises out the execution of cat as the shell reads the file by itself (in bash, that's still done in a child process though).

In zsh, you can also use the $mapfile special associative array in the zsh/mapfile module:

zmodload zsh/mapfile
cmd -t "$mapfile[path/to/file]" other args

That's passing the contents as-is, including NULs (which would cause execve() to truncate the arg after the first NUL) and trailing newline characters.

In the rc shell or derivatives, you can do:

cmd -t ``(){cat file} other args

Where ``(sep)cmd is a variant of command substitution (which is `cmd there) where you specify the separator, here none. There's no stripping of trailing newlines in that shell so the whole file contents will be passed as-is.

In any case, note that on most systems, there's a limit on the total size of arguments³ to a command (though on recent versions of Linux, that limit can be raised by changing the stack size limit), and on Linux on the size of a single argument (128KiB max).
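As a rough pre-check, a script could compare the file size against the Linux single-argument limit quoted above before trying to pass the contents. This is only a sketch: the helper name `fits_single_arg` is hypothetical, the 128 KiB figure is Linux-specific, and passing this check does not guarantee that the total limit is respected:

```shell
# Linux-specific figure from the text above: a single argument is capped at
# 128 KiB (131072 bytes, terminating NUL included).
MAX_SINGLE_ARG=131072

# Hypothetical helper: succeeds if FILE is small enough to pass as one argument.
fits_single_arg() {
  [ -r "$1" ] || return 2
  size=$(( $(wc -c < "$1") ))
  [ "$size" -lt "$MAX_SINGLE_ARG" ]
}

# Total limit for arguments plus environment, as reported by the system.
getconf ARG_MAX
```

Usage could look like `fits_single_arg file && cmd -t "$(cat file)" other args`.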


Now, to pass a multiline string literally without having to worry about special characters, you can do:

cmd -t 'the multiline here
where the only character you have to
worry about in Bourne-like shells is 
single quote which you have to enter
as '\'', that is leave the quotes, enter
it with other quoting operators (here \)
and resume the quotes'

In the rc shell (where '...' is the only form of quote) or zsh when the rcquotes option is enabled, single quotes can be entered inside single quotes as '', for example: cmd -t 'It''s simpler like that'. See "How to use a special character as a normal one in Unix shells?" for the details about quoting in various shells.
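For Bourne-like shells, the '\'' sequence can be tried out with a short example. The variable name `text` is only for illustration:

```shell
# Everything between single quotes is literal; the only special case is the
# single quote itself, written as '\'' (close quotes, escaped quote, reopen).
text='It'\''s a
multiline string with $dollar, `backticks` and "double quotes"'

printf '%s\n' "$text"
```

Nothing inside the quotes is expanded, so the dollar sign, backticks and double quotes are printed as typed.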

Or you can use a here-document, either stored in a variable:

multi=$(cat << 'EOF'
multiline string here, only worry would be about
an EOF line by itself though also note that
all trailing newlines, so that includes all
trailing empty lines are removed, including these:


EOF
)
cmd -t "$multi" # note the quotes again

Or directly as:

cmd -t "$(cat << 'EOF'
multi
line
here
EOF
)"

Note the quotes around EOF in those. Without them, parameter expansions (like $var), command substitutions (like $(cmd) or `cmd`) and arithmetic expansions ($((...))) are still performed.
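The difference between a quoted and an unquoted delimiter can be seen side by side. The variable names here are illustrative:

```shell
var=expanded

# Quoted delimiter: the body is taken literally.
quoted=$(cat << 'EOF'
$var $(echo cmd) $((1 + 1))
EOF
)

# Unquoted delimiter: expansions are performed in the body.
unquoted=$(cat << EOF
$var $(echo cmd) $((1 + 1))
EOF
)

printf '%s\n' "$quoted"     # $var $(echo cmd) $((1 + 1))
printf '%s\n' "$unquoted"   # expanded cmd 2
```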

In the mksh shell, you can use:

cmd -t "$(<< 'EOF'
multi
line
EOF
)"

Where the cat and fork are optimised out, making it essentially a multiline form of quotes.


¹ In most shells, it can't be passed in the arguments of builtin commands or functions either, and can't even be stored in variables, zsh being the only exception that I know of.

² as long as the file is not empty. If the file is empty, with -I, the command is not run at all; you'd need the file to contain one NUL character for the command to be called with one empty argument.

³ technically, the limit (in the execve() system call again, so does not apply to shell builtins / functions) is on the cumulated size of arguments and environment and generally also takes into account the size of the pointers to each argument and envvar string, so it's generally difficult to predict in advance whether a particular set of arguments will break the limit.

Stéphane Chazelas