In a shell script...

How do I capture stdin to a variable without stripping any trailing newlines?

Right now I have tried:

var=`cat`
var=`tee`
var=$(tee)

In all cases $var will not have the trailing newline of the input stream. Thanks.

ALSO: If there is no trailing newline in the input, then the solution must not add one.

UPDATE IN LIGHT OF THE ACCEPTED ANSWER:

The final solution that I used in my code is as follows:

filter() {
    # do lots of sed operations
    # see https://github.com/gistya/expandr for full code
    cat  # placeholder: a shell function body cannot consist of comments only
}

GIT_INPUT=$(cat; echo x)
FILTERED_OUTPUT=$(printf '%s' "$GIT_INPUT" | filter)
FILTERED_OUTPUT=${FILTERED_OUTPUT%x}
printf '%s' "$FILTERED_OUTPUT"

If you would like to see the full code, please see the github page for expandr, a little open-source git keyword-expansion filter shell script that I developed for information security purposes. According to rules set up in .gitattributes files (which can be branch-specific) and git config, git pipes each file through the expandr.sh shell script whenever checking it in or out of the repository. (That is why it was critical to preserve any trailing newlines, or lack thereof.) This lets you cleanse sensitive information, and swap in different sets of environment-specific values for test, staging, and live branches.
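
A minimal runnable sketch of the same round trip, assuming a hypothetical one-rule `filter` (`sed 's/foo/bar/g'`) in place of the real expandr rules and a made-up `roundtrip` wrapper. The trailing `x` rides through `filter` and is removed only at the very end, so the real trailing newlines never sit at the end of a command substitution. If a particular `sed` appends a newline after the final `x`, that newline is stripped by the command substitution before `${output%x}` runs, so it does no harm. One assumption to keep in mind: the filter must leave that final `x` alone, and a rule anchored at end of line (such as `s/foo$/bar/`) will not match a last line that lacked a newline, because the `x` is now glued onto it.

```shell
#!/bin/sh
# Hypothetical one-rule filter standing in for the real expandr sed script.
filter() {
    sed 's/foo/bar/g'
}

# Read stdin, run it through filter, and write it out with exactly the
# same trailing newlines (or lack of one) as the input had.
roundtrip() {
    input=$(cat; echo x)                      # "x" shields the trailing newlines
    output=$(printf '%s' "$input" | filter)   # x passes through filter untouched
    printf '%s' "${output%x}"                 # drop x only after filtering
}

printf 'foo\n\n' | roundtrip   # writes "bar" followed by two newlines
```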

CommaToast
  • what you do here is not necessary. `filter` takes `stdin` - it runs `sed`. You catch `stdin` in `$GIT_INPUT` then print that back to `stdout` over a pipe to `filter` and catch its `stdout` in `$FILTERED_OUTPUT` and then print it back to `stdout`. All 4 lines at the bottom of your example above could be replaced with just this: `filter`. No offense meant here, it's just... you're working too hard. You don't need the shell variables most of the time - just direct the input to the right place and pass it on. – mikeserv Sep 10 '14 at 17:30
  • No, what I do here *is necessary* because if I just do `filter`, then it will add newline characters to the ends of any input streams that did not end in newlines initially. In fact I originally did just do `filter` but ran into that problem which led me to this solution because neither "always add newlines" nor "always strip newlines" are acceptable solutions. – CommaToast Sep 10 '14 at 17:52
  • `sed` probably will do the extra newline - but you should handle that in `filter` not with all the rest. And all of those functions that you have basically do the same thing - a `sed s///`. You're using the shell to pipe data it has saved in its memory to `sed` so that `sed` might replace that data with other data that the shell has stored in its memory so `sed` can pipe it back to the shell. Why not just `[ "$var" = "$condition" ] && var=new_value`? I also don't get the arrays - are you storing the array name in `[0]` then using `sed` to replace that with the value in `[1]`? Maybe chat? – mikeserv Sep 10 '14 at 18:27
  • @mikeserv - What would be the benefit of moving that code inside `filter`? It works perfectly as-is. Regarding how the code at my link works and why I set it up the way that I did, yeah, lets talk about it in a chat room. – CommaToast Sep 10 '14 at 20:52

4 Answers

The trailing newlines are stripped before the value is stored in the variable. You may want to do something like:

var=`cat; echo x`

and use ${var%x} instead of $var. For instance:

printf "%s" "${var%x}"

Note that this solves the trailing-newline issue, but not the null-byte one (if standard input is not text), since, according to the POSIX specification of command substitution:

If the output contains any null bytes, the behavior is unspecified.

But shell implementations may preserve null bytes.
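
A small self-check of this approach, with a made-up helper name `capture_stdin`: it compares the length of a plain `$(...)` capture with the sentinel version for input that ends in three newlines, and the `${#var}` expansion it uses is POSIX. The `{ ... }` group on the right of the pipe may run in a subshell (it does in bash), so `$var` is only read inside that group.

```shell
#!/bin/sh
# Read all of standard input into $var, keeping its trailing newlines
# (or their absence) intact.
capture_stdin() {
    var=$(cat; echo x)   # the sentinel "x" shields the real trailing newlines
    var=${var%x}         # strip the sentinel again
}

# For comparison: a plain command substitution strips them.
plain=$(printf 'line\n\n\n')

printf 'line\n\n\n' | {
    capture_stdin
    # prints: plain: 4 chars, sentinel: 7 chars
    printf 'plain: %s chars, sentinel: %s chars\n' "${#plain}" "${#var}"
}
```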

vinc17
  • Would text files typically contain null bytes? I can't see why they would. But the script that you just mentioned does not seem to work. – CommaToast Sep 09 '14 at 00:51
  • @CommaToast Text files don't contain null bytes. But the question just says stdin / input stream, which may not be text in the most general case. – vinc17 Sep 09 '14 at 00:53
  • OK. Well I tried it from the command line and it didn't do anything, and from within my script itself, your suggestion fails because it adds "..." at the end of the file. Also if there was no newline there, then it still adds one. – CommaToast Sep 09 '14 at 00:59
  • @CommaToast The "..." was just an example. I've clarified my answer. No newline is added (see the text **before** the "..." in the example). – vinc17 Sep 09 '14 at 01:06
  • Ok, so am I understanding this correctly: you add an "x" so it no longer has a trailing newline but a trailing "x" instead, then you remove the trailing "x" when you want to use the var? If so then couldn't you do `var=${var%x}` after the first line and that way, just deal with $var thenceforth? – CommaToast Sep 09 '14 at 01:15
  • @CommaToast Yes, you can do `var=${var%x}`, and then use `$var`. It's better if you use `$var` several times, otherwise it is rather useless since you can use `${var%x}` directly. – vinc17 Sep 09 '14 at 01:17
  • @CommaToast And with zsh, you can do `var=${$(your_command; echo x)%x}` directly, but this doesn't work with other shells. – vinc17 Sep 09 '14 at 01:21
  • Why do you suppose they strip 0x0a bytes off the end of streams like that? It seems rather rude. And I thought ISIS was barbaric! BTW I think you should take the "..." out of your answer; it seems really irrelevant and unhelpful. – CommaToast Sep 09 '14 at 01:25
  • @CommaToast I've removed the `...\n` as you suggested, but note that some shells hide the last line if it doesn't end with a newline character. So, people need to test with zsh or similar. – vinc17 Sep 09 '14 at 01:42
  • @CommaToast - Using a tool like `sed` which can be used to append only a single byte which will not append anything but a single byte is something that can be relied upon in any shell. You also don't need to do the two commands. vinc17's answer here would be far more portable if `printf` were used in place of `echo` - with `printf` you can do `printf x` - and no newline is appended. So, in `zsh` and similar shells that preserve newlines in a command substitution subshell the behavior is identical - there are no nasty surprises. – mikeserv Sep 09 '14 at 02:09
  • Well, shells shouldn't hide things, that is not cool. Those shells ought to be fired. I don't like it when my computer thinks it knows better than me.
  • @mikeserv No, zsh does *not* preserve newlines in command substitution: `var=$(echo foo); printf "%s...\n" "$var"` doesn't output a newline between "foo" and "...". – vinc17 Sep 09 '14 at 02:19
  • I have realized that you should do var=`cat; printf x`, not `echo x`, because `echo` adds an extra newline. – CommaToast Sep 09 '14 at 02:38
  • @CommaToast This is equivalent: `echo x` adds a newline, but it immediately gets stripped in command substitution. `echo` was just faster to type than `printf`. :) Also, since `echo` is simpler than `printf`, it is more likely to be a builtin. Just in case... – vinc17 Sep 09 '14 at 02:40
  • Yep, you're right. Actually the way I did it had weird effects too. Echo was better than printf in this situation :D – CommaToast Sep 09 '14 at 03:33
  • @vinc17 You can see the full code I'm using this for here: https://github.com/gistya/expandr It's a script that acts as a keyword expansion filter for git. Sed is used extensively. If the newlines were frakked with by the script then needless diffs would be generated. Thanks for your help. – CommaToast Sep 09 '14 at 07:13
  • @vinc17 Even if `echo` is a built-in and `printf` is not, the command substitution would force a subshell to be started to execute the built-in, so there's no efficiency gain there. – chepner Sep 09 '14 at 21:38
  • @chepner This depends on the shell. With mksh, there are 2 clones with `printf` while there's only one with `echo`. Compare `strace -f -o out mksh -c 'var=$(echo a; printf x)'` and `strace -f -o out mksh -c 'var=$(echo a; echo x)'`. – vinc17 Sep 09 '14 at 23:33
  • I realize I should've just used CLI PHP or Ruby, LOL, but I was too far into the project to turn back from shell scripting. Lesson learned. – CommaToast Sep 10 '14 at 05:01
  • @vinc17 I thought you were implying that `$(echo)` would be faster than `$(printf)` if only the first used a built-in. – chepner Sep 10 '14 at 11:44
  • @vinc17 - this is because `mksh` doesn't have a `printf` - in that shell you call whatever is the `printf` binary in path or you don't call one at all unless you do what mirabilos describes as some *ugly hacks* at build time to compile in a `printf` builtin. You can use - and likely should in `ksh` variants - the `print` builtin. With `echo` you get a clone for the subshell, but with `printf` you have to clone twice - once for the subshell and once for the command execution environment when it is `exec`ed. – mikeserv Sep 10 '14 at 19:02
  • @mikeserv Yes, I gave this example based on what I said above: `echo` is more likely to be a builtin than `printf`, and mksh is an example. But note that there is another important point in case where the *last* command is *not* a builtin: like bash, mksh does a useless `clone`. Both dash and zsh optimize by avoiding a `clone` call. This can be seen with strace after replacing `printf` by `/usr/bin/printf` (to test the case where the last command is not a builtin). If mksh had this optimization, `printf` would be equivalent to `echo` concerning `clone` calls. – vinc17 Sep 10 '14 at 19:28
  • @vinc17 - I'm not sure I'm following, but I just did this: `for sh in da ba z; do strace -c ${sh}sh -c 'echo | /usr/bin/true'; done 2>&1 | grep clone`. I get `2 2 2`. – mikeserv Sep 10 '14 at 19:47
  • @mikeserv You forgot the `-f` option, and the command is not the correct one (see above with `var=...`). Try: `for sh in da ba z; do strace -f -c ${sh}sh -c 'var=$(echo ; /usr/bin/true)'; done 2>&1 | grep clone` (but on Debian, this is `/bin/true`). – vinc17 Sep 10 '14 at 19:59
  • @vinc17 - you said nothing about a variable, just that a last pipe command got useless `clone`. But you're right about follow - I'm pretty weak with `strace`. I'll try it. In any case - can you please remove all of the incorrect stuff from the other post now? I answered your question, will you now answer mine? And by the way... `-f` makes no difference. All `2`'s. – mikeserv Sep 10 '14 at 20:28
  • @vinc17 - with your thing - in the subshell - there is an additional clone as you say for `bash` and `mksh`. Why that should be I don't know. I'm going to look at the actual calls because I am intrigued - thanks. – mikeserv Sep 10 '14 at 20:35
  • What is `cat` printing above? There is no reference to any input whatsoever. Does it `cat` a pipe? Do you serve it an input file - through command substitution? Is it, perhaps, reading nothing? In most cases the result your command will render is an empty variable. – mikeserv Sep 11 '14 at 04:55
  • Concerning `cat`, the answer is at the beginning of the OP's question: it's inside a shell script, and the OP is interested in capturing the standard input. – vinc17 Sep 11 '14 at 12:07

You can use the read built-in to accomplish this:

$ IFS='' read -d '' -r foo < <(echo bar)

$ echo "<$foo>"
<bar
>

For a script to read STDIN, it'd simply be:

IFS='' read -d '' -r foo

I'm not sure which shells this will work in, though, but it works fine in both bash and zsh.
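
A bash-only sketch of the same idea, wrapped in a hypothetical `slurp` helper that reads into the variable named by its first argument. Two details seem worth spelling out. `read` returns a non-zero status when it reaches end of input without finding a NUL byte, which would abort a script running under `set -e`, hence the `|| true`. And with the default `IFS`, bash trims leading and trailing IFS whitespace (newlines included) from the value, so `IFS=''` is what actually keeps the trailing newlines here. Input is also cut at the first NUL byte, since that is the delimiter `-d ''` selects.

```shell
#!/usr/bin/env bash
# bash-specific sketch: read all of stdin into the variable named by $1.
#   -d ''  : stop at a NUL byte or EOF instead of at a newline
#   -r     : keep backslashes literally
#   IFS='' : stop read from trimming leading/trailing whitespace (incl. newlines)
slurp() {
    IFS='' read -r -d '' "$1" || true   # read returns 1 at EOF; ignore that
}

# prints "<bar", an empty line, then ">"
printf 'bar\n\n' | { slurp foo; printf '<%s>\n' "$foo"; }
```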

phemmer
  • Neither `-d` nor the process substitution (`<(...)`) are portable; this code will not work in `dash`, for instance. – chepner Sep 09 '14 at 21:24
  • Well the process substitution isn't part of the answer, that was only part of the example showing that it works. As for `-d`, that's why I put the disclaimer at the bottom. The OP doesn't specify the shell. – phemmer Sep 09 '14 at 22:38
  • @chepner - while the style differs slightly, the concept certainly does work in `dash`. You just use `< – mikeserv Sep 10 '14 at 19:30
  • You set IFS='' so it doesn't put spaces in between the lines it reads in eh? Cool trick. – CommaToast Sep 11 '14 at 01:05
  • Actually in this case `IFS=''` probably isn't necessary. It's meant so that `read` won't collapse spaces. But when it's reading into a single variable, it has no effect (that I can recall). But I just feel safer leaving it on :-) – phemmer Sep 11 '14 at 02:20

You can do something like:

input | { var=$(sed '$s/$/./'); var=${var%.}; }

Whatever you do $var disappears as soon as you step outside of that { current shell ; } grouping anyway. But it could also work like:

var=$(input | sed '$s/$/./'); var=${var%.}
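
A quick way to check what the second form captures, using a hypothetical `input` function that emits `a` followed by two newlines. Because `$s/$/./` appends the dot to the text of the last line, before that line's own newline, the result keeps every trailing newline except the final one: `a\n\n` comes back as `a\n`, while `a` and `a\n` both come back as `a`. If the exact count matters, appending the sentinel after the whole stream, as in `var=$(input; echo x)`, avoids this.

```shell
#!/bin/sh
# Hypothetical stand-in for the answer's "input" command.
input() { printf 'a\n\n'; }

var=$(input | sed '$s/$/./'); var=${var%.}

# The dot lands before the last line's newline, so that one newline is
# still lost: "a\n\n" comes back as "a\n", i.e. 2 characters.
printf '%s\n' "${#var}"
```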
mikeserv
  • It should be noted that with the first solution, i.e. having to use `$var` in the `{ ... }` grouping, is not always possible. For instance if this command is run inside a loop and one needs `$var` outside the loop. – vinc17 Sep 09 '14 at 01:57
  • @vinc17 - if it is a loop I desired to use, then I would use it in place of the `{}` braces. It is true - *and is explicitly noted in the answer* - that the value for `$var` is *very likely* to disappear entirely when the `{ current shell; }` grouping is closed. Is there some *more* explicit way to say it than, *Whatever you do `$var` disappears...?* – mikeserv Sep 09 '14 at 02:01
  • @vinc17 - probably the best way, though: `input | sed "s/'"'/&"&"&/g;s/.*/process2 '"'-> &'/" | sh` – mikeserv Sep 09 '14 at 02:25
  • @vinc17 - a subshell doesn't mean you can't get the variable's value - though it can be harder to do reliably. [This three-line function](http://gdriv.es/mikeserv/scripts/sq.txt) does pretty well. Though I've just realized it needs a `${a:-continue}` or something. You can use it like `sq *` to pipe out shell-quoted whatever. Anyway, you still haven't answered the question - how can I be more explicit than *Whatever you do...?* – mikeserv Sep 09 '14 at 02:44
  • @vinc17 - answer to what? And it wasn't modified - except that `while read ...` might have mauled it and you added a `->`. You mean like `(input;printf .)|(var=$(sed -n '$p'); process2 "${var%.}")`; still - I wouldn't do it that way. It's inefficient. Just `input| sed "\$!d;s/'"'/&"&"&/g;s/.*/process2 '"'&'/" | sh` – mikeserv Sep 09 '14 at 02:59
  • @vinc17 - This is awful - that is *not* the question. Believe me - had I a reason to do this thing, I *could* and I *would* - and I would do it without the `while read` loop mauling my input. Do you know what that does to backslashes? I'm [quite](http://unix.stackexchange.com/a/154441/52934) [creative](http://unix.stackexchange.com/a/151057/52934). In any case, the *actual* question - is how can I get any more explicit than what I have already stated? – mikeserv Sep 09 '14 at 03:36
  • @vinc17 - what ***are*** you on about? I've already stated *repeatedly* that the variable's value is lost - *Whatever you do...* I did a similar thing though - as is linked above - w/ file descriptors, here-documents, and input aggregation just this morning. I don't know what your `fct[1-4]` do - and maybe you're right - but it doesn't answer the very simple question that I have already repeatedly asked you. Why do you keep on? – mikeserv Sep 09 '14 at 08:14
  • @vinc17 - `while fct1; do fct2 | fct3; done | fct4`. I would not write shell functions that cannot work together. I don't understand why `$var` is involved at all - *what for*? If the shell function doesn't explicitly set the value of the variable in the current shell because instead the shell function must run some outside program, then you just pipe it out - why capture `stdin` if your only purpose is to pass it on? Just pass it. Else, have those functions work in concert in the current shell. Your problem is not an input problem it is a design problem. Your code makes it cumbersome. – mikeserv Sep 10 '14 at 19:18
  • Let's take an example based on a real-world sh script: `autoconf` (version 2.69). One has `arg=` with command substitution in a `case` construct, where `$arg` is used outside the `case`. Let's say that one would want to keep the trailing newlines from the `sed`. With the pipeline solution (the first one), it is not obvious to update the script to get the wanted behavior (it actually appears to be impossible to me without major changes, but I would be interested in seeing a solution if there is one). – vinc17 Sep 10 '14 at 21:03
  • @vinc17 - *no*. `autoconf` is *awful*. It is written by people that don't understand shell. And it doesn't matter - you *still have not answered the question.* And the `sed` thing is *really* easy: `sed '...$s/$/./'` - see? You do it *with `sed`*. And *why* are you doing command subs in a case statement? – mikeserv Sep 10 '14 at 21:06
  • There's also the `_variables` function of `bash_completion`, which stores the result of a command substitution in a global variable `COMPREPLY`. If a pipeline solution were used to keep newlines, the result would be lost. In your answer, one has the impression that both solutions are equally good. Moreover it should be noted that the pipeline solution behavior heavily depends on the shell: a user could test `echo foo | { var=$(sed '$s/$/./'); var=${var%.}; } ; echo $var` with ksh93 and zsh, and thinks that it is OK, while this code is buggy. – vinc17 Sep 10 '14 at 21:36
  • @vinc17 - that code *is* buggy - which is why I *SAID SPECIFICALLY* that it doesn't work... How can I be *more* explicit? I could say - Whatever you do, in a *POSIX* shell, I guess. And yes - both solutions are *equally good* - or, rather, *equally bad*. – mikeserv Sep 10 '14 at 21:58
  • You did not say "it doesn't work". You just said "`$var` disappears" (which is actually not true since this depends on the shell — the behavior is unspecified by POSIX), which is a rather neutral sentence. The second solution is better because it doesn't suffer from this problem, and its behavior is consistent in all POSIX shells. – vinc17 Sep 10 '14 at 22:09
  • @vinc17 - unspecified means it disappears. There is no guarantee. It is Schrödinger's cat. Unspecified is not neutral when it comes to a specification. Unspecified is the opposite of a specification. It is anathema. – mikeserv Sep 10 '14 at 22:25
  • No, it is just unspecified by POSIX, which allows two possible valid behaviors, depending on whether the last command of the pipeline runs in a subshell or not. On the other hand, implementations have their own specification. With zsh, the last command of a pipeline is guaranteed to run in the current shell. With ksh93, the last command may run in a subshell or not, so that there are two possible behaviors. – vinc17 Sep 11 '14 at 01:32

I cannot speak to the portability nor robustness of this solution but it does what I expect:

var=$(</dev/stdin)
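
Two caveats about this form, shown with a hypothetical `count_chars` helper. `$(<file)` is a bash/ksh/zsh shortcut for `$(cat file)` and is not POSIX; in a plain POSIX sh such as dash it expands to an empty string. It is also still a command substitution, so the trailing newlines the question asks about are stripped, exactly as with `$(cat)`.

```shell
#!/usr/bin/env bash
# $(<file) is a bash/ksh/zsh shortcut for $(cat file). It is still a command
# substitution, so trailing newlines in the input are stripped.
count_chars() {
    var=$(</dev/stdin)
    printf '%s\n' "${#var}"
}

printf 'a\n\n\n' | count_chars   # prints 1: all three newlines are gone
```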
Luis