5

I want to be able to pass an argument on the command line to gawk that is evaluated for escape sequences.

The issue:

$ gawk 'BEGIN { print ARGV[1]; }' '\t'
\t

Instead, I would like to get an actual tab character.

From the gawk docs:

The escape sequences in the preceding list are always processed first, for both string constants and regexp constants. This happens very early, as soon as awk reads your program.

How can I interpret character escapes in the command line args?

The end goal is myscript.awk --sep '\t', where separator is a format string, so passing a literal tab isn't an option. I'm also familiar with the easy way I could perform this in bash, but I'm interested in a way to do this in [g]awk.

Anthon
cdosborn

3 Answers

2

How can I print the unescaped version of command line args?

print ARGV[1]

The problem is that you don't want the unescaped command line argument. You want to interpret it. You're passing \t (the two-character string backslash, lowercase T), and you want that to be translated to a tab. You'll need to do this manually. Just translating \t to a tab is easy, with gsub(/\\t/, "\t"), but if you want to support octal escapes as well, and remove the backslash before a non-recognized character, that's cumbersome in awk.

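n = split(ARGV[1], a, "\\");   # pieces of the argument between backslashes
s = a[1];                      # text before the first backslash
skip_next = 0;
for (i = 2; i <= n; i++) {     # walk the pieces in order
    x = a[i];
    if (skip_next) {
        skip_next = 0;         # piece after an escaped backslash: keep as is
    } else if (x == "") {
        s = s "\\";            # two backslashes in a row give one backslash
        skip_next = 1;
    } else if (x ~ /^[0-7][0-7][0-7]/) {
        s = s sprintf("%c", 64*substr(x,1,1) + 8*substr(x,2,1) + substr(x,3,1));
        sub(/^.../, "", x);    # drop the three octal digits just consumed
    } else if (x ~ /^[0-7][0-7]/) {
        s = s sprintf("%c", 0 + 8*substr(x,1,1) + substr(x,2,1));
        sub(/^../, "", x);
    } else if (x ~ /^[0-7]/) {
        s = s sprintf("%c", 0 + substr(x,1,1));
        sub(/^./, "", x);
    } else {
        # replace a recognized escape letter; otherwise just drop the backslash
        sub(/^a/, "\a", x) ||
        sub(/^b/, "\b", x) ||
        sub(/^n/, "\n", x) ||
        sub(/^r/, "\r", x) ||
        sub(/^t/, "\t", x) ||
        sub(/^v/, "\v", x);
    }
    s = s x;
}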

(Warning: untested code!) Instead of this complex code, you could invoke printf in a subshell. Even that isn't so easy to do when the string could be multiline.

s = ARGV[1]
gsub(/'/, "'\\''", s)
cmd = "printf %b '" s "'."
s = ""
while ((cmd | getline line) > 0) s = s line "\n"
sub(/..$/, "", s)

Note that when you write "\t" in an awk script, that's a string containing the tab character. It's the way the awk syntax is: backslash has a special meaning in a string literal. Note: in a string literal, not in a string. If a string contains a backslash, that's just another character. The source code snippet "\t", consisting of four characters, is an expression whose value is the one-character string containing a tab, in the same way that the source code snippet 2+2, consisting of three characters, is an expression whose value is the number 4.
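The literal-versus-string distinction can be checked by comparing lengths. This is only a minimal sketch, assuming a POSIX-style awk (gawk included); the function name `lengths` is made up for illustration.

```shell
# Hypothetical helper: print the length of the string literal "\t"
# and the length of the first command-line argument as awk sees it.
lengths() {
    awk 'BEGIN { print length("\t"), length(ARGV[1]) }' "$1"
}

# The literal in the program text is one character (a tab);
# the argument is two characters (a backslash and a t).
lengths '\t'   # prints: 1 2
```

The backslash is only special while awk is parsing the program text; by the time ARGV[1] exists, it is an ordinary character.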

It would be better for your awk script to take the separator argument as a literal string. That would make it easier to use: your interface requires the caller to escape backslashes in the argument. If you want the separator to be a tab, pass an actual tab character.

Stéphane Chazelas
Gilles 'SO- stop being evil'
  • @StéphaneChazelas, I'm only interested in one line (I should have mentioned). I think `%b` is unnecessary: if you replace it with just `s`, the escapes will be interpreted anyway. – cdosborn Jun 08 '15 at 15:34
  • @cdosborn, well no, `%s` doesn't do any expansion. It's `%b` that does `echo`-like expansion. Also note that `%b` or `echo` expansion is different from the one done by `awk` (or the format argument of `printf`) for octal sequences: you need `\011`, `\11` won't do. – Stéphane Chazelas Jun 08 '15 at 16:04
  • I meant the string `s` in the example. – cdosborn Jun 08 '15 at 16:07
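The `%s` versus `%b` difference discussed in these comments can be checked by counting the bytes that the shell's `printf` emits. This is a rough sketch, assuming a POSIX `sh`; `count_bytes` is a hypothetical helper name, not something from the answers.

```shell
# Hypothetical helper: $1 is a printf format, $2 is its single argument.
# Prints the number of bytes printf writes (tr strips wc's padding).
count_bytes() {
    printf "$1" "$2" | wc -c | tr -d ' '
}

count_bytes '%s' '\t'     # prints: 2  (backslash and t, left unchanged)
count_bytes '%b' '\t'     # prints: 1  (expanded to a single tab)
count_bytes '%b' '\011'   # prints: 1  (octal form with the leading 0 that %b expects)
```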
1

First of all, you're not actually passing a tab to your awk. Remember that the shell evaluates the arguments before passing them to awk, and '\t' in single quotes is passed through as a literal \ followed by a t:

$ set -x
$ gawk 'BEGIN { print ARGV[1]; }' '\t'
+ gawk 'BEGIN { print ARGV[1]; }' '\t'
\t

As you can see above, you are not passing a tab to gawk so you can hardly expect it to print one. Compare with the version below which does pass a tab:

$ gawk 'BEGIN { print ARGV[1]; }' "$(printf '\t')"
++ printf '\t'
+ gawk 'BEGIN { print ARGV[1]; }' ' '  ## note the tab
                         ## This line contains a printed tab

Alternatively, you could pass the tab as a variable:

gawk -v t='\t' 'BEGIN {print t}'

Here, the '\t' is being expanded by awk, not the shell, so the tab is interpreted correctly.
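To confirm that awk itself expands the escape in a `-v` assignment, the length of the variable can be printed. This is a small sketch, assuming a POSIX-style awk; the function name `sep_length` and the variable name `sep` are chosen here for illustration.

```shell
# Hypothetical helper: hand $1 to awk as a -v assignment and
# print how many characters the resulting variable holds.
sep_length() {
    awk -v sep="$1" 'BEGIN { print length(sep) }'
}

sep_length '\t'     # prints: 1  (awk turned backslash-t into a tab)
sep_length 'a\tb'   # prints: 3
```

A wrapper script could forward the value of `--sep` this way, though that moves part of the work outside awk, which the question hoped to avoid.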

terdon
0

The solution is to use getline.

Inside a file:

BEGIN { 
    sep = ARGV[1]
    gsub(/'/, "'\\''", sep);
    gsub(/%/, "%%", sep);
    "printf -- '" sep "'" | getline sep; 
    printf sep;
}
Stéphane Chazelas
cdosborn
  • Please read my answer and terdon's carefully. In your case, `\t` inside `awk` is interpreted as usual; it is no longer the `\\t` that the shell passes to `awk`. – cuonglm Jun 07 '15 at 17:51
  • Please read my question carefully. My intent was never to make sure `\\t` was passed in, but to evaluate a format string, with the condition that the arg be passed as `--sep '\t'`. Both of your answers ignore the first line of my question. – cdosborn Jun 07 '15 at 18:36
  • `\\t` because you pass `\t` from the command line: the shell treats it as literal, and it is escaped to `\\t` before being passed to awk. You need to use `"$(printf '\t')"` or an awk variable instead: `awk -v t='\t' 'BEGIN {print t}'`. I don't know why `myscript.awk --sep "$(printf '\t')"` doesn't do what you want? – cuonglm Jun 07 '15 at 18:46
  • Your snippet is broken. Try it with the argument `"a` (i.e. `gawk '…' '"a'`) – Gilles 'SO- stop being evil' Jun 08 '15 at 01:41
  • You're right. I updated the code. The file version which evades the shell escaping works for the cases I tested. I cannot get the other version to work with `"'"`. – cdosborn Jun 08 '15 at 02:15
  • Note that that awk argument ends up being interpreted as shell code (for instance, try with a `"'\$(reboot)'"` argument). Also `getline` only reads one line from the `echo` output, so you can't use that for `'foo\nbar'` for instance. Note that `awk` will expand the escape sequences if you use `awk -v sep='\t' ...` or `awk '...' sep='\t'`. Not all `echo` implementations expand escape sequences, for instance `bash`'s needs a `-e` to perform the expansions. – Stéphane Chazelas Jun 08 '15 at 09:21
  • @StéphaneChazelas, I needed to update my example. On a side note, my previous call to echo was to `/bin/echo`, which doesn't actually support `-e`. I forgot the common `echo` is a builtin. – cdosborn Jun 08 '15 at 14:53
  • @StéphaneChazelas, See if you can execute commands on the updated version. It should enforce that its input is within `''`. Credit to @Gilles. – cdosborn Jun 08 '15 at 15:00
  • Try with `%` or `%999999999s`. – Stéphane Chazelas Jun 08 '15 at 15:06
  • When you do `"cmd" | getline` in `awk`, `awk` runs `sh -c cmd`. So it is the builtin `echo` of `sh` you'll get, not `/bin/echo`. – Stéphane Chazelas Jun 08 '15 at 15:08
  • I was wrong again, thanks. It seems like if `%` is handled then `printf` will only interpret escaped chars. – cdosborn Jun 08 '15 at 15:17
  • @StéphaneChazelas: Can you make any comment about my answer? – cuonglm Jun 08 '15 at 17:01