3

Is there a way to perform command substitution inside AWK and be able to reference the fields inside the substituted command using the $n notation of AWK?

E.g.

find | awk '/txt$/ {nl = $(wc -l $NF); print nl}'

I was hoping that the above would print the number of lines in each .txt file. Instead, it effectively returns the same output as:

find | awk '/txt$/ {print}'

Q1: is there a way to perform command substitution inside awk?

Q2: why is the first incantation above silently failing and is simply printing the filenames instead?

Please note the above is offered as an example only. I am not asking how to print the number of lines of each file by some other means, e.g. by: for f in $(find -iname \*.txt); do wc -l $f; done

The question is specifically about how to leverage command substitution in AWK programs.

Marcus Junius Brutus
  • Also relevant, because of your `for` loop example: [Why is looping over find's output bad practice?](http://unix.stackexchange.com/q/321697/135943) – Wildcard Jul 27 '17 at 22:02
  • And [Why does my shell script choke on whitespace or other special characters?](http://unix.stackexchange.com/q/131766/135943) for `wc -l $f` without quoting `"$f"`. – Wildcard Jul 27 '17 at 22:02
  • Not posting this as an answer because it doesn't address command substitution in Awk, but I seriously question whether that is EVER an appropriate solution. Your example `for` loop can be solved with just `find . -type f -name '*txt' -exec wc -l {} +` – Wildcard Jul 27 '17 at 22:46
  • For this case you could just do `wc -l **/*txt` with globstar on in bash, and similar constructs in some other shells, if the combined filenames don't exceed ARG_MAX. – dave_thompson_085 Jul 28 '17 at 08:10

2 Answers

4

First, a disclaimer: Please don't parse the output of find. The code below is for illustration only, of how to incorporate command substitution into an Awk script in such a way that the commands can act upon pieces of Awk's input.

To actually do a line count (wc -l) on each file found with find (which is the example use case), just use:

 find . -type f -name '*txt' -exec wc -l {} +

However, to answer your questions as asked:

Q1

To answer your Q1:

Q1: is there a way to perform command substitution inside awk?

Of course there is a way, from man awk :

command | getline [var] Run command piping the output either into $0 or var, as above, and RT.

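So (watch the quoting, and parenthesize the concatenated command string, because an unparenthesized concatenation to the left of | getline is ambiguous across awk implementations):

find . | awk '/txt$/{("wc -l <\"" $NF "\"|cut -f1") | getline nl; print nl}'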

Please note that the string built, and therefore the command executed, is

wc -l <"file"|cut -f1

Reading the file through a redirection keeps wc from printing the filename.

The one-liner above omits the close() that the command needs. That is safe for a couple of files but technically incorrect, because each distinct command string leaves a pipe open. You actually need to do:

find . | awk '/txt$/{
                       comm="wc -l <\"" $NF "\" | cut -f1"
                       comm | getline nl;
                       close (comm);
                       print nl 
                    }'

That works for older awk versions also.
Remember that find . also prints the dot . itself. If that entry reaches the command, the code fails, because a dot is a directory and wc cannot count lines in it.

Alternatively, skip the dot entry explicitly:

find . | awk '/txt$/ && $NF!="." {  comm="wc -l <\"" $NF "\" | cut -f1"
                                    comm | getline nl;
                                    close (comm);
                                    print nl 
                                 }'

You can convert that to a one-liner, but I think it will look quite ugly.
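The getline approach above can be tried in isolation with the following sketch. It is not part of the original answer: the function name count_txt_lines and the temporary demo files are hypothetical, it uses $0 instead of $NF so that names containing spaces survive, and it still assumes that no filename contains double quotes, dollar signs, backticks, or newlines (any of those would break or alter the command string passed to the shell).

```shell
# Hypothetical helper: print "<file>: <line count>" for every .txt file under $1.
count_txt_lines() {
    find "$1" -type f -name '*.txt' | sort | awk '{
        cmd = "wc -l < \"" $0 "\""        # build the command once per input line
        if ((cmd | getline nl) > 0)       # getline returns 1 when a line was read
            print $0 ": " (nl + 0)        # nl + 0 drops padding some wc versions add
        close(cmd)                        # release the pipe before the next record
    }'
}

# Throwaway demo data (assumption: mktemp -d is available).
demo_dir=$(mktemp -d)
printf 'a\nb\n' > "$demo_dir/two.txt"
printf 'x\n'    > "$demo_dir/one file.txt"

count_txt_lines "$demo_dir"
```

Storing the command in a variable first sidesteps the precedence question around | getline and gives close() the exact same string to match.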

Q2

As for your second question:

Q2: why is the first incantation above silently failing and is simply printing the filenames instead?

Because awk does not parse shell syntax; $( ... ) has no special meaning to it. It understands the expression as:

nl = $(wc -l $NF)
nl --> variable
$ --> pointer to a field
wc --> variable (that has zero value here)
-  --> minus sign
l  --> variable (that has a null string)
$  --> Pointer to a field
NF --> Last field

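Subtraction binds more tightly than concatenation, so wc - l is evaluated first: both variables are unset, giving 0 - 0 = 0. That 0 is then concatenated with the text of the last field (the name of a file), yielding a string such as 0./file.txt. Used as a field index, that string is converted to a number, and its numeric value is 0.

For awk, it becomes:

nl = $( (wc - l) $NF )
nl = $( 0 "./file.txt" )
nl = $( "0./file.txt" )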

Which becomes just $0, the whole input line, which is (for the simple find above) only the file name.

So the original command only prints the filename (well, technically, the whole line).
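The evaluation can be observed directly with a small sketch like the one below. The function name show_parse and the sample input line are made up for illustration; wc and l are deliberately left unset, as in the question.

```shell
# Hypothetical demo: show how awk evaluates the pieces of $(wc -l $NF).
show_parse() {
    printf 'alpha beta ./notes.txt\n' | awk '{
        print "wc - l      = [" (wc - l) "]"        # two unset variables: 0 - 0
        print "index expr  = [" ((wc - l) $NF) "]"  # 0 concatenated with the last field
        print "field value = [" $(wc -l $NF) "]"    # numeric value 0, so this is $0
    }'
}

show_parse
# wc - l      = [0]
# index expr  = [0./notes.txt]
# field value = [alpha beta ./notes.txt]
```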

Wildcard
1

Use "weak quotes" rather than 'strong quotes' for subshell expansion to happen within an awk script, but doing so in your example would not be a particularly valuable implementation. It also looks fantastically ugly:

$ awk "END { print \"$(echo hello)\"} " < /dev/null
hello
DopeGhoti
  • Your example works but fields are not accessible inside the command substitution using the `$n` notation. E.g. `ls | awk "{ print \"$(echo hello $NF)\"} "` fails to print the name of each file following the "hello" – Marcus Junius Brutus Jul 27 '17 at 20:49
  • You need to escape the `$` so that `bash` doesn't consume it before `awk` gets to see it. And even then, the subshell doesn't know it's within `awk`, so any `awk` parsing would happen _after_ any shell processing. So `print \$$(echo 1)` should boil down to `print $1`. – DopeGhoti Jul 27 '17 at 20:52
  • This has no hope of working. Even if the quotes are done correctly, the shell executes the shell command only once, just before starting awk. There is no way (by quoting) to pass back an awk value to the shell for execution. And the command the OP is asking to get working has a **list** of values as supplied by find, not only one. –  Jul 27 '17 at 21:36
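The point made in these comments, that the shell expands $(...) once before awk ever starts, can be checked with a short sketch. The variable name prog is only illustrative; storing the program text in it makes it possible to see exactly what awk receives.

```shell
# The shell performs the substitution while building the string; awk never sees $(...).
prog="{ print \"$(echo hello) \" \$0 }"

printf '%s\n' "$prog"            # the program text awk receives
# { print "hello " $0 }

printf 'a\nb\n' | awk "$prog"    # the already-expanded text is reused for every record
# hello a
# hello b
```

Because the substitution is finished before awk reads its first record, it cannot depend on any field of the input; running a command per record requires an awk-side mechanism such as command | getline.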