
I am writing a script to filter a file that has contents like

a:10
b:20
c:60
# comment
{{# random mustache templating}}
d=4
e=6

to get the output which would look like

a
b
c
d
e

Here is my command

cat filename.txt | awk '{$1=$1;print}' | awk -F'{{' '{print $1}' | awk -F'=' '{print $1}' | awk -F':' '{print $1}' | awk -F'#' '{print $1}' | awk /./

Purpose:

  • Remove anything in a line from the occurrence of characters '=' or ':'.
  • Remove the line that starts with '{{' to remove templating.
  • Trim whitespace at the beginning and end of each line.
  • Remove all blank lines.

As I am new to bash, how can I make this command shorter?

αғsнιη
borz

5 Answers


The field separator can be a full regex, so

awk -F'[:#=]' '!/^{{/ && length($1) > 0 { split($1, a, " "); print a[1] }' filename.txt

is sufficient: any one of ‘:’, ‘#’, ‘=’ will act as a separator. We exclude lines starting with “{{”, match lines where $1 is non-empty, split $1 on whitespace, and print the first resulting field.
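As a quick check (my addition, not part of the original answer), the one-liner can be run against the sample input from the question via a here-document instead of a file:

```shell
# Feed the question's sample input to the one-liner; the quoted EOF
# delimiter prevents any shell expansion inside the here-document.
awk -F'[:#=]' '!/^{{/ && length($1) > 0 { split($1, a, " "); print a[1] }' <<'EOF'
a:10
b:20
c:60
# comment
{{# random mustache templating}}
d=4
e=6
EOF
```

This prints a, b, c, d and e, one per line: the comment line has an empty first field (everything before `#`), the blank lines have no fields, and the template line is excluded by the `!/^{{/` test.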

Stephen Kitt
    For what it's worth, I agree with @AdminBee that inline code markup is cleaner and more legible than pretty-quotes. – terdon Nov 18 '20 at 11:13
  • I suppose it can be blinding to see `print $1` four times in one command, but the OP never *says* that they want only the first “word” from the input line. Arguably, an input of `⁠   foo    bar   ` should produce an output of `foo bar`, not just ``foo``.  P.S. I see now that the first version of your answer got this “right”; it’s not clear to me why you changed it. – G-Man Says 'Reinstate Monica' Nov 19 '22 at 20:21
  • @G-Man I don’t remember now, it’s possible there were clarifying comments that have been deleted since. The original commands in the question only kept the first word, and since the question asks for a simplification of those commands, it seems reasonable to keep that behaviour even though it’s not identified explicitly. – Stephen Kitt Nov 20 '22 at 16:21
  • “The original commands in the question only kept the first word…”  Um, where do you see that?  Therein lies the point of my comment; the pipeline in the question says `print $1` four times, and *each one is with a field separator other than whitespace.*  So it keeps the “first word” in the sense that `foo bar` is the first word of `⁠   foo    bar:42`. … No worries; I have plenty of answers where I cannot reconstruct what I was thinking when I wrote them. – G-Man Says 'Reinstate Monica' Nov 24 '22 at 03:01
  • @G-Man ah yes, I am indeed blind… The answer timeline shows it being accepted after the relevant edit, so presumably there was a reason for it. – Stephen Kitt Nov 24 '22 at 05:32

Keep it simple:

$ awk 'NF && ($1 !~ /^(#|\{+)/) { sub(/[:=].*/,""); print $1 }' file
a
b
c
d
e
Ed Morton
  • (1) I suppose it can be blinding to see `print $1` four times in one command, but the OP never *says* that they want only the first “word” from the input line.  Arguably, an input of `⁠   foo    bar   ` should produce an output of `foo bar`, not just `foo`.  (2) The question is a bit non-specific, but it does explicitly say “Remove all blank lines”, and your code outputs a blank line for an input line that begins with `:` or  `=`. … (Cont’d) – G-Man Says 'Reinstate Monica' Nov 19 '22 at 20:21
  • (Cont’d) …  (3) The OP doesn’t say what they want done with `#`, but their working code removes the first `#` and everything beyond it, so `foo#bar` becomes `foo`.  Your code passes `foo#bar` through unchanged. – G-Man Says 'Reinstate Monica' Nov 19 '22 at 20:21

To achieve the result above, I just used regex for the field separator, regex to select the lines and {print $1} to print the first column.

I see no leading whitespace or blank lines in your example, but if you need to deal with these, see my variations to this command below.

awk -F'[:=]' '!/^[#{]/{print $1}' filename.txt

Result:

a
b
c
d
e

If you have whitespace leading or trailing, the following may work. Though, I will admit, without seeing an example it is tricky for me to visualise.

awk -F'[:=]' '{gsub(/^\s+|\s+$/,"",$1)} !/^[#{]/{print $1}' filename.txt

To cover every possible case, based on your comments, I have adapted the example. Now, we have leading and trailing whitespace and empty lines.

a:10
b :20
  c:60
# comment

 {{# random mustache templating}}
d=4
e =6   

This is the slightly altered command to deal with this:

awk -F'[:=]' '{gsub(/^\s+|\s+$/,"",$1)} !/^[#{]/ && !/^$/{print $1}' filename.txt
  1. The field-separator regex splits the first field $1 from everything that comes after : or =
  2. gsub removes all leading and trailing whitespace from $1
  3. The pattern before {print $1} excludes lines starting with a # or { to remove comments and 'templating', and the additional !/^$/ test removes blank lines.

This produces the following result from the adapted example:

a
b
c
d
e
Bumbling Badger
  • @RishabhBohra The current accepted answer doesn't deal with leading space before the `{{# random mustache templating}}` in my adapted example. Instead, it leaves behind `{{`. However, it really depends on what you are looking for. – Bumbling Badger Nov 19 '20 at 03:57
  • (1) As you know, the question says “Remove all blank lines”, but your code outputs a blank line for an input line that begins with `:` or  `=`. (2) The OP doesn’t say what they want done with ``#``, but their working code removes the first `#` and everything beyond it, so `foo#bar` becomes `foo`.  Your code passes `foo#bar` (and even `foo #bar`) through unchanged. – G-Man Says 'Reinstate Monica' Nov 19 '22 at 20:23

Using sed:

sed -E '{ s/\s*([^:=]*).*/\1/ }; /^(\{\{|#|$)/d' infile

Swap the order of the commands above to sed -E '/.../d; { ... }' if you also want to keep lines where the {{ or # characters are preceded by whitespace rather than sitting at the very start of the line.
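A quick demonstration (my addition; assumes GNU sed, since both -E with \s and the GNU behaviour of \s are extensions) against the sample input from the question:

```shell
# The substitution keeps everything before the first ':' or '=' (minus
# leading whitespace); the /d command then drops template lines, comment
# lines, and empty lines.
sed -E '{ s/\s*([^:=]*).*/\1/ }; /^(\{\{|#|$)/d' <<'EOF'
a:10
b:20
c:60
# comment
{{# random mustache templating}}
d=4
e=6
EOF
```

This prints a, b, c, d and e, one per line.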

αғsнιη

Maybe this will help you achieve the expected result:

#!/bin/bash

dynamic_array=()

while read -r line 
do 
    var=$(echo "$line" | cut -c 1)    
    if ! { [ "$var" = '#' ] ||  [ "$var" = '{' ] || [ "$var" = '}' ]; }
    then
                 dynamic_array+=("$var")   
    fi 
done < A.txt

str_array_value="${dynamic_array[*]}" ; echo "$str_array_value" | tr ' ' '\n' | awk '!seen[$0]++'

Output :

a
b
c
d
e
codeholic24
  • 2
    Please note that while this works, using shell loops to process text files is very inefficient, and in most cases should be [avoided](https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice) in favor of one of the dedicated tools like `awk`, `sed`, `perl` or `grep`. – AdminBee Nov 18 '20 at 14:50
  • 1
    Copy/paste that into http://shellchek.net and it'll tell you about some of the issues and read [why-is-using-a-shell-loop-to-process-text-considered-bad-practice](https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice) for why this is the wrong approach anyway. – Ed Morton Nov 19 '20 at 00:11