
I am writing a script to filter a file that has contents like

a:10
b:20
c:60
# comment
{{# random mustache templating}}
d=4
e=6

to get the output which would look like

a
b
c
d
e

Here is my command

cat filename.txt | awk '{$1=$1;print}' | awk -F'{{' '{print $1}' | awk -F'=' '{print $1}' | awk -F':' '{print $1}' | awk -F'#' '{print $1}' | awk /./

Purpose:

  • Remove anything in a line from the occurrence of characters '=' or ':'.
  • Remove the line that starts with '{{' to remove templating.
  • Trim whitespace at the beginning and end of each line.
  • Remove all blank lines.

As I am new to bash, how can I make this command shorter?

αғsнιη
borz

5 Answers


The field separator can be a full regex, so

awk -F'[:#=]' '!/^{{/ && length($1) > 0 { split($1, a, " "); print a[1] }' filename.txt

is sufficient: any one of ‘:’, ‘#’, ‘=’ will act as a separator. We exclude lines starting with “{{”, match lines where $1 is non-empty, split $1 on whitespace, and print the first resulting field.
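As a quick check (my addition, not part of the original answer), the one-liner can be run against the sample input from the question via a here-document instead of a file:

```shell
# Feed the question's sample input to the one-liner; the quoted EOF
# delimiter prevents any shell expansion inside the here-document.
awk -F'[:#=]' '!/^{{/ && length($1) > 0 { split($1, a, " "); print a[1] }' <<'EOF'
a:10
b:20
c:60
# comment
{{# random mustache templating}}
d=4
e=6
EOF
```

This prints a, b, c, d and e, one per line: the comment line has an empty first field (everything before `#`), the blank lines have no fields, and the template line is excluded by the `!/^{{/` test.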

Stephen Kitt
    For what it's worth, I agree with @AdminBee that inline code markup is cleaner and more legible than pretty-quotes. – terdon Nov 18 '20 at 11:13
  • I suppose it can be blinding to see `print $1` four times in one command, but the OP never *says* that they want only the first “word” from the input line. Arguably, an input of `⁠   foo    bar   ` should produce an output of `foo bar`, not just ``foo``.  P.S. I see now that the first version of your answer got this “right”; it’s not clear to me why you changed it. – G-Man Says 'Reinstate Monica' Nov 19 '22 at 20:21
  • @G-Man I don’t remember now, it’s possible there were clarifying comments that have been deleted since. The original commands in the question only kept the first word, and since the question asks for a simplification of those commands, it seems reasonable to keep that behaviour even though it’s not identified explicitly. – Stephen Kitt Nov 20 '22 at 16:21
  • “The original commands in the question only kept the first word…”  Um, where do you see that?  Therein lies the point of my comment; the pipeline in the question says `print $1` four times, and *each one is with a field separator other than whitespace.*  So it keeps the “first word” in the sense that `foo bar` is the first word of `⁠   foo    bar:42`. … No worries; I have plenty of answers where I cannot reconstruct what I was thinking when I wrote them. – G-Man Says 'Reinstate Monica' Nov 24 '22 at 03:01
  • @G-Man ah yes, I am indeed blind… The answer timeline shows it being accepted after the relevant edit, so presumably there was a reason for it. – Stephen Kitt Nov 24 '22 at 05:32

Keep it simple:

$ awk 'NF && ($1 !~ /^(#|\{+)/) { sub(/[:=].*/,""); print $1 }' file
a
b
c
d
e
Ed Morton
  • (1) I suppose it can be blinding to see `print $1` four times in one command, but the OP never *says* that they want only the first “word” from the input line.  Arguably, an input of `⁠   foo    bar   ` should produce an output of `foo bar`, not just `foo`.  (2) The question is a bit non-specific, but it does explicitly say “Remove all blank lines”, and your code outputs a blank line for an input line that begins with `:` or  `=`. … (Cont’d) – G-Man Says 'Reinstate Monica' Nov 19 '22 at 20:21
  • (Cont’d) …  (3) The OP doesn’t say what they want done with `#`, but their working code removes the first `#` and everything beyond it, so `foo#bar` becomes `foo`.  Your code passes `foo#bar` through unchanged. – G-Man Says 'Reinstate Monica' Nov 19 '22 at 20:21

To achieve the result above, I just used regex for the field separator, regex to select the lines and {print $1} to print the first column.

I see no leading whitespace or blank lines in your example, but if you need to deal with these, see my variations to this command below.

awk -F'[:=]' '!/^[#{]/{print $1}' filename.txt

Result:

a
b
c
d
e

If you have whitespace leading or trailing, the following may work. Though, I will admit, without seeing an example it is tricky for me to visualise.

awk -F'[:=]' '{gsub(/^\s+|\s+$/,"",$1)} !/^[#{]/{print $1}' filename.txt

To cover every possible case, based on your comments, I have adapted the example. Now, we have leading and trailing whitespace and empty lines.

a:10
b :20
  c:60
# comment

 {{# random mustache templating}}
d=4
e =6   

This is the slightly altered command to deal with this:

awk -F'[:=]' '{gsub(/^\s+|\s+$/,"",$1)} !/^[#{]/ && !/^$/{print $1}' filename.txt
  1. The field-separator regex splits the first field $1 from everything that comes after : or =
  2. gsub removes all leading and trailing whitespace from $1
  3. The pattern before {print $1} excludes lines starting with a # or { to remove comments and 'templating', and the additional !/^$/ test removes blank lines.

This produces the following result from the adapted example:

a
b
c
d
e
Bumbling Badger
  • @RishabhBohra The current accepted answer doesn't deal with leading space before the `{{# random mustache templating}}` in my adapted example. Instead, it leaves behind `{{`. However, it really depends on what you are looking for. – Bumbling Badger Nov 19 '20 at 03:57
  • (1) As you know, the question says “Remove all blank lines”, but your code outputs a blank line for an input line that begins with `:` or  `=`. (2) The OP doesn’t say what they want done with ``#``, but their working code removes the first `#` and everything beyond it, so `foo#bar` becomes `foo`.  Your code passes `foo#bar` (and even `foo #bar`) through unchanged. – G-Man Says 'Reinstate Monica' Nov 19 '22 at 20:23

Using sed:

sed -E '{ s/\s*([^:=]*).*/\1/ }; /^(\{\{|#|$)/d' infile

Swap the order of the commands above to sed -E '/.../d; { ... }' if you also want to keep lines where the {{ or # characters are preceded by whitespace rather than sitting at the very start of the line.
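A quick demonstration (my addition; assumes GNU sed, since both -E with \s and the GNU behaviour of \s are extensions) against the sample input from the question:

```shell
# The substitution keeps everything before the first ':' or '=' (minus
# leading whitespace); the /d command then drops template lines, comment
# lines, and empty lines.
sed -E '{ s/\s*([^:=]*).*/\1/ }; /^(\{\{|#|$)/d' <<'EOF'
a:10
b:20
c:60
# comment
{{# random mustache templating}}
d=4
e=6
EOF
```

This prints a, b, c, d and e, one per line.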

αғsнιη

Maybe this will help you achieve the expected result:

#!/bin/bash

dynamic_array=()

while read -r line 
do 
    var=$(echo "$line" | cut -c 1)    
    if ! { [ "$var" = '#' ] ||  [ "$var" = '{' ] || [ "$var" = '}' ]; }
    then
                 dynamic_array+=("$var")   
    fi 
done < A.txt

str_array_value="${dynamic_array[*]}" ; echo "$str_array_value" | tr ' ' '\n' | awk '!seen[$0]++'

Output :

a
b
c
d
e
codeholic24
  • 2
    Please note that while this works, using shell loops to process text files is very inefficient, and in most cases should be [avoided](https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice) in favor of one of the dedicated tools like `awk`, `sed`, `perl` or `grep`. – AdminBee Nov 18 '20 at 14:50
  • 1
    Copy/paste that into http://shellchek.net and it'll tell you about some of the issues and read [why-is-using-a-shell-loop-to-process-text-considered-bad-practice](https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice) for why this is the wrong approach anyway. – Ed Morton Nov 19 '20 at 00:11