10

From Gawk's manual:

When awk statements within one rule are short, you might want to put more than one of them on a line. This is accomplished by separating the statements with a semicolon (‘;’). This also applies to the rules themselves. Thus, the program shown at the start of this section could also be written this way:

/12/ { print $0 } ; /21/ { print $0 }

NOTE: The requirement that states that rules on the same line must be separated with a semicolon was not in the original awk language; it was added for consistency with the treatment of statements within an action.

But I have seen from https://stackoverflow.com/q/20262869/156458

awk '$2=="no"{$3="N/A"}1' file

Aren't $2=="no"{$3="N/A"} and 1 two rules? Why are they not separated by anything?

Thanks.

Tim

3 Answers

12

Very good question! I think the key is this: "Thus, the program shown at the start of this section could also be written this way:"

It is not mandatory to write it this way; it is an alternative. This means (and it is confirmed in practice) that both of the statements below are correct:

$ awk '/12/ { print $0 } /21/ { print $0 }' file
$ awk '/12/ { print $0 } ; /21/ { print $0 }' file

I think this semicolon usage is meant to cover really short, idiomatic code: for example, cases where we omit the action part and want to put multiple rules on the same line:

$ awk '/12//21/' file
awk: cmd. line:2: /12//21/
awk: cmd. line:2:         ^ unexpected newline or end of string

In this case using a semicolon is mandatory to separate rules (=conditions):

$ awk '/12/;/21/' file

Since the {action} part is omitted in both rules, the default action, {print $0}, is performed for each rule whose condition matches.
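As a quick check of the point above, here is a minimal sketch that runs the three forms against a small made-up input and captures what each one prints. The three sample lines and the variable names are assumptions for illustration only; it assumes a POSIX-like shell and any common awk.

```shell
# Hypothetical sample input: two lines match, one does not.
input='12 a
21 b
33 c'

# Rules with actions: the semicolon after the closing brace is optional.
without_semi=$(printf '%s\n' "$input" | awk '/12/ { print $0 } /21/ { print $0 }')
with_semi=$(printf '%s\n' "$input" | awk '/12/ { print $0 } ; /21/ { print $0 }')

# Pattern-only rules: here the semicolon is what separates the two rules.
pattern_only=$(printf '%s\n' "$input" | awk '/12/;/21/')

printf '%s\n' "$without_semi" "$with_semi" "$pattern_only"
```

With this input, all three variables should hold the same two lines, "12 a" and "21 b".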

Jeff Schaller
George Vasiliou
  • I believe that (for action-pattern lists): The semicolon **after** an action closing brace **is** optional. –  Jul 18 '17 at 02:29
  • good answer but i'm not sure that `'/12/;/21/'` is all that common as awk idiom. IMO & IME it's more common to write something like that as `'/12|21/'` - more efficient too, only one regexp match rather than two or more. – cas Jul 18 '17 at 02:36
  • @cas The use of `/12/;/21/` is a simplified example to demonstrate why a semicolon is necessary to separate rules / conditions. For simple tasks, you are right, we can use just one regex with `or`. Someone can extend this "conditions only" syntax to more complicated conditions, e.g. `awk '$1==10;$2+$NF<100' file`. But since the action is the same in all conditions (`{print $0}`), theoretically we can use OR everywhere: `awk '$1==10 || $2+$NF<100' file` – George Vasiliou Jul 18 '17 at 07:54
  • yes, i know. that was kind of my point. it doesn't make sense to have multiple statements with different patterns and the SAME action. it makes a lot more sense to have a single pattern with a more complex regex and/or multiple OR-ed conditionals: `a || b || c || d || e {action}` rather than `a {action}; b {same action}; c {same action again} ...`. easier to change a single common action for all those patterns too. in other words, while this is something you **can** do, it's also something you almost never want to. I thought that was worth adding to your answer. – cas Jul 18 '17 at 14:32
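One detail worth keeping in mind when choosing between the two forms discussed in these comments: they are not strictly interchangeable. With separate rules, a line that satisfies both patterns is printed once per matching rule, while a single OR-ed pattern prints it once. A small sketch, assuming a POSIX-like shell and a made-up three-line input (the variable names are hypothetical):

```shell
# Hypothetical input: "1221" contains both "12" and "21".
input='1221
12
99'

# Two pattern-only rules: each matching rule prints the line.
two_rules=$(printf '%s\n' "$input" | awk '/12/;/21/')

# One rule with alternation: each matching line is printed once.
one_rule=$(printf '%s\n' "$input" | awk '/12|21/')

printf 'two rules:\n%s\n' "$two_rules"
printf 'one rule:\n%s\n' "$one_rule"
```

Here the two-rule version prints "1221" twice, so the single-pattern form is the right choice only when one print per line is what is wanted.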
6

In gawk, these two quotes from the manual describe the issue:

An action consists of one or more awk statements, enclosed in braces (‘{…}’). Each statement specifies one thing to do. The statements are separated by newlines or semicolons.

A semicolon is a "separator" but not a "terminator".
The only valid terminator of an action is a closing brace (}).

Therefore, what follows an action closing brace (}) must be some other pattern{action}

In the "man mawk" there is some other description that may help clarify what awk should do:

Statements are terminated by newlines, semi-colons or both. Groups of statements such as actions or loop bodies are blocked via { ... } as in C. The last statement in a block doesn't need a terminator.

The "man nawk" explains it like this:

The pattern comes first, and then the action. Action statements are enclosed in { and }.

And, if you want to delve into the details, read the POSIX description:

action           : '{' newline_opt                             '}'
                 | '{' newline_opt terminated_statement_list   '}'
                 | '{' newline_opt unterminated_statement_list '}'
                 ;

And search for what is an "unterminated" statement list.

Or, simpler, search for Action to read:

Any single statement can be replaced by a statement list enclosed in curly braces. The application shall ensure that statements in a statement list are separated by <newline> or <semicolon> characters.

Again: are separated by <newline> or <semicolon> characters
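To tie this back to the question, here is a rough sketch of both points: a semicolon before the closing brace is accepted but not needed, and the closing brace alone ends the action, so the pattern-only rule 1 can follow it directly. The input lines and variable names are made up for illustration; a POSIX-like shell and any common awk are assumed.

```shell
# The semicolon separates statements; the last one needs no terminator.
no_trailing=$(printf 'x\n' | awk '{ print "one"; print "two" }')
trailing=$(printf 'x\n' | awk '{ print "one"; print "two"; }')

# The program from the question: the closing brace ends the first rule,
# and 1 is a second, pattern-only rule that prints every record.
result=$(printf 'a no c\nx yes z\n' | awk '$2=="no"{$3="N/A"}1')

printf '%s\n' "$result"
```

With this input the last command should print "a no N/A" followed by "x yes z": the first rule rewrites the third field where the second is "no", and the second rule prints each record.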

3

The semicolon between conditional blocks appears to be optional; only the semicolons between statements within blocks appear to be mandatory:

$ echo -e "foo\nbar" | gawk '/foo/ { print "foo found" } /bar/ {print "bar found"}'
foo found
bar found
$ echo -e "foo\nbar" | gawk '/foo/ { print "foo found" }; /bar/ {print "bar found"}'
foo found
bar found
$ echo -e "foo\nbar" | gawk '/foo/ { print "foo found"; print "whee" }'
foo found
whee
$ echo -e "foo\nbar" | gawk '/foo/ { print "foo found" print "whee" }'
gawk: cmd. line:1: /foo/ { print "foo found" print "whee" }
gawk: cmd. line:1:                           ^ syntax error

However, when the action block after a condition is omitted in favor of the default (i.e. {print}), the semicolon between the two conditions becomes necessary:

$ echo -e "foo\nbar" | gawk '/foo/ /bar/'
gawk: cmd. line:2: /foo/ /bar/
gawk: cmd. line:2:            ^ unexpected newline or end of string
$ echo -e "foo\nbar" | gawk '/foo/; /bar/'
foo
bar
DopeGhoti