awk: select lines beginning with string and ending with string

Question

I have input.txt

abcd
abcg

To select lines beginning with 'a' and ending with 'g' I write:

cat input.txt | awk '/^a/' | awk '/g$/{print $0}'

How can I combine the regular expressions ^a and g$ to be able to use only one instance of awk?

Does this answer your question? [Why doesn't ^s$ in regex match a string like "starts with s and ends with s"?](https://unix.stackexchange.com/questions/613519/why-doesnt-s-in-regex-match-a-string-like-starts-with-s-and-ends-with-s) While the problem is not exactly the same, the answer should still apply to yours. — AdminBee, Sep 23 '22 at 07:40
@AdminBee, no it doesn't answer my question. My question is more general. — Viesturs, Sep 23 '22 at 09:58

score 4 · Accepted Answer · answered Sep 22 '22 at 15:26

Just use a single regex that matches both start and finish:

awk '/^a.*g$/' input.txt

Or, if you really want to use two, you can combine them with &&:

awk '/^a/ && /g$/' input.txt
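
As a quick check, the two forms can be run side by side. This is only a sketch: the `sample` helper and the extra input lines (`xbcg`, `a`, `g`, `ag`) are made up for illustration and are not part of the question's input.txt.

```shell
# Hypothetical sample input: the question's two lines plus a few edge cases.
sample() { printf '%s\n' abcd abcg xbcg a g ag; }

# One regex: starts with a, then anything (possibly nothing), then ends with g.
sample | awk '/^a.*g$/'

# Two regexes joined with &&: both must match the same line.
sample | awk '/^a/ && /g$/'

# The forms differ when prefix and suffix may overlap on a short line.
printf 'a\n' | awk '/^a/ && /a$/'   # the single "a" satisfies both tests
printf 'a\n' | awk '/^a.*a$/'       # needs at least two characters, so no match
```

On this input both of the first two commands print `abcg` and `ag`. The last two lines show the one case where the choice matters for plain ASCII text: with `&&` the same character may serve as both the start and the end of the line.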

terdon

Is `.*` a combination of `.` and `*` or is it a separate operator? – Viesturs Sep 22 '22 at 15:32
It is a combination of `.` (any character) and the modifier `*` which means "0 or more times", so it means "match absolutely anything, including nothing at all". – terdon Sep 22 '22 at 15:38
Note that *anything* is not the same thing as *any sequence of characters*. For instance, in a UTF-8 locale on a GNU system, the two don't give the same outcome on the output of `printf 'appliqu\351ing\n'` (appliquéing encoded in ISO8859-1). – Stéphane Chazelas Sep 22 '22 at 15:54
Sigh. I will never understand this sort of thing. You're absolutely right, @StéphaneChazelas, `printf 'appliqu\351ing\n' | awk '/^a.*g$/'` returns nothing while `printf 'appliqu\351ing\n' | awk '/^a/ && /g$/'` works on my Arch system, although at least `printf 'appliqu\351ing\n' | perl -ne 'print if /^a.*g$/'` and `printf 'appliqu\351ing\n' | perl -ne 'print if /^a/ && /g$/'` both work. I take it that `.` doesn't match `\351` for some reason? – terdon Sep 22 '22 at 16:00
`\351` cannot be decoded into a character in UTF-8, so it's not matched by `.`. With `perl`, you need `-C` or `-Mopen=locale` to decode input as text. – Stéphane Chazelas Sep 22 '22 at 16:05
The real question should be: why are lines with invalid characters allowed in your text file? @StéphaneChazelas – QuartzCristal Sep 22 '22 at 21:58
Any of `-C0` or `-C` or `-C127` or `-Mopen=locale` will match *appliquéing encoded in ISO8859-1* using `.*`. So, it will get printed. @StéphaneChazelas – QuartzCristal Sep 22 '22 at 22:11
On my system, with perl 5.36.0, `printf 'appliqu\351ing\n' | perl -ne 'print if /^a.*g$/'` is printed, with no extra options. – terdon Sep 23 '22 at 08:49
But that's not decoded as text, so it's only valid in single-byte locales and only because your pattern has no non-ASCII characters. For instance `perl -ne 'print if /^..$/'` would print a line containing `ê` in a locale using UTF-8. With `-C`/`-Mopen=locale`, @QuartzCristal with `-w`, you'd get errors on that `\351` byte, and without it with `-Mopen=locale` would be decoded into something like the 4-character `\xE9` string. In any case POSIXly, the behaviour of awk/grep on non-text is unspecified. I was just pointing out that the two approaches were not strictly equivalent. – Stéphane Chazelas Sep 23 '22 at 12:17
Also beware there are some locales where the charset has characters whose multibyte encoding ends in the same encoding as that of `g`, so `printf 'appliqu\351ing\n' | perl -ne 'print if /^a.*g$/'` could match on lines that end in those characters. – Stéphane Chazelas Sep 23 '22 at 12:19
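
The difference discussed in these comments can be reproduced with the commands below. This is a sketch, not a guaranteed result: what the first two commands print depends on the current locale and the awk implementation. The `LC_ALL=C` line is an added assumption that does not come from the thread. It relies on the C locale treating every byte as a character.

```shell
# A line containing byte 0xE9 (é in ISO8859-1), which is not valid UTF-8.

# Locale-dependent: may print nothing in a UTF-8 locale, because `.` has to
# consume the undecodable byte.
printf 'appliqu\351ing\n' | awk '/^a.*g$/'

# Neither regex has to consume the 0xE9 byte; reported in the comments above
# to match on a GNU system.
printf 'appliqu\351ing\n' | awk '/^a/ && /g$/'

# Assumption: in the C locale each byte is one character, so `.` matches 0xE9.
printf 'appliqu\351ing\n' | LC_ALL=C awk '/^a.*g$/'
```

Forcing the C locale makes the match byte-oriented, which matters if the pattern itself contains non-ASCII characters, as noted above.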

score 2 · Answer 2 · answered Sep 22 '22 at 15:26

No need for awk, just grep:

grep "^a.*g$" input.txt

Andre Beaud

Or `grep -x 'a.*g'`. – Stéphane Chazelas Sep 22 '22 at 15:55
@StéphaneChazelas A grep command won't match *appliquéing encoded in ISO8859-1*. – QuartzCristal Sep 22 '22 at 21:55
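
The two grep forms mentioned in this answer and its first comment can be compared on a few lines of made-up input. This is a sketch; the sample lines are illustrative only. Single quotes are used here so the shell never tries to interpret `$`.

```shell
# Anchors written explicitly in the pattern.
printf '%s\n' abcd abcg xbcg ag | grep '^a.*g$'

# -x makes grep match the pattern against the whole line, so no anchors are needed.
printf '%s\n' abcd abcg xbcg ag | grep -x 'a.*g'
```

On this ASCII input both commands print `abcg` and `ag`.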

score 1 · Answer 3 · answered Sep 22 '22 at 20:34

To make the answer as generic as possible using awk, here is an alternative in which the start and end patterns are passed to awk as variables from the command line.

Demonstration test data is embedded in this example.

Using the script

#!/bin/sh

sSTRT="${1}"
sEND="${2}"

echo "John Wells
John Wayne
Robert Wayne" |
awk -v sTrt="^${sSTRT}" -v sEnd="${sEND}\$" ' $0 ~ sTrt && $0 ~ sEnd '

and executing the command

script "John" "Wayne"

the output is

John Wayne

with other lines ignored.

Special note: the "^" and "$" must be passed literally as part of the awk variables.
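
Because `sTrt` and `sEnd` are used as regular expressions, characters such as `.` or `*` in the arguments act as regex operators. If the arguments should be taken as literal strings instead, one possible variant is sketched below. It is not part of the answer above: the function name `starts_ends` and the variables `pre` and `suf` are made up for illustration, and the technique is a plain `substr()` comparison rather than a regex match.

```shell
#!/bin/sh
# Hypothetical helper: print lines that start with $1 and end with $2,
# treating both arguments as literal strings, not regular expressions.
# Caveat: awk -v processes backslash escapes in the assigned values.
starts_ends() {
    awk -v pre="$1" -v suf="$2" '
        # Compare the first length(pre) characters with pre, and the
        # last length(suf) characters with suf.
        substr($0, 1, length(pre)) == pre &&
        substr($0, length($0) - length(suf) + 1) == suf
    '
}

printf '%s\n' 'John Wells' 'John Wayne' 'Robert Wayne' | starts_ends 'John' 'Wayne'

# A dot in the prefix is now just a dot: only "a.b" is selected here.
printf '%s\n' 'a.b' 'axb' | starts_ends 'a.' 'b'
```

Like the `&&` form, this lets the prefix and suffix overlap on short lines, since the two comparisons are made independently.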
