I have input.txt

abcd
abcg

To select lines beginning with 'a' and ending with 'g' I write:

cat input.txt | awk '/^a/' | awk '/g$/{print $0}'

How can I combine the regular expressions ^a and g$ to be able to use only one instance of awk?

Viesturs
  • Does this answer your question? [Why doesn't ^s$ in regex match a string like "starts with s and ends with s"?](https://unix.stackexchange.com/questions/613519/why-doesnt-s-in-regex-match-a-string-like-starts-with-s-and-ends-with-s) While the problem is not exactly the same, the answer should still apply to yours. – AdminBee Sep 23 '22 at 07:40
  • @AdminBee, no it doesn't answer my question. My question is more general. – Viesturs Sep 23 '22 at 09:58

3 Answers

Just use a single regex that matches both start and finish:

awk '/^a.*g$/' input.txt

Or, if you really want to use two, you can combine them with &&:

awk '/^a/ && /g$/' input.txt
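
As a quick check of the two forms above, here is a minimal sketch that runs both on the question's sample data plus one extra line (`ag`, added here for illustration). The function names `match_one_regex` and `match_two_regex` are hypothetical helpers, not part of awk.

```sh
#!/bin/sh
# Hypothetical helper: one regex anchored at both ends.
match_one_regex() { awk '/^a.*g$/' "$@"; }

# Hypothetical helper: two independent regexes joined with &&.
match_two_regex() { awk '/^a/ && /g$/' "$@"; }

# Sample lines from the question, plus "ag" as an extra test line.
printf 'abcd\nabcg\nag\n' | match_one_regex   # prints abcg and ag
printf 'abcd\nabcg\nag\n' | match_two_regex   # prints abcg and ag
```

The two forms agree here because `a` and `g` are different characters. If the start and end patterns can match the same character (for example `/^a/ && /a$/` on the one-character line `a`), the `&&` form matches but `/^a.*a$/` does not, since the single regex needs at least two characters.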
terdon
  • Is `.*` a combination of `.` and `*` or is it a separate operator? – Viesturs Sep 22 '22 at 15:32
  • It is a combination of `.` (any character) and the modifier `*` which means "0 or more times", so it means "match absolutely anything, including nothing at all". – terdon Sep 22 '22 at 15:38
  • Note that *anything* is not the same thing as *any sequence of characters*. For instance, in a UTF-8 locale on a GNU system, the two approaches don't give the same outcome on the output of `printf 'appliqu\351ing\n'` (appliquéing encoded in ISO8859-1). – Stéphane Chazelas Sep 22 '22 at 15:54
  • Sigh. I will never understand this sort of thing. You're absolutely right, @StéphaneChazelas, `printf 'appliqu\351ing\n' | awk '/^a.*g$/'` returns nothing while `printf 'appliqu\351ing\n' | awk '/^a/ && /g$/'` works on my Arch system, although at least `printf 'appliqu\351ing\n' | perl -ne 'print if /^a.*g$/'` and `printf 'appliqu\351ing\n' | perl -ne 'print if /^a/ && /g$/'` both work. I take it that `.` doesn't match `\351` for some reason? – terdon Sep 22 '22 at 16:00
  • `\351` cannot be decoded into a character in UTF-8, so it's not matched by `.`. With `perl`, you need `-C` or `-Mopen=locale` to decode input as text. – Stéphane Chazelas Sep 22 '22 at 16:05
  • The real question should be: Why are lines with invalid characters allowed in your text file? @StéphaneChazelas – QuartzCristal Sep 22 '22 at 21:58
  • Any of `-C0` or `-C` or `-C127` or `-Mopen=locale` will match *appliquéing encoded in ISO8859-1* using `.*`. So, it will get printed. @StéphaneChazelas – QuartzCristal Sep 22 '22 at 22:11
  • On my system, with perl 5.36.0, `printf 'appliqu\351ing\n' | perl -ne 'print if /^a.*g$/'` is printed, with no extra options. – terdon Sep 23 '22 at 08:49
  • But that's not decoded as text, so it's only valid in single-byte locales and only because your pattern has no non-ASCII characters. For instance `perl -ne 'print if /^..$/'` would print a line containing `ê` in a locale using UTF-8. With `-C`/`-Mopen=locale` and `-w`, @QuartzCristal, you'd get errors on that `\351` byte, and without `-w`, with `-Mopen=locale`, it would be decoded into something like the 4-character string `\xE9`. In any case POSIXly, the behaviour of awk/grep on non-text is unspecified. I was just pointing out that the two approaches were not strictly equivalent. – Stéphane Chazelas Sep 23 '22 at 12:17
  • Also beware there are some locales where the charset has characters whose multibyte encoding ends in the same encoding as that of `g`, so `printf 'appliqu\351ing\n' | perl -ne 'print if /^a.*g$/'` could match on lines that end in those characters. – Stéphane Chazelas Sep 23 '22 at 12:19
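
One way to sidestep the decoding issue discussed in these comments is to force byte-wise matching with `LC_ALL=C`. The sketch below assumes a common awk implementation (gawk, mawk or BusyBox awk), where every byte counts as a character in the C locale; the function name `bytewise_match` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: run the single-regex form in the C locale,
# so that "." matches any byte, including the lone \351 byte.
bytewise_match() { LC_ALL=C awk '/^a.*g$/'; }

# The ISO8859-1 encoded line from the comments; count the lines that match.
printf 'appliqu\351ing\n' | bytewise_match | wc -l
```

The trade-off is that in the C locale a multibyte UTF-8 character counts as several characters, so patterns such as `/^..$/` no longer mean "exactly two characters".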

No need for awk, just grep:

grep "^a.*g$" input.txt
Andre Beaud

To make the answer as generic as possible using awk, here is an alternative way to perform the desired action, where the start and end strings are passed as arguments on the command line.

Demonstration test data is embedded in this example.

Using the script

#!/bin/sh

sSTRT="${1}"
sEND="${2}"

echo "John Wells
John Wayne
Robert Wayne" |
awk -v sTrt="^${sSTRT}" -v sEnd="${sEND}\$" ' $0 ~ sTrt && $0 ~ sEnd '

and executing the command

script "John" "Wayne"

the output is

John Wayne

with other lines ignored.

Special note: the "^" and "$" must be passed literally as part of the awk variables.
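
Because the arguments in the script above are used as regular expressions, characters such as `.` or `*` in them keep their regex meaning. If literal matching is wanted instead, one possible variant compares substrings with `substr()`. This is a sketch under that assumption, and the function name `starts_ends` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print the stdin lines that literally start with $1
# and literally end with $2. No regex is involved, so "." means a dot.
# Caveat: awk -v still interprets backslash escapes in the values.
starts_ends() {
    awk -v pre="$1" -v suf="$2" '
        substr($0, 1, length(pre)) == pre &&
        substr($0, length($0) - length(suf) + 1) == suf
    '
}

printf 'John Wells\nJohn Wayne\nRobert Wayne\n' | starts_ends "John" "Wayne"
# prints: John Wayne

printf 'a.c\nabc\n' | starts_ends "a." "c"
# prints: a.c
```

With the regex-based script, the arguments `a.` and `c` would also select `abc`, since `.` matches any character there.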

Eric Marceau