regex for matching an entire line $0 in awk

Question

I need some help or advice wrt awk and its use of regular expressions. I have a data input file with an irregular structure. To parse this file correctly I need to recognize a line of the following form:

@ 8/1/17, 10:04 PM

A line with this pattern marks the end of a complete transaction. It's simply a date & time stamp preceded by a space and the @ character.

I've cobbled together a regular expression that seems to match in "most" usage:

\W\@\W[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{1,2}\,\W[0-9]{1,2}\:[0-9]{2}\W[AP]M

However, it does not seem to match when used in the following awk statement:

$ awk 'match($0, /\W\@\W[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{1,2}\,\W[0-9]{1,2}\:[0-9]{2}\W[AP]M/) {print $0}' testfile2.txt

My system (macOS Mojave) has an old version of awk: awk version 20070501.

I've also found:

grep -e fails to match this pattern to any line in testfile2.txt, but egrep and grep -E do match the lines I expected them to match.
awk 'match($0, /\@/) {print $0}' testfile2.txt does match (& print) the expected lines, but I can't rely on a single character!

Here's testfile2.txt:

+13054261988: Forwarding data to primary repository
@ 1/7/18, 4:21 PM
+16744774911: Use this URL: https://www.repo-prime.ga/
@ 1/7/18, 4:22 PM
+13054261988: Will do. Passwords OK?
@ 1/7/18, 6:12 PM
+16744774911: No, use 2FA for all transactions
@ 1/7/18, 8:56 PM
+13054261988: Using Google's authenticator?

If so, will need additional information.
@ 1/7/18, 9:36 PM
+13054261988: RSVP ASAP, I have transactions that need to be uploaded.
@ 1/7/18, 9:46 PM

Is my regular expression failing to match in awk usage due to an error I can't see in my awk statement, or is it due to the regex itself, a combination of both, etc?

Related: [Why does my regular expression work in X but not in Y?](https://unix.stackexchange.com/questions/119905/why-does-my-regular-expression-work-in-x-but-not-in-y) — steeldriver, Oct 31 '19 at 18:11
What does "seems to match in "most" usage" mean? How did you test it? — ilkkachu, Oct 31 '19 at 18:47
@ilkkachu: It works in `egrep` and `grep -E`, but not `grep -e` on macos. Also works in 'BBEdit v 13'. — Seamus, Oct 31 '19 at 18:54
@steeldriver: That seems to be it. `macos` has a different set of expressions than my Linux distro (where something similar worked). Actually, it seems that `macos` may be inconsistent between (for example) `awk` and `egrep`/`grep -E`. Making progress now! — Seamus, Oct 31 '19 at 19:03
`awk '$1=="@" && $4 ~/^[AP]M$/'` or `awk '/@.*\<[AP]M\>/'` could probably be enough... — JJoao, Oct 31 '19 at 19:50

score 1 · Answer 1 · answered Oct 31 '19 at 18:10

Why strictly require \W (a non-word character) before the @? In your text file, @ is at the start of the line.
There is no need to escape characters such as @, the comma, and the colon (\@, \,, \:); they are not special characters.
Calling match() is redundant if you only need to test whether a line matches a pattern.

$ awk '/^@ [0-9]{1,2}\/[0-9]{1,2}\/[0-9]{1,2}, [0-9]{1,2}:[0-9]{2} [AP]M/' file
@ 1/7/18, 4:21 PM
@ 1/7/18, 4:22 PM
@ 1/7/18, 6:12 PM
@ 1/7/18, 8:56 PM
@ 1/7/18, 9:36 PM
@ 1/7/18, 9:46 PM

RomanPerekhrest

I need this to work with the `match` stmt as it's part of a larger script. – Seamus Oct 31 '19 at 18:41
Doesn't work on macos: $ awk 'match($0, //^@ [0-9]{1,2}\/[0-9]{1,2}\/[0-9]{1,2}, [0-9]{1,2}:[0-9]{2} [AP]M/'/) {print $0}' testfile2.txt -bash: syntax error near unexpected token `)' – Seamus Nov 01 '19 at 01:21
Also doesn't work on macos: $ awk '/^@ [0-9]{1,2}\/[0-9]{1,2}\/[0-9]{1,2}, [0-9]{1,2}:[0-9]{2} [AP]M/' testfile2.txt gives no output at all – Seamus Nov 01 '19 at 01:21

score 1 · Accepted Answer · answered Oct 31 '19 at 19:40

It seems that very old versions of awk, such as the 20070501 release mentioned in the question, do not support the {…} interval (repetition) syntax.
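A quick way to check this on a given machine is to probe the installed awk directly. This is only a sketch: it assumes `awk` and `printf` are on the PATH, and the variable name `interval_support` is mine, not part of any tool.

```shell
# Probe: does this awk treat a{2} as "exactly two a's"?
# An awk without interval support will look for the literal text "a{2}" instead,
# so the line "aa" will not match and the probe reports "no".
if printf 'aa\n' | awk '/^a{2}$/ { found = 1 } END { exit (found ? 0 : 1) }'; then
    interval_support=yes
else
    interval_support=no
fi
echo "interval expressions supported: $interval_support"
```

If this reports "no", stick to the `[0-9][0-9]?` style of repetition.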

This older regex syntax should match in any awk:

awk '/@ [0-9][0-9]?\/[0-9][0-9]?\/[0-9][0-9]?, [1-2]?[0-9]:[0-6][0-9] [AP]M/' file

If your awk could match bracket expressions like [[:blank:]], the regex could be made to be a little more flexible:

awk '/@[[:blank:]][0-9][0-9]?\/[0-9][0-9]?\/[0-9][0-9]?,[[:blank:]][1-2]?[0-9]:[0-6][0-9][[:blank:]][AP]M/' file

If matching one (or more) digits is good enough (I can't see why not), you can use the shorter regex:

awk '/@ [0-9]+\/[0-9]+\/[0-9]+, [1-2]?[0-9]:[0-6][0-9] [AP]M/' file
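To illustrate what the looser `+` form buys you, here is a small comparison. The input line with a four-digit year is hypothetical (nothing like it appears in testfile2.txt), and the variable names are just for illustration.

```shell
# Hypothetical stamp with a four-digit year; not taken from the question's file.
line='@ 12/25/2019, 11:59 PM'

# One-or-two-digit form: the year field accepts at most two digits.
strict=$(printf '%s\n' "$line" |
    awk '/@ [0-9][0-9]?\/[0-9][0-9]?\/[0-9][0-9]?, [1-2]?[0-9]:[0-6][0-9] [AP]M/ { n++ } END { print n + 0 }')

# One-or-more-digit form: any run of digits is accepted in each date field.
loose=$(printf '%s\n' "$line" |
    awk '/@ [0-9]+\/[0-9]+\/[0-9]+, [1-2]?[0-9]:[0-6][0-9] [AP]M/ { n++ } END { print n + 0 }')

echo "strict=$strict loose=$loose"
```

The stricter form rejects this line and the `+` form accepts it, so which one is "right" depends on how much you trust the date format in your input.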

You can also add the start anchor ^ and the end anchor $ to make the regex considerably more restrictive, if needed.
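As a sketch of what the anchors change: the first sample line below is made up (a message that merely quotes a timestamp), and only the second is a real terminator line in the style of testfile2.txt.

```shell
# Two sample lines: a hypothetical message that contains a stamp, and a real stamp line.
sample='+13054261988: see the entry @ 1/7/18, 4:21 PM for details
@ 1/7/18, 4:22 PM'

# With ^ and $ the whole line must be the stamp, so the message line is skipped.
anchored=$(printf '%s\n' "$sample" |
    awk '/^@ [0-9][0-9]?\/[0-9][0-9]?\/[0-9][0-9]?, [1-2]?[0-9]:[0-6][0-9] [AP]M$/')
echo "$anchored"
```

Without the anchors, both lines would be printed, since the unanchored regex can match anywhere in the line.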

I am not using match() for such a simple line match, but the same regex would work perfectly fine with that function.
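For completeness, a minimal sketch of the same regex inside match(), since the question needs it for a larger script. The two input lines come from testfile2.txt; stripping the leading "@ " with substr() is my own illustration of RSTART and RLENGTH, not something the question asked for.

```shell
stamps=$(printf '%s\n' \
    '+13054261988: Will do. Passwords OK?' \
    '@ 1/7/18, 6:12 PM' |
    awk 'match($0, /^@ [0-9][0-9]?\/[0-9][0-9]?\/[0-9][0-9]?, [1-2]?[0-9]:[0-6][0-9] [AP]M$/) {
        # match() sets RSTART (start of match) and RLENGTH (its length);
        # skip the first two characters, "@ ", to keep only the date and time.
        print substr($0, RSTART + 2, RLENGTH - 2)
    }')
echo "$stamps"
```

Because match() returns 0 when nothing matches, it can be used as the pattern part of a rule just like a bare regex.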

This works on macOS. Also [see this answer](https://stackoverflow.com/a/28256343/5395338), and this [regex tutorial](https://www.regular-expressions.info/posixbrackets.html) for a more detailed explanation of @Isaac's use of the `[[:blank:]]` character class, and the related `[[:space:]]` class. Note POSIX compliance :) — Seamus, Nov 01 '19 at 01:18
