
I am trying to check whether a variable matches a regex in POSIX shell. However, this seems more difficult than I thought. I want to check that the variable is a valid size string (e.g. 10M or 6G).

if echo $var | grep -Eq '^\d+[MG]$';
then
    echo "match"
else
    echo "no match"
fi

That's what I tried, but for some reason I never get a match, even if the variable contains a correct string. Any idea why?

user4042470
  • Always paste your script into `https://shellcheck.net`, a syntax checker, or install `shellcheck` locally. Make using `shellcheck` part of your development process. – waltinator May 10 '21 at 00:23
  • Alright. It told me to add doublequotes, however, the problem remains. Will use shellcheck from now on tho – user4042470 May 10 '21 at 00:26
  • Related: [grep not working as expected](https://unix.stackexchange.com/questions/498925/grep-not-working-as-expected) and [Why does my regular expression work in X but not in Y?](https://unix.stackexchange.com/questions/119905/why-does-my-regular-expression-work-in-x-but-not-in-y) – steeldriver May 10 '21 at 00:36

1 Answer


GNU grep supports three types of regular expressions: Basic (BRE), Extended (ERE) and Perl-compatible (PCRE). In GNU grep, EREs provide no functionality beyond basic ones; the two differ only in notation, in that some characters, such as the plus sign, are special in an ERE without needing a backslash. Source: the grep man page.

\d does not mean anything special in an ERE; GNU grep treats it as just the character d, so your pattern looks for one or more d characters rather than digits. To express digits, use [[:digit:]] or the old [0-9] (I wonder if there are character encodings where [0-9] is not the same as [[:digit:]]).
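As a minimal sketch of the ERE fix, here is the check from the question with [0-9] in place of \d. The function name is_size_string is made up for illustration, and printf is used instead of echo only because its handling of unusual strings is more predictable:

```shell
#!/bin/sh
# is_size_string is a hypothetical helper name, not part of the original post.
# It succeeds when its argument is one or more digits followed by M or G.
is_size_string() {
    # [0-9] replaces \d, which has no special meaning in an ERE.
    printf '%s\n' "$1" | grep -Eq '^[0-9]+[MG]$'
}

var=10M
if is_size_string "$var"; then
    echo "match"
else
    echo "no match"
fi
```

Swapping [0-9] for [[:digit:]] should behave the same way for ordinary ASCII input.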

Your expression works as a PCRE, though:

if echo "$var" | grep -Pq '^\d+[MG]$'
then
    echo "match"
else
    echo "no match"
fi

Note the -P option instead of -E.

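POSIX grep does not support PCREs, though: the standard specifies only basic and extended regular expressions, and -P is a GNU extension. In a strictly POSIX script, stay with -E and a bracket expression for the digits.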
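If depending on grep at all is a concern, the same test can be approximated with plain case patterns, which every POSIX shell provides. This is only a sketch under that assumption; the function name is_size_string_case and the variable digits are invented for the example:

```shell
#!/bin/sh
# is_size_string_case is a hypothetical helper name, not part of the original post.
# It uses only shell pattern matching, so no external command is run.
is_size_string_case() {
    case $1 in
        *[MG]) digits=${1%?} ;;   # ends in M or G: strip that last character
        *)     return 1 ;;        # any other ending is rejected
    esac
    case $digits in
        ''|*[!0-9]*) return 1 ;;  # nothing left, or a non-digit is present
        *)           return 0 ;;
    esac
}

var=6G
if is_size_string_case "$var"; then
    echo "match"
else
    echo "no match"
fi
```

Note that case patterns are globs, not regular expressions, which is why the check is split into two steps instead of a single pattern.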

berndbausch
  • [Does every country use Arabic numerals?](https://www.reddit.com/r/NoStupidQuestions/comments/5im7ts/does_every_country_use_arabic_numerals/). I wonder if there are character encodings where `[0-9]` is not the same as `[0123456789]` – glenn jackman May 10 '21 at 12:11
  • Ironically, I guess Arabic countries don't, or they do plus they use their own fonts. – berndbausch May 10 '21 at 12:18