4
[root@localhost opt]# cat cfg
key = value
[root@localhost opt]# grep 'key\s*=\s*.+' cfg
[root@localhost opt]# 

My intent is: the = sign may be followed by zero or more spaces, but must then be followed by one or more non-space characters.

Why doesn't it output the line key = value?

xmllmx

2 Answers

11

Observe:

$ grep 'key\s*=\s*.+' cfg
$ grep 'key\s*=\s*.\+' cfg
key = value
$ grep -E 'key\s*=\s*.+' cfg
key = value

In Basic Regular Expressions (BRE, the default), + means a plus sign. As a GNU extension, one can signal one-or-more-of-the-previous-character using \+. This is also true of ?, {, |, and (. Unless escaped with a backslash, these are all treated as ordinary characters under BRE.

The rules change if you use Extended Regular Expressions, -E. For ERE, the backslash isn't needed and a plain + means one-or-more-of-the-previous-character. Under ERE, \+ means a plain normal plus sign.
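
The difference can also be checked without relying on the GNU \+ extension. This is only a sketch: the sample lines and variable names are made up for illustration, and \{1,\} is the POSIX BRE spelling of one-or-more.

```shell
# Hypothetical sample data: one config-style line and one line with a literal "+".
sample=$(printf '%s\n' 'key = value' 'a+b')

# BRE (default): an unescaped + is a literal plus sign, so only "a+b" matches.
bre_plain=$(printf '%s\n' "$sample" | grep 'a+b')

# BRE, POSIX spelling of "one or more": \{1,\} (no GNU \+ needed).
bre_interval=$(printf '%s\n' "$sample" | grep 'key *= *.\{1,\}')

# ERE (-E): an unescaped + is the one-or-more operator.
ere_plus=$(printf '%s\n' "$sample" | grep -E 'key *= *.+')

# ERE (-E): a backslash turns + back into a literal plus sign.
ere_literal=$(printf '%s\n' "$sample" | grep -E 'a\+b')

printf 'BRE a+b:       %s\n' "$bre_plain"
printf 'BRE \\{1,\\}:    %s\n' "$bre_interval"
printf 'ERE +:         %s\n' "$ere_plus"
printf 'ERE \\+:        %s\n' "$ere_literal"
```

The first and last commands should select a+b, and the middle two should select key = value.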

John1024
  • Technically, `\+` is a feature of *enhanced* BREs (like we need even more flavors of REs in POSIX). `grep` appears to pass the `REG_ENHANCED` flag to `regcomp()`; otherwise, you would have to use `{1,}` like `expr` does. – chepner Aug 12 '16 at 11:59
  • But `grep -E 'key\s*=\s*.+'` does match `key = ` (with one or more trailing spaces) as that's a `=` followed by zero spaces (which matches `\s*`) followed by one space character (which matches `.`). Also note that `\s`/`\S` are not standard/portable. You'd want `grep -Ex 'key\s*=\s*\S.*'` or `grep -E '^key\s*=\s*\S'` to force at least one non-space after the `=`. `[[:space:]]` is the standard equivalent of `\s`, though here, `[[:blank:]]` may make more sense. – Stéphane Chazelas Aug 12 '16 at 12:27
  • Note that `\+` in BRE is a GNU extension. In standard BREs, `+` is written `\{1,\}`. – Stéphane Chazelas Aug 12 '16 at 12:58
  • @chepner Yes. Answer updated to note that `\+` under BRE is a GNU extension. (When originally posted, this question was tagged Linux, I see that that tag has since been removed.) – John1024 Aug 15 '16 at 20:08
  • @StéphaneChazelas I see that you have posted (+1) a thorough exploration of this regex. – John1024 Aug 15 '16 at 20:09
1
key\s*=\s*.+

is GNU ERE syntax (assuming you want \s to match any spacing character, and + to match one or more of the preceding atom), so you'd need the GNU implementation of grep and pass the -E option.

However, even then, that pattern wouldn't make much sense.

First,

grep 'key\s*=\s*.+'

is functionally equivalent to

grep 'key\s*=\s*.'

This is because a line matches anything.+ if and only if it matches anything. (the same pattern with a single trailing dot): grep only needs to find one match somewhere in the line.

Also, a spacing character is a character too. Since \s* matches zero or more spacing characters, key\s*=\s*. is functionally equivalent to key\s*=. (lines that contain key<optional-spaces>=<one-character-space-or-not>).
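
A quick way to convince yourself of that equivalence is to count the matching lines for each variant. This is a sketch under assumptions: the three input lines and the variable names are made up, and [[:space:]] with -E is used in place of \s so that it doesn't depend on GNU grep.

```shell
# Hypothetical input: nothing after "=", only a space after "=", and a real value.
lines=$(printf 'key=%s\n' '' ' ' ' a')

# -c prints the number of matching lines instead of the lines themselves.
with_plus=$(printf '%s\n' "$lines" | grep -Ec 'key[[:space:]]*=[[:space:]]*.+')
with_dot=$(printf '%s\n' "$lines" | grep -Ec 'key[[:space:]]*=[[:space:]]*.')
shortest=$(printf '%s\n' "$lines" | grep -Ec 'key[[:space:]]*=.')

# All three patterns select the same lines, so the three counts are equal.
printf '%s %s %s\n' "$with_plus" "$with_dot" "$shortest"
```

Each count is 2: the line with only a space after the = is selected by all three patterns, because that space satisfies the final dot.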

Here you want:

grep 'key\s*=\s*\S'

to ask for at least one non-spacing character to be found on the right of the =, which is functionally equivalent to:

grep 'key\s*=.*\S'

Note that it matches key = foo but also nonkey = foo. If you want the key to be only found at the beginning of the line, you need to ask for it with the ^ anchor:

grep '^key\s*=.*\S'

Or use -x for the regexp to match the whole line:

grep -x 'key\s*=.*\S.*'

Note that the standard equivalent of \s is [[:space:]] ([^[:space:]] for \S).
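
Put together, a portable version of the anchored pattern might look like the sketch below. The function name key_with_value and the test lines are hypothetical; only standard BRE syntax and POSIX character classes are used.

```shell
# Hypothetical helper: reads lines on stdin and prints those where "key"
# starts the line and "=" is followed by at least one non-space character.
key_with_value() {
    grep '^key[[:space:]]*=.*[^[:space:]]'
}

# Sample input: a real value, only a trailing space, nothing at all,
# and a different key that merely ends in "key".
matched=$(printf '%s\n' 'key = value' 'key = ' 'key=' 'nonkey = foo' | key_with_value)

printf '%s\n' "$matched"
```

Only key = value should come out: the second and third lines have no non-space character after the =, and nonkey = foo is rejected by the ^ anchor.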

Another way to address the requirement would be to use extended operators found in some regexps like the PCRE ones to prevent back-tracking.

key=\s*. matches key=  because the regexp engine has \s* go greedily through the space characters after the =, finds 1 and then realises it can't match the . as it reached the end of the line, and then back-tracks to try with fewer matches of \s (0 in that case) so the next . can match (here a space character).

With PCRE, like when using the -P option with GNU grep, you can write:

 grep -P '^key\s*=(?>\s*).'

That (?>...) syntax prevents back-tracking. So the \s* will eat as many spacing characters as possible without being able to backtrack, so will only match if there's at least one non-spacing character after the spaces.

$ printf 'key=%s\n' '' ' ' ' a' | grep '^key\s*=\s*.'
key=
key= a
$ printf 'key=%s\n' '' ' ' ' a' | grep -P '^key\s*=(?>\s*).'
key= a
$ printf 'key=%s\n' '' ' ' ' a' | grep '^key\s*=.*\S'
key= a
Stéphane Chazelas