
This is my first time bash scripting so I'm probably making an easy mistake.

Basically, I'm trying to write a script that gets the groups of a user, and if they are in a certain group, it will log that accordingly. Evidently there will be more functionality, but there's no point building that when I can't even get the regex working!

So far, I have this:

#!/bin/bash

regex="^([a-zA-Z0-9\-_]+ : [a-zA-Z0-9\-_]+) (usergroup)$"

# example output
groups="username : username usergroup"

echo "$groups" >> /home/jrdn/log

if [[ "$groups" =~ $regex ]]; then
    echo "Match!" >> /home/jrdn/log
else
    echo "No match" >> /home/jrdn/log
fi

Every place I've tried that regex, it works. But in the bash script, it only ever outputs the $groups, followed by No match. So can someone tell me what's wrong with it?

jrdn
  • What makes you think anything is wrong with it? – manatwork Oct 03 '13 at 14:17
  • It echoes "No match". Could be something wrong with the comparison, there's something wrong somewhere. – jrdn Oct 03 '13 at 14:18
  • Works for me. What version of bash do you have? – peterph Oct 03 '13 at 14:19
  • GNU bash, version 4.2.37(1)-release (x86_64-pc-linux-gnu) – jrdn Oct 03 '13 at 14:20
  • Works for me too. `bash` 4.1.10(4). http://pastebin.com/PgyiZujJ Actually, I see no reason for it not to work. How do you run it? – manatwork Oct 03 '13 at 14:24
  • Interesting, looks like something in your environment. How about trying a much simpler regex like trying to match `^a` on `"asd"` and `"qwe"` and then expanding it piece by piece? – peterph Oct 03 '13 at 14:26
  • @manatwork: just running it like: `./install.sh` @peterph: running `^([a])` against `abc` and `dbc` returns the proper results – jrdn Oct 03 '13 at 14:30
  • @jrdnhannah then try to slowly re-create your target regexp, first match `^([a-zA-Z0-9\-_]+)` then add the colon and so on... you should find out pretty soon, where is the problem. – peterph Oct 03 '13 at 14:35
  • @peterph I just tried running it on my mac, on the off chance it works... And it does. I will simplify it, though, work out what my box doesn't like, and then try to figure out why it doesn't like it. Thanks – jrdn Oct 03 '13 at 14:39
  • @peterph: It didn't like the underscore. Escaped that with a backslash, and it's working. Such a simple solution! Oh well, at least I know for next time. Thanks everybody :) – jrdn Oct 03 '13 at 14:43
  • Same here with bash 4.2.45. Escaping the underscore fixed it. Weird. @jrdnhannah could you write that up as an answer and accept it please? – terdon Oct 03 '13 at 14:54
  • Since I've only just signed up to the Unix SE, it requires me to wait 8 hours before answering my own. Happy to mark it as answered if somebody else does, though. – jrdn Oct 03 '13 at 15:07
  • There you go. Interesting thing is that my Bash 4.2.45 was ok with the unescaped underscore. – peterph Oct 03 '13 at 15:17
  • Sounds like a bug in bash and/or [e]glibc. Broken on my Debian 4.2.45(1). Same problem with `egrep`; so this is probably eglibc, not bash. I have 2.17-92+b1. Actually, by the docs, the regex is wrong... – derobert Oct 03 '13 at 15:20
  • @peterph seriously? It worked on your bash 4.2.45(1)? Which distro? – terdon Oct 03 '13 at 15:27
  • @terdon bash just calls libc's regex functions, probably. So it depends on the libc version, not the bash version. See my answer... (Or maybe even on the collation sequence you have in use) – derobert Oct 03 '13 at 15:30
  • @terdon seems that my `LC_COLLATE=POSIX` (which is the only thing differing from my `[ll_CC].utf8`) "saved" me again. :) – peterph Oct 03 '13 at 15:58

2 Answers


From man 7 regex:

A bracket expression is a list of characters enclosed in "[]". …

… To include a literal '-', make it the first or last character…. [A]ll other special characters, including '\', lose their special significance within a bracket expression.

Trying the regexp with egrep gives an error:

$ echo "username : username usergroup" | egrep "^([a-zA-Z0-9\-_]+ : [a-zA-Z0-9\-_]+) (usergroup)$"
egrep: Invalid range end

Here is a simpler version that gives the same error:

$ echo 'hi' | egrep '[\-_]'
egrep: Invalid range end

Since \ is not special inside a bracket expression, \-_ is a range from \ to _, just like a-z in [a-z] would be. You need to put your - at the end, like [_-], or:

echo "username : username usergroup" | egrep "^([a-zA-Z0-9_-]+ : [a-zA-Z0-9_-]+) (usergroup)$"
username : username usergroup

This should work regardless of your libc version (in either egrep or bash).
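As a rough sketch, the same fix can be applied to the script from the question. The `result` variable is introduced here only for illustration, and the writes to the log file are replaced with plain `echo` so the snippet has no side effects:

```shell
#!/bin/bash

# '-' is the last character in each bracket expression, so it is a literal
# hyphen and needs no backslash.
regex='^([a-zA-Z0-9_-]+ : [a-zA-Z0-9_-]+) (usergroup)$'

# Example input copied from the question.
groups="username : username usergroup"

if [[ "$groups" =~ $regex ]]; then
    result="Match!"
else
    result="No match"
fi

echo "$result"
# On a match, the second capture group holds the group name.
echo "group: ${BASH_REMATCH[2]}"
```

Keeping the pattern in a variable and leaving `$regex` unquoted on the right-hand side of `=~` follows the original script; quoting it there would make bash treat it as a literal string.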

edit: This actually depends on your locale settings too. The manpage does warn about this:

Ranges are very collating-sequence-dependent, and portable programs should avoid relying on them.

For example:

$ echo '\_' | LC_ALL=en_US.UTF8 egrep '[\-_]'
egrep: Invalid range end
$ echo '\_' | LC_ALL=C egrep '[\-_]'
\_

Of course, even though it didn't error, it isn't doing what you want:

$ echo '\^_' | LC_ALL=C egrep '^[\-_]+$'
\^_

It's a range which, in ASCII, includes \, ], ^, and _.
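To see which characters that range actually covers, a small loop can test candidates one at a time. This is only a sketch: it assumes a `grep -E` that follows the POSIX rule quoted above (backslash is literal inside brackets), and the `matched` variable and the list of candidate characters are made up for illustration:

```shell
# Collect every candidate character that [\-_] accepts in the C locale.
matched=""
for ch in '[' '\' ']' '^' '_' '-' 'a'; do
    # printf '%s\n' prints the character verbatim, including a lone backslash.
    if printf '%s\n' "$ch" | LC_ALL=C grep -Eq '^[\-_]$'; then
        matched="$matched$ch"
    fi
done

echo "characters accepted by [\\-_]: $matched"
```

Note that the literal hyphen is not among the accepted characters, which is the opposite of what the original pattern intended.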

derobert

General rule with regexps (and any bugs in larger pieces of code): cut it down and rebuild it step by step or use bisecting - whatever works better for you.

In this case the culprit turned out to be the underscore: escaping it with a backslash made it work.
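The step-by-step approach can be sketched as a loop that tries progressively longer pieces of the pattern against the sample input and reports the first piece that stops matching. The `input`, `re`, and `failures` names are illustrative, and the pieces shown use the bracket expression with the hyphen placed last:

```shell
#!/bin/bash

input="username : username usergroup"
failures=0

# Each entry extends the previous one by one more piece of the full regex.
for re in \
    '^[a-zA-Z0-9_-]+' \
    '^[a-zA-Z0-9_-]+ : ' \
    '^[a-zA-Z0-9_-]+ : [a-zA-Z0-9_-]+' \
    '^([a-zA-Z0-9_-]+ : [a-zA-Z0-9_-]+) (usergroup)$'
do
    if [[ "$input" =~ $re ]]; then
        echo "ok:   $re"
    else
        echo "FAIL: $re"
        failures=$((failures + 1))
    fi
done

echo "failing steps: $failures"
```

Swapping in the bracket expression from the question would show the failure at the very first step on an affected system, which narrows the search to the character class itself.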

peterph