7

I am on a closed network (i.e. no connectivity to the internet).

I have a Bourne shell script that asks the user to enter a regular expression for use with grep -P.

Generally speaking, I like to do some form of input validation.
Is there a way to test a string variable to see if it is a (valid) regex?
(Copying things from the internet onto my system can be done, but it takes forever and is a PITA -- thus I am looking for a way to do it natively.)

Scottie H
  • 3
    What do you mean by "valid regex"? What's an invalid regex? – muru Mar 22 '23 at 23:34
  • 4
    And you mean actual bourne shell right, not bash? – terdon Mar 23 '23 at 12:38
  • 1
    bash is `Bourne-again shell`, and is far more featureful than the Bourne shell – phuclv Mar 23 '23 at 13:05
  • natively... `grep` already isn't a native part of your shell, but an external tool. Granted, it's a standard one, and so likely to be found anywhere, but then again e.g. Debian-based systems always have Perl too, so what tools you have really depends on the system. If you're on Linux, you likely have either Bash or some smaller POSIX-only shell, like Dash or Busybox, _not_ an ancient Bourne shell. (Note the [tag:bourne-shell] tag has the description "a historical implementation of /bin/sh") – ilkkachu Mar 23 '23 at 16:32
  • @terdon: Yes, /bin/sh, not /bin/bash. – Scottie H Mar 28 '23 at 18:21
  • @phuclv: Yes, I know. Appreciate the reminder. – Scottie H Mar 28 '23 at 18:26
  • @ilkkachu Right, not "part of" the shell, however grep is expected to be available on all systems. What I meant by "native" is that I don't have to go to the internet and download something, or install some odd package that would not be expected to be installed by default. – Scottie H Mar 28 '23 at 18:26
  • Yeah, unless you have a really, really, really old operating system (think more than 20 or 30 years old), your `/bin/sh` isn't the Bourne shell. You are not very likely to find the Bourne shell in the wild these days. – terdon Mar 28 '23 at 21:57

1 Answer

19

No, but with some tools it's not hard to test whether a regex compiles or not.

For example, with grep: `echo | grep -P '['`. The exit code, `$?`, will be 2, indicating that an error occurred. For this example, grep will also print "grep: missing terminating ] for character class" to stderr; you can redirect stderr to /dev/null if you only want the exit code.

An exit code of 1 indicates that the regex compiled OK but didn't match the input.

These exit codes are specific to GNU grep. Other tools, if they even have such a capability, will probably have different exit codes, and different ways of indicating specific kinds of errors.

Note that this is not even remotely close to telling you whether a regex will correctly match what you want it to (and not match what you don't want it to).

In short, try it and test the exit code. And know your tools.
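
As a rough sketch of how that could look inside a script (the function name `is_valid_regex` and its optional second argument are made up for illustration, and this assumes a POSIX-style `/bin/sh` plus a grep that supports the chosen flavor flag):

```shell
#!/bin/sh
# is_valid_regex REGEX [FLAVOR_FLAG]   (hypothetical helper, not a standard tool)
# Succeeds if grep accepts the pattern, fails if grep reports an error.
# FLAVOR_FLAG defaults to -P (PCRE, as in the question); -E or -G work too.
# Note: plain sh has no "local", so re and flavor are global variables.
is_valid_regex() {
    re=$1
    flavor=${2:--P}
    # Pipe in a single empty line so grep has some input to test against,
    # and use -e so a pattern starting with "-" is not taken as an option.
    echo | grep "$flavor" -e "$re" >/dev/null 2>&1
    # 0 = matched, 1 = no match: either way the pattern compiled.
    # POSIX only promises "greater than 1" for errors, so test for that
    # rather than for exactly 2.
    [ "$?" -le 1 ]
}
```

You would then call it as `if is_valid_regex "$re"; then ...; else echo "bad regex" >&2; fi`. One caveat: if your grep was built without `-P` support, every pattern will be reported as invalid, so it is worth checking once with a pattern you know is good.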

cas
  • An exit code of `2` indicates a failure as you say, but not necessarily of the regex. For example, `grep foo non_existent_file` will also exit with `2` as will trying to read a file that you don't have read permission on. I can't off the top of my head think of other cases, so maybe you can grab the stderr and parse that to know if it was a regex problem? – terdon Mar 23 '23 at 12:42
  • 5
    @terdon yep, i considered adding a note with that example (actually, `grep . nosuchfile`) but, in the context of piping an empty line into grep, that's never going to happen (ditto for perms errors), so I cut it. As you say, an exit code of `2` in GNU grep doesn't mean "regex error", it just means "error". However, in this context, there isn't anything else it can be (except perhaps OOM or maybe a pipe error due to lack of resources or something else arising from a far bigger system problem - e.g. CPU or RAM fault - that's way beyond the scope of a simple method to check the validity of an RE) – cas Mar 23 '23 at 13:10
  • Fair enough, yes. That makes sense, thanks. – terdon Mar 23 '23 at 13:38
  • 4
    I don't think there's any need for the pipeline here - just take input from the null device: `grep "$re" /dev/null 2>/dev/null`. If `/dev/null` doesn't exist, you have Bigger Problems! (Minor aside: my versions of GNU grep - 3.6 and 3.8 - have a different error message. It's just "_grep: Invalid regular expression_" which isn't quite as informative as yours). – Toby Speight Mar 23 '23 at 13:55
  • the pipeline makes it easier to edit the regex from the shell's history, because it's at the end of the line, not in the middle. up-arrow, backspaces or ^W. BTW, my grep is `grep (GNU grep) 3.8` (on debian sid, if that makes any difference) – cas Mar 23 '23 at 15:20
  • I think the exit status of `grep` is a de facto standard: `0` for a match, `1` for no match, something else for errors. So this shouldn't be contingent on GNU grep. – Barmar Mar 23 '23 at 15:31
  • The `-P` isn't necessary - in fact you should use the flag that corresponds to the flavor of regex you intend to test for - `-P` is perl, `-E` is extended, for example. – Dennis Williamson Mar 23 '23 at 15:44
  • 1
    @Barmar, yep, [that's right](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html). I tested, it works with the grep on macOS and Busybox grep too. Though some implementation could return e.g. 3 instead of 2... – ilkkachu Mar 23 '23 at 16:29
  • 2
    @TobySpeight, since I happened to test, redirecting from `/dev/null` doesn't seem to work with Busybox, e.g. `busybox grep -e '[' < /dev/null` just exits with status 1 without a comment. It looks like it doesn't bother to compile the regex if there's no input. I wouldn't have expected that choice, but it's not exactly wrong, as the empty input has nothing to match anyway. – ilkkachu Mar 23 '23 at 16:35
  • @Barmar, it's actually [a _de jure_ standard](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html#tag_20_55_14) on POSIX systems. – Toby Speight Mar 23 '23 at 16:39
  • 1
    Thanks @ilkkachu - that's a good catch! – Toby Speight Mar 23 '23 at 16:41
  • @cas, you could put redirections at the beginning (`</dev/null grep "$re"`) if that's a concern. But that's a moot point if you want to be portable with BusyBox grep as per ilkkachu's comment. – Toby Speight Mar 23 '23 at 16:42
  • 2
    @ilkkachu Right. The test should be `$? -gt 1` rather than `$? -eq 2` – Barmar Mar 23 '23 at 17:03
  • 3
    I remember using an ancient grep. There weren't any invalid regexes. The unclosed class for example would be force-reinterpreted as a literal [. – Joshua Mar 23 '23 at 18:01
  • 1
    @DennisWilliamson Yep, I know grep's options. I used `-P` because that's what Scottie H used in their question. – cas Mar 24 '23 at 02:28
  • 2
    @TobySpeight I figured out the error message difference. With `-P`, GNU grep outputs the more detailed error message. For BRE (default or `-G`) or ERE (`-E`), it just outputs "grep: Invalid regular expression" – cas Mar 24 '23 at 02:33
  • 1
    @cas That makes sense, because the `-P` option defers the actual regex work to the PCRE library, so the error is coming straight from there. – IMSoP Mar 24 '23 at 09:45