2

I am using

find . -name '*.[cCHh][cC]' -exec grep -nHr "$1" {} ';'
find . -name '*.[cCHh]' -exec grep -nHr "$1" {} ';'

to search for a string in all files ending in .c, .C, .h, .H, .cc and .CC in all subdirectories. But since this takes two commands, it feels inefficient.

How do I match .c, .C, .h, .H, .cc and .CC files with one single regex?

EDIT: I am running this on bash on a Linux machine.

Stéphane Chazelas
Arpith
  • By the way, you can use `'+'` at the end of `find` instead of `';'`. This speeds up the command because `find` then runs one `grep` for many files, not one `grep` per file as with `';'`. – rush Oct 19 '12 at 12:45

3 Answers

3

As you mentioned in the subject (incorrectly: what you used is a shell pattern), you should use regular expressions:

find . -iregex '.*\.[ch]+'

The above is a lazy approach, which will also find .ch, .hh and the like, if any exist. For exact matches you still have to enumerate what you want, but that is still easier with regular expressions:

find . -regex '.*\.\(c\|C\|cc\|CC\|h\|H\)'
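To see the difference between the two expressions, here is a minimal sketch that runs both against a scratch directory. It assumes GNU find, since `-regex` and `-iregex` are not POSIX; the helper names `demo_tree`, `lazy_match` and `exact_match` are made up for illustration.

```shell
# Sketch only: assumes GNU find. All function names are hypothetical.

# Build a scratch directory with a mix of wanted and unwanted extensions.
demo_tree() {
  dir=$(mktemp -d) || return 1
  touch "$dir/a.c" "$dir/b.CC" "$dir/c.h" "$dir/d.hh" "$dir/e.ch" "$dir/f.sh"
  printf '%s\n' "$dir"
}

# Lazy: any run of c/h letters (in either case) after the last dot matches.
lazy_match() {
  (cd "$1" && find . -iregex '.*\.[ch]+' | sort)
}

# Exact: only the six extensions listed in the question match.
exact_match() {
  (cd "$1" && find . -regex '.*\.\(c\|C\|cc\|CC\|h\|H\)' | sort)
}
```

Calling `d=$(demo_tree); lazy_match "$d"; exact_match "$d"; rm -rf "$d"` should list d.hh and e.ch only for the lazy variant, while f.sh is rejected by both.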
manatwork
  • How is this different from using `find . -name '.*\.\(c\|C\|cc\|CC\|h\|H\)' `? – Arpith Oct 20 '12 at 18:03
  • @Arpith, with `-name` you specify a shell pattern, with `-regex` you specify a regular expression. Interpreted as a shell pattern, the string '.*\.\(c\|C\|cc\|CC\|h\|H\)' will rarely match anything, and certainly not what you intended in your question: http://pastebin.com/yhddCnbv – manatwork Oct 21 '12 at 09:52
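The point about `-name` versus `-regex` can be checked with a small experiment. This is only a sketch, assuming GNU find for `-regex`; `count_matches` is a hypothetical helper and the scratch files are invented for the demonstration.

```shell
# Sketch only: assumes GNU find. count_matches is a hypothetical helper.
pattern='.*\.\(c\|C\|cc\|CC\|h\|H\)'

# count_matches DIR TEST
# Runs find in DIR, applying TEST (-name or -regex) to the same string,
# and prints how many paths matched.
count_matches() {
  (cd "$1" && find . "$2" "$pattern" | wc -l | tr -d ' ')
}

d=$(mktemp -d)
touch "$d/a.c" "$d/b.CC" "$d/c.h"
count_matches "$d" -regex   # treated as a regular expression: 3
count_matches "$d" -name    # treated as a shell pattern: 0
```

As a shell pattern, the backslashes simply quote the following character, so `-name` would look for a file name that starts with a dot and literally contains `.(c|C|cc|CC|h|H)`.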
1

This can be shortened to a single line:

find -type f -regextype posix-egrep -iregex '.*\.(cc|h|c)$' -exec grep -nHr "$1" {} \;

daisy
  • Your regular expression is wrong. It says “any character 0 or more times, followed by one of the enumerated strings”. On my machine that finds a lot of .sh script files… – manatwork Oct 19 '12 at 08:52
  • @manatwork right, updated the answer – daisy Oct 19 '12 at 09:01
  • Nitpicking here, but the above would match `.cC` or `.Cc` files, which were not requested. Also note that the `$` is not needed, as GNU find's regexps are implicitly anchored. – Stéphane Chazelas Oct 19 '12 at 09:14
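Both remarks in the last comment can be tried out with a short sketch. It assumes GNU find (`-regextype`, `-regex` and `-iregex` are GNU extensions); `anchored_demo` is a hypothetical helper and the file names are invented.

```shell
# Sketch only: assumes GNU find. anchored_demo is a hypothetical helper.
anchored_demo() {
  (
    cd "$1" || exit 1
    # No trailing $ is needed: the regex has to match the whole path,
    # so '.*\.c' matches neither ./x.cc nor ./y.c.bak.
    find . -regextype posix-egrep -regex '.*\.c' | sort
    echo '--'
    # -iregex ignores case everywhere, so the mixed-case ./z.cC is accepted.
    find . -regextype posix-egrep -iregex '.*\.(cc|h|c)' | sort
  )
}
```

With a scratch directory holding x.c, x.cc, y.c.bak and z.cC, the first `find` should print only ./x.c, and the second should print ./x.c, ./x.cc and ./z.cC.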
1

Portably (per the POSIX, Unix (SUS) and Linux (LSB) standards) and efficiently, you'd write it as:

find . \( -name '*.cc' -o -name '*.CC' -o -name '*.[cChH]' \) \
  -type f -exec grep -n -- "$1" /dev/null {} +

The most important point here is to use + instead of ;. Otherwise, you'll run one grep command per file.
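A rough way to observe this is to replace grep with a stand-in command that prints one line each time it is started. This is a sketch under that assumption; `count_runs` is a hypothetical helper, not part of find.

```shell
# Sketch only: count_runs is a hypothetical helper.
# count_runs DIR TERMINATOR
# The stand-in command prints one line per invocation, so the line count
# equals the number of processes find started.
count_runs() {
  (cd "$1" &&
    find . -name '*.[cChH]' -type f -exec sh -c 'echo run' sh {} "$2" |
    wc -l | tr -d ' ')
}

d=$(mktemp -d)
touch "$d/a.c" "$d/b.h" "$d/c.C"
count_runs "$d" ';'   # one process per file: 3
count_runs "$d" '+'   # one process for the whole batch: 1
```

With `+`, find only splits the file list into several invocations when it would exceed the system's argument-length limit, which three short names do not.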

The -H option is GNU-specific, but adding /dev/null (which ensures that grep always gets at least two files to search) guarantees that grep displays the file name.

You'll need "--" unless you can make sure that $1 will never start with -.

I added -type f here to avoid looking into non-regular files (like directories), but since that also excludes symlinks, you may wish to leave it out.
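Since the command uses "$1", it is presumably meant to live in a script or function. A minimal sketch of such a wrapper follows; the name `srcgrep` and the optional directory argument are assumptions added for illustration, not part of the command above.

```shell
# Sketch only: srcgrep is a hypothetical name.
# Usage: srcgrep PATTERN [DIR]    (DIR defaults to the current directory)
srcgrep() {
  # POSIX find and grep features only: no -H, no -regex.
  # /dev/null forces file names into the output; -- protects patterns
  # that begin with a dash.
  find "${2:-.}" \( -name '*.cc' -o -name '*.CC' -o -name '*.[cChH]' \) \
    -type f -exec grep -n -- "$1" /dev/null {} +
}
```

For example, `srcgrep '-x' src` searches for the literal text `-x` under a directory named src; without the `--`, grep would try to parse `-x` as an option.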

Stéphane Chazelas