
While we use * in grep to denote zero or more occurrences of the previous character, we use *.c to find all C files when we use it with the ls command, as in ls *.c. Could someone tell me how the use of * differs in these two cases?

user
user3539

2 Answers


Shell file name globbing and regular expressions use some of the same characters, and they have similar purposes, but you're right, they aren't compatible. File name globbing is a much less powerful system.

In file name globbing:

  • * means "zero or more characters"

  • ? means "any single character"
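To make the two glob rules concrete, here is a minimal Python sketch. It assumes the standard library's fnmatch module is a fair stand-in for the shell's matcher: it follows the same * and ? rules, though unlike the shell it does not treat a leading dot or a slash specially. The file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file names standing in for a directory listing
names = ["main.c", "a.c", "notes.txt", "io.h"]

# '*' matches zero or more characters of any kind
star = [n for n in names if fnmatchcase(n, "*.c")]

# '?' matches exactly one character
one = [n for n in names if fnmatchcase(n, "?.c")]

print(star)  # ['main.c', 'a.c']
print(one)   # ['a.c']
```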

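But in regexes, * is not a wildcard on its own: it means "zero or more of the preceding element." So you have to use .* to mean "zero or more characters", where . means "any single character." A ? means something quite different in regexes: zero or one instance of the preceding RE element. (That is true of extended regexes, such as grep -E uses; in a basic regex, as used by plain grep, a bare ? is just a literal question mark.)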
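The same made-up names can be run through Python's re module to show the regex equivalents. This is only a sketch: Python's regex flavor is not identical to grep's, but for these constructs it behaves the same way.

```python
import re

names = ["main.c", "a.c", "notes.txt", "io.h"]

# '.' is "any single character" and '*' repeats the preceding element,
# so ".*" is what corresponds to the glob '*'. The literal dot is escaped.
star = [n for n in names if re.fullmatch(r".*\.c", n)]

# '.' alone corresponds to the glob '?'
one = [n for n in names if re.fullmatch(r".\.c", n)]

# '?' makes the preceding element optional
optional = [w for w in ["color", "colour", "colouur"] if re.fullmatch(r"colou?r", w)]

# A '*' after 'c' means "zero or more c's", not "anything"
quantified = [w for w in ["ab", "abc", "abcc", "abd"] if re.fullmatch(r"abc*", w)]

print(star)        # ['main.c', 'a.c']
print(one)         # ['a.c']
print(optional)    # ['color', 'colour']
print(quantified)  # ['ab', 'abc', 'abcc']
```

re.fullmatch is used so that the whole name must match, as with a glob. grep instead reports a line if the pattern matches anywhere in it (closer to re.search), which is why grep 'c*' matches every line: zero c's is always a match.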

Square brackets ([]) appear to work the same in both systems on the system I'm typing this on, for simple cases at least. This includes things like POSIX character classes (e.g. [[:alpha:]]; the class name must itself sit inside a bracket expression). One difference to watch for is negation: a POSIX glob writes it as [!abc], while a regex writes [^abc] (bash accepts both in globs). That said, if you need your commands to work on many different system types, I recommend against using anything beyond elementary things like lists of characters (e.g. [abeq]) and maybe character ranges (e.g. [a-c]).
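A small Python sketch of the "elementary" bracket cases, again using fnmatch as an assumed stand-in for shell globbing and hypothetical names. It sticks to ranges and negation because Python's re module, for instance, has no [[:alpha:]] class at all, which is a good example of the portability problem.

```python
import re
from fnmatch import fnmatchcase

names = ["a1", "b2", "c3", "d4", "q5"]

# A simple range is written the same way in both systems
glob_range = [n for n in names if fnmatchcase(n, "[a-c]?")]
regex_range = [n for n in names if re.fullmatch(r"[a-c].", n)]

# Negation is spelled differently: '!' in a POSIX glob, '^' in a regex
glob_negated = [n for n in names if fnmatchcase(n, "[!a-c]?")]
regex_negated = [n for n in names if re.fullmatch(r"[^a-c].", n)]

print(glob_range, regex_range)      # ['a1', 'b2', 'c3'] ['a1', 'b2', 'c3']
print(glob_negated, regex_negated)  # ['d4', 'q5'] ['d4', 'q5']
```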

These differences mean the two systems are only directly interchangeable in simple cases. If you need regex matching of file names, you have to do it another way. find -regex is one option; note that it matches against the whole path (e.g. ./src/main.c), not just the file name, so the pattern must allow for the directory part. (There is also find -name, by the way, which uses glob syntax.)
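If a short script is acceptable, another way is to list the directory yourself and filter the names with a regex. The sketch below does that in Python in a throwaway directory with hypothetical file names, then imitates the whole-path behaviour of find -regex on made-up paths. It does not call find, and Python's regex flavor is not the one find uses by default.

```python
import os
import re
import tempfile

# Create a throwaway directory with a few hypothetical file names
with tempfile.TemporaryDirectory() as d:
    for name in ["main.c", "util.c", "notes.txt", "test1.c"]:
        open(os.path.join(d, name), "w").close()

    # Regex match on bare file names: C files whose name contains no digit
    c_no_digits = sorted(n for n in os.listdir(d) if re.fullmatch(r"[^0-9]*\.c", n))

# find -regex tests the whole path, so a pattern written for bare names
# fails on paths like "./src/main.c" unless it allows for the directory.
paths = ["./src/main.c", "./notes.txt"]
name_only = [p for p in paths if re.fullmatch(r"[a-z]*\.c", p)]
whole_path = [p for p in paths if re.fullmatch(r".*/[a-z]*\.c", p)]

print(c_no_digits)  # ['main.c', 'util.c']
print(name_only)    # []
print(whole_path)   # ['./src/main.c']
```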

Warren Young
  • I didn't know it was called globbing :) – user3539 Dec 08 '12 at 13:07
  • In addition, there are various flavours of regex. Not all regexes are created the same! And there are many other pattern-matching systems, such as SQL LIKE, where `'%'` means `'*'`. – Mr Lister Dec 08 '12 at 13:48
  • Two major flavors of regexp are POSIX and PCRE (Perl Compatible Regular Expressions). The latter is less long-winded and has some more features. Unix tools and shells generally use POSIX, while most programming languages with built-in regexps (except the shell) use PCRE-style syntax. Just beware of the difference when you are reading material online. – goldilocks Dec 08 '12 at 15:05

Answering the question as expressed in the original title:

Why do regular expressions differ from that used to filter files?

File name expansion predates regular expressions, already existed in most operating systems (wildcard/joker characters), and is much simpler and more intuitive than the latter.

While *.txt is easily understood by casual users, the analogous .*\.txt is aimed more at experienced users and programmers, not to mention ^.*\.txt$ ...
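Python's fnmatch.translate shows how much regex a short glob turns into. The sketch below converts *.txt with it and compares it against the unanchored and anchored regexes on hypothetical names. translate's exact output string varies between Python versions, so only its behaviour is relied on here.

```python
import re
from fnmatch import translate

# fnmatch.translate turns a glob into an equivalent Python regex string
txt_regex = translate("*.txt")

names = ["notes.txt", "notes.txt.bak", "readme"]

# re.match anchors at the start; the translated pattern anchors the end
glob_hits = [n for n in names if re.match(txt_regex, n)]

# The unanchored ".*\.txt" also finds ".txt" in the middle of a name ...
unanchored = [n for n in names if re.search(r".*\.txt", n)]

# ... so it needs the anchors to behave like the glob
anchored = [n for n in names if re.search(r"^.*\.txt$", n)]

print(glob_hits)   # ['notes.txt']
print(unanchored)  # ['notes.txt', 'notes.txt.bak']
print(anchored)    # ['notes.txt']
```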

jlliagre
  • Another reason for the “why” part: speed. Regular expressions are slower: http://pastebin.com/3iNCgkE3 – manatwork Jan 01 '13 at 11:21
  • `*.txt` doesn't equal `.*\.txt`; it (mostly) equals `.*\.txt$` because there can be nothing after the `.txt` (at least assuming *reasonable* file name globbing). Perhaps even `^.*\.txt$`, somewhat depending on usage. Proves your point? – user Jan 09 '13 at 08:17