While we use * to denote zero or more occurrences of the preceding character in grep, we use *.c to find all C files when we use it with the ls command, as in ls *.c. Could someone explain how the use of * differs in these two cases?
2 Answers
Shell file name globbing and regular expressions use some of the same characters, and they have similar purposes, but you're right, they aren't compatible. File name globbing is a much less powerful system.
In file name globbing:
- * means "zero or more characters"
- ? means "any single character"
But in regexes, you have to use .* to mean "zero or more characters", and . means "any single character." A ? means something quite different in regexes: zero or one instance of the preceding RE element.
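One way to see the difference is to run the same short patterns through both systems. This is only a sketch: it relies on the shell's case statement, which applies glob rules to a string without touching the file system, and on grep -E for the regex side. The helper names glob_match, regex_match and report are made up for the example.

```shell
# glob_match, regex_match and report are hypothetical helpers for this sketch.

# glob_match PATTERN STRING: succeed if STRING matches the glob PATTERN.
# An unquoted expansion used as a case pattern is matched with glob rules.
glob_match() {
    case "$2" in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

# regex_match REGEX STRING: succeed if the whole STRING matches the regex.
# grep -E selects extended regexes, -x requires a whole-line match, -q is quiet.
regex_match() {
    printf '%s\n' "$2" | grep -Eqx -e "$1"
}

# report KIND PATTERN STRING: print whether STRING matches PATTERN.
report() {
    if "${1}_match" "$2" "$3"; then
        printf '%-5s %-4s vs %-4s: match\n' "$1" "$2" "$3"
    else
        printf '%-5s %-4s vs %-4s: no match\n' "$1" "$2" "$3"
    fi
}

report glob  'a*c'  abc    # * stands for any run of characters
report regex 'a*c'  abc    # * repeats the preceding "a", so "b" breaks the match
report regex 'a*c'  aaac   # zero or more "a", then "c"
report regex 'a.*c' abc    # .* is the regex spelling of the glob *
report glob  'a?c'  abc    # ? stands for exactly one character
report glob  'a?c'  ac     # ...so there is nothing for ? to match here
report regex 'ab?c' ac     # ? makes the preceding "b" optional
```

The regex helper uses -x so that both sides are compared as whole-string matches, which is how a glob is applied to a file name.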
Square brackets ([]) appear to work the same in both systems on the system I'm typing this on, for simple cases at least. This includes things like POSIX character classes (e.g. [:alpha:], which is written [[:alpha:]] when used inside a bracket expression). That said, if you need your commands to work on many different system types, I recommend against using anything beyond elementary things like lists of characters (e.g. [abeq]) and maybe character ranges (e.g. [a-c]).
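As a rough illustration of the simple cases, the sketch below lists the same scratch files once with a glob and once with a regex, both using the character list [abeq]. The file names are invented, and mktemp -d is assumed to be available (it is common, but POSIX does not require it).

```shell
# Scratch directory with made-up file names; mktemp -d is assumed to exist.
workdir=$(mktemp -d) || exit 1
touch "$workdir/a1.txt" "$workdir/b2.txt" "$workdir/q9.txt" "$workdir/z0.txt"

# Glob: the shell expands [abeq]*.txt against the directory contents.
globbed=$(cd "$workdir" && printf '%s\n' [abeq]*.txt)

# Regex: the same bracket list, but "any run of characters" is .* and the
# pattern has to be anchored to cover the whole name.
regexed=$(cd "$workdir" && ls | grep -E '^[abeq].*\.txt$')

printf 'glob:\n%s\n\nregex:\n%s\n' "$globbed" "$regexed"
rm -rf "$workdir"
```

Both commands should select the same three files and skip z0.txt; only the text around the brackets has to change between the two syntaxes.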
These differences mean the two systems are only directly interchangeable for simple cases. If you need regex matching of file names, you need to do it another way. find -regex is one option. (Notice that there is also find -name, by the way, which uses glob syntax.)
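A small sketch of the two find options side by side, assuming a find that supports -regex (GNU and BSD find do; it is not part of POSIX) and that mktemp -d is available. The file names are invented for the example.

```shell
# Scratch directory with made-up file names; mktemp -d is assumed to exist.
workdir=$(mktemp -d) || exit 1
touch "$workdir/notes.txt" "$workdir/notes.txt.bak" "$workdir/main.c"

# -name takes a glob and tests only the last path component.
by_name=$(cd "$workdir" && find . -name '*.txt' | sort)

# -regex takes a regex and tests the whole path (here "./notes.txt"),
# so the pattern starts with .* to absorb the leading "./".
by_regex=$(cd "$workdir" && find . -regex '.*\.txt' | sort)

printf 'find -name:  %s\nfind -regex: %s\n' "$by_name" "$by_regex"
rm -rf "$workdir"
```

Because -regex has to match the entire path, notes.txt.bak is excluded here even though the pattern carries no explicit $ anchor.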
- I didn't know it was called globbing :) – user3539 Dec 08 '12 at 13:07
- In addition, there are various flavours of regex. Not all regexes are created the same! And you have many other pattern matching systems, such as SQL _like_, where `'%'` means `'*'`. – Mr Lister Dec 08 '12 at 13:48
- Two major flavors of regexp are POSIX and PCRE (Perl Compatible R.E.). The latter is less long-winded and has some more features. Unix tools and shells generally use POSIX; most programming languages with built-in regexps (except shell) use PCRE. Just beware the difference when you are reading material online. – goldilocks Dec 08 '12 at 15:05
Answering the question expressed in the original title:
Why do regular expressions differ from that used to filter files?
File name expansion predates regular expressions, already existed in most operating systems (as wildcard/joker characters), and is much simpler and more intuitive than regular expressions.
While *.txt is easily understandable by casual users, the analogous .*\.txt is aimed more at experienced users and programmers, not to mention ^.*\.txt$ ...
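The practical gap between *.txt, .*\.txt and ^.*\.txt$ shows up with a name such as notes.txt.bak. The sketch below is illustrative only: classify is a made-up helper, and it uses grep, which searches for a match anywhere in the line unless the regex is anchored.

```shell
# classify NAME: report how NAME fares against the glob and two regexes.
# classify is a hypothetical helper for this sketch.
classify() {
    # A glob always has to cover the whole name.
    case "$1" in
        *.txt) glob=yes ;;
        *)     glob=no ;;
    esac
    # Without anchors, grep accepts a match anywhere in the name.
    if printf '%s\n' "$1" | grep -q '.*\.txt'; then
        unanchored=yes
    else
        unanchored=no
    fi
    # With ^ and $, the regex has to cover the whole name too.
    if printf '%s\n' "$1" | grep -q '^.*\.txt$'; then
        anchored=yes
    else
        anchored=no
    fi
    printf '%s glob=%s unanchored=%s anchored=%s\n' \
        "$1" "$glob" "$unanchored" "$anchored"
}

classify notes.txt
classify notes.txt.bak
```

Only the anchored regex agrees with the glob on both names, which is part of why the regex form is harder for casual users to get right.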
- Another reason for the “why” part: speed. Regular expressions are slower: http://pastebin.com/3iNCgkE3 – manatwork Jan 01 '13 at 11:21
- `*.txt` doesn't equal `.*\.txt`, it (mostly) equals `.*\.txt$` because there can be nothing after the `.txt` (at least assuming *reasonable* file name globbing). Perhaps even `^.*\.txt$` somewhat depending on usage. Proves your point? – user Jan 09 '13 at 08:17