-1

I was trying to find files in a certain directory which don't adhere to the naming guidelines for UNIX-like systems.

With `find`, the command `find <dir> -regex '.*[^-_./0-9a-zA-Z].*'` returns the files of interest.

My questions about the above command line are:

  1. Why do we need the "any single character" metacharacter `.` before the "zero or more" metacharacter `*` at both the start and the end of the regex for this to work as intended? When I initially tried `find <dir> -regex '*[^-_./0-9a-zA-Z]*'`, it returned nothing.
  2. Furthermore, if I replace the character ranges in the regex with their corresponding POSIX character classes and leave everything else intact, as in `find <dir> -regex '.*[^-_./[:digit:][:lower:][:upper:]].*'`, it returns nothing. Why is that?

TIA!

ilkkachu
computronium
  • 1
    Have you read the description of `-regex`? It matches over the whole path. – muru Aug 08 '20 at 09:57
  • and what about that? – computronium Aug 08 '20 at 10:58
  • If you exclude `/`, the path separator, in what system do you think your regex will match a whole path? – muru Aug 08 '20 at 12:01
  • @muru Where am I excluding the path separator? The regex doesn't exclude the path separator. It is supposed to find files which have characters **other than the recommended characters:** - or _ or . or / or digits or lowercase letters or uppercase letters... So, to answer your question, I think this will work on **all** UNIX-like systems. Besides, that wasn't my question?! – computronium Aug 08 '20 at 13:40
  • 1
    If you don't include `.` which matches anything, and explicitly exclude `/`, which is the path separator, of course you're excluding the path separator. I have no idea what your random bolding is meant to imply. – muru Aug 08 '20 at 13:52
  • 1
    @muru If I don't include `.` and don't explicitly exclude the `/`, the resulting regex `*[^-_.0-9a-zA-Z]*` won't work then either, will it? `.` has to be included not just because of the path separator. My misconception here was cleared up pretty nicely by @steeldriver. The regex won't work without the leading and trailing `.` because then the quantifier `*` isn't specifying what it wants to have zero or more of. It was a misconception because I was mixing up the shell wildcard `*` with the regex quantifier `*`... – computronium Aug 08 '20 at 15:14
  • 1
    ...which @steeldriver intuited out pretty well. You on the other hand, just kept nitpicking on one thing that was not relevant to my question, IMO. So, apologies if you found my **random bolding** off-putting. – computronium Aug 08 '20 at 15:14
  • 2
    Does this answer your question? [How do regular expressions differ from wildcards used to filter files](https://unix.stackexchange.com/questions/57957/how-do-regular-expressions-differ-from-wildcards-used-to-filter-files) – muru Aug 08 '20 at 16:05
  • @muru, the command looks for filenames that contain anything other _but_ the listed characters. Like, say, a comma, or a dollar sign or ... If the slash wasn't listed, it would match pretty much everything (since `-regex` matches against the full path) – ilkkachu Aug 08 '20 at 16:44
  • I suppose the alternative would be `find . ! -regex '[-_./0-9a-zA-Z]*'`, if that's any clearer – ilkkachu Aug 08 '20 at 16:58
  • @muru yes, it does answer the first question. steeldriver has answered both. – computronium Aug 10 '20 at 07:24

1 Answer

3
  1. `*` in regular expression syntax is a quantifier applying to the previous regex atom (in this case, `.`). It does not by itself match "zero or more characters", as it would in shell pattern-matching syntax (aka "globbing").

  2. This may be an idiosyncrasy of the default Emacs regextype. Try `-regextype posix-basic` or `-regextype egrep`, for example, if you want more familiar behavior.
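
Both points can be checked in a scratch directory. The following is only a sketch, assuming GNU `find` (for `-regex` and `-regextype`); the file names are made up for illustration, and `posix-extended` is just one possible choice of regextype alongside the ones mentioned above.

```shell
#!/bin/sh
# Sketch only: assumes GNU find. All file names below are hypothetical.
export LC_ALL=C                      # keep ranges like a-z and the sort order predictable

dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 'good_name-1.txt' 'bad,name.txt' 'bad name.txt'

# Regex form: '.' is the atom and '*' repeats it. -regex must match the
# whole path (e.g. ./bad,name.txt), hence the leading and trailing '.*'.
regex_hits=$(find . -type f -regex '.*[^-_./0-9a-zA-Z].*' | sort)

# Same set written with POSIX character classes, with the regex dialect
# selected explicitly instead of relying on the default.
class_hits=$(find . -type f -regextype posix-extended \
    -regex '.*[^-_./[:digit:][:lower:][:upper:]].*' | sort)

# Glob-style pattern passed to -regex: the leading '*' has no preceding
# atom to repeat, so this is not "anything, then a bad character".
glob_style_hits=$(find . -type f -regex '*[^-_./0-9a-zA-Z]*' 2>/dev/null)

printf 'regex form:\n%s\n' "$regex_hits"
printf 'class form:\n%s\n' "$class_hits"
printf 'glob-style form:\n%s\n' "$glob_style_hits"

cd / && rm -rf "$dir"
```

The first two commands are meant to list only the two names containing a comma or a space, while the glob-style pattern lists nothing, as observed in the question.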

steeldriver