This is a homework question:

Match all filenames with 2 or more characters that start with a lower case letter, but do not end with an upper case letter.

I do not understand why my solution is not working.

So I executed the below:

touch aa
touch ha
touch ah
touch hh
touch a123e
touch hX
touch Ax

ls [a-z]*[!A-Z]

Output:

aa  ha

My question: Why did it not match "ah", "hh", or "a123e"?

  • Works properly for me under the `mksh` shell, but not under `bash --posix`, so there's gotta be some specific rule for `bash`. – Sergiy Kolodyazhnyy Jan 19 '17 at 00:32
  • @Serg, note that the behaviour for [A-Z] is unspecified by POSIX except in the C locale. `mksh` like `zsh`'s `[A-Z]` doesn't match on `É` for instance. ksh93's `[A-Z]` matches on `É` but not on `h`. – Stéphane Chazelas Jan 19 '17 at 10:38

1 Answer

This is a locale problem. In your locale, the range [A-Z] follows the collation order and expands to something like [AbBcC...zZ] (probably plus others, such as accented characters), so it contains every lower-case letter except a. With your test files, [^A-Z] (equivalent to [!A-Z] in bash) therefore effectively means "files that end with a".

If you want to avoid such a surprise, one way is to set LC_COLLATE=C, since collation is the part of your locale settings that is responsible for the sorting order. Also empty LC_ALL if it is set, as it would otherwise take precedence over LC_COLLATE.

$ ls [a-z]*[^A-Z]
aa  ha

$ ( LC_ALL=; LC_COLLATE=C; ls [a-z]*[^A-Z] )
a123e  aa  ah  ha  hh
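
To reproduce this without touching your working directory, here is a minimal sketch that recreates the question's test files in a scratch directory and expands the glob under C collation. The function name `list_matches_c` is made up for illustration, and the sketch assumes `mktemp -d` is available:

```shell
# Hypothetical helper: expand the glob under C collation in a scratch directory.
# The function body is a subshell, so the cd and the locale changes do not leak out.
list_matches_c() (
    tmpdir=$(mktemp -d) || exit 1
    cd "$tmpdir" || exit 1
    touch aa ha ah hh a123e hX Ax

    # Plain shell variables are enough for the glob expansion;
    # exporting them additionally gives child processes such as ls the same locale.
    LC_ALL=
    LC_COLLATE=C
    export LC_ALL LC_COLLATE

    # Let the shell expand the pattern, then print the names on one line.
    set -- [a-z]*[!A-Z]
    echo "$*"

    cd / && rm -rf "$tmpdir"
)

list_matches_c
```

Under C collation the ranges are plain ASCII ranges, so the five expected names are printed and hX and Ax are left out.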

Better still, leave your locale settings alone and use the appropriate character classes inside the brackets: [:lower:] instead of a-z and [:upper:] instead of A-Z.

$ ls [[:lower:]]*[^[:upper:]]
a123e  aa  ah  ha  hh
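
The same scratch-directory check can be sketched for the character-class version. The function name `list_matches_classes` is again hypothetical; the point is only that no locale variable is changed:

```shell
# Hypothetical helper: expand the character-class glob in a scratch directory.
# No locale variable is modified; the classes do not rely on range ordering.
list_matches_classes() (
    tmpdir=$(mktemp -d) || exit 1
    cd "$tmpdir" || exit 1
    touch aa ha ah hh a123e hX Ax

    # [!...] is the negation form specified by POSIX; bash also accepts [^...].
    set -- [[:lower:]]*[![:upper:]]
    printf '%s\n' "$@"

    cd / && rm -rf "$tmpdir"
)

list_matches_classes
```

The names are printed one per line in whatever order the current locale sorts them; hX and Ax should not appear.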

Or use bash's globasciiranges option:

$ shopt -s globasciiranges
$ ls [a-z]*[^A-Z]
a123e  aa  ah  ha  hh

$ shopt -u globasciiranges
$ ls [a-z]*[^A-Z]
aa  ha
xhienne
  • @heemayl, no, `LC_ALL=C ls [a-z]*[^A-Z]` would only affect `ls`'s locale, not the locale used by the shell to expand the glob or parse that command line. – Stéphane Chazelas Jan 19 '17 at 10:01
  • You don't need to export `LC_xxx` for it to apply to the glob, but it would be preferable so ls gets the same locale. – Stéphane Chazelas Jan 19 '17 at 10:10
  • Note that in a locale where the charset is GB18030, for instance, the LC_ALL=C approach would fail to match a file called `test-鏏`, because once you change the charset to that of the C locale, `鏏` becomes `<0xe7>A`. IOW, when changing LC_CTYPE, you're getting different characters. – Stéphane Chazelas Jan 19 '17 at 10:29
  • Note that I suspect [A-Z] in the OP's locale covers more than AbBcC...zZ. It probably also has `é`, `Á` (but probably not `Ź`). IOW, using `[A-Z]` makes little sense outside the C locale. – Stéphane Chazelas Jan 19 '17 at 10:31
  • @StéphaneChazelas Thank you for your excellent feedback. Answer updated. I believe I took everything into account. – xhienne Jan 19 '17 at 12:14