
Which characters match the following regex?

^[a-zA-Z]$

Specifically, should characters with accents (e.g. á, è, ò) match this regex, or only the 26 lowercase and 26 uppercase letters? I checked this in an online regex tester, and accented characters did not match.

When I tried it with grep on my Linux (Ubuntu) machine, it did match the accented characters, while rg (ripgrep) did not.


I can match accented characters with rg using [A-zÀ-ÿ].

But is there a way to exclude accented characters from matching in grep?

PS: This question talks about how accented chars can be included, not excluded.

adiSuper94
    Standard text utilities like `grep` use the current locale. If your locale's collation order places accented characters among the plain letters, a range such as `a-z` includes them. `C` is a standard locale that does NOT collate accented letters with the plain ones. For example, assuming your data is in UTF-8 (likely nowadays, especially if it contains non-ASCII characters): `(echo e; echo é) | LANG=C.UTF-8 grep '[a-z]'` – dave_thompson_085 Jul 03 '22 at 03:13
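
The locale approach from the comment can be sketched as a small shell helper. This is a minimal sketch, not a definitive answer: it assumes a POSIX shell, a `grep` that honors locale settings, and UTF-8 input. The function name `ascii_letters_only` is hypothetical. It sets `LC_ALL=C` instead of `LANG`, because `LC_ALL` takes precedence over `LANG` and `LC_COLLATE` if either is already set in the environment. In the C locale, bracket ranges follow byte order, so `[a-zA-Z]` should cover only the 52 ASCII letters.

```shell
#!/bin/sh
# Hypothetical helper: keep only lines that consist of exactly one ASCII letter.
# LC_ALL=C applies only to this grep invocation and forces byte-wise ranges.
ascii_letters_only() {
    LC_ALL=C grep '^[a-zA-Z]$'
}

# Feed one plain lowercase letter, one accented letter, and one uppercase letter.
printf 'e\né\nZ\n' | ascii_letters_only
# prints "e" and "Z"; the "é" line is dropped
```

In UTF-8, é is encoded as two bytes, neither of which falls in the ASCII letter ranges, so the anchored single-character pattern cannot match that line.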
