I have the Unicode character ᚠ, represented by its Unicode code point 16A0, in a text file (the text file is encoded(?) as utf-8).
When I do grep '\u16A0' test.txt I get no result. How do I grep that character?
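grep does not interpret \u16A0 itself, so the single-quoted pattern never matches the character. You can use your shell's ANSI-C quoting ($'...') instead: the shell expands backslash escapes as specified by the ANSI C standard before the command runs. Because the shell does the expansion, this works for any command, not just grep, in shells like Bash (4.2 or later for \u) and Zsh: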
grep $'\u16A0' test.txt
For some more complex examples, you might refer to this related question and its answers.
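To check this on your own machine, here is a minimal sketch that builds a throwaway sample file and searches it two ways. The file contents and variable names are made up for illustration. It assumes a UTF-8 locale for the ANSI-C quoting variant; the second variant passes the raw UTF-8 bytes of U+16A0 (E1 9A A0, written as octal escapes) and should not depend on the locale.

```bash
#!/usr/bin/env bash
# Throwaway sample file (hypothetical content): the second line contains
# U+16A0, whose UTF-8 encoding is the bytes E1 9A A0 (octal 341 232 240).
sample=$(mktemp)
printf 'plain ascii line\nrune: \341\232\240\n' > "$sample"

# Variant 1: ANSI-C quoting. The shell turns \u16A0 into the character
# before grep runs. Assumes Bash 4.2+ or Zsh and a UTF-8 locale.
ansi_matches=$(grep -c $'\u16A0' "$sample" || true)

# Variant 2: build the character from its UTF-8 bytes with printf.
# Octal escapes are understood by any POSIX printf.
rune=$(printf '\341\232\240')
byte_matches=$(grep -c "$rune" "$sample" || true)

echo "ANSI-C quoting: $ansi_matches matching line(s)"
echo "raw bytes:      $byte_matches matching line(s)"

rm -f "$sample"
```

If the first count comes out as 0 while the second is 1, the shell is probably not running in a UTF-8 locale (or is too old to know \u), and the byte form is the safer choice.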
You could use ugrep as a drop-in replacement for grep to match the Unicode code point U+16A0:
ugrep '\x{16A0}' test.txt
It takes the same options as grep but offers vastly more features, such as:
ugrep searches UTF-8/16/32 input and other formats. Option -Q (spelled --encoding in newer releases, where -Q opens the interactive query UI) permits many other file formats to be searched, such as ISO-8859-1 to 16, EBCDIC, code pages 437, 850, 858, 1250 to 1258, MacRoman, and KOI8.
ugrep matches Unicode patterns by default (disabled with option -U). The regular expression pattern syntax is POSIX ERE compliant extended with PCRE-like syntax. Option -P may also be used for Perl matching with Unicode patterns.
See ugrep on GitHub for details.