How to grep characters with their Unicode value?

Question

I have the Unicode character ᚠ (code point U+16A0) in a text file encoded as UTF-8.

When I run grep '\u16A0' test.txt I get no result. How do I grep for that character?

@pLumo Yes, that worked. Thanks. What does the $ character _before_ the regular expression do? — Stupid, Jun 06 '19 at 14:39

score 28 · Accepted Answer · edited Nov 07 '22 at 15:58


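You can use the ANSI-C quoting ($'...') provided by your shell. The shell replaces backslash escapes such as \uXXXX with the characters they stand for before the command runs, so grep receives the character itself rather than the escape. This should work for any command, not just grep, in shells like Bash and Zsh:

grep $'\u16A0' test.txt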

For some more complex examples, you might refer to this related question and its answers.
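To make the role of the $ concrete, here is a small sketch. The temporary file, its contents, and the variable names are made up for the demo. It builds the same character with printf (which should also work in shells that lack $'...') and then runs the $'...' form from above next to it, so the two counts can be compared:

```shell
# Demo file (hypothetical contents): only line 2 contains U+16A0.
# The octal escapes \341\232\240 are the UTF-8 bytes E1 9A A0 of that character.
demo_file=$(mktemp)
printf 'first line\nrune \341\232\240 here\nlast line\n' > "$demo_file"

# Portable form: let printf build the character, then hand it to grep.
rune=$(printf '\341\232\240')
portable_hits=$(grep -c -- "$rune" "$demo_file")

# Bash/Zsh form: the shell expands \u16A0 before grep ever runs.
# Assumes a UTF-8 locale and, for Bash, version 4.2 or later.
# grep exits with status 1 when nothing matches, hence the "|| true".
ansi_c_hits=$(grep -c -- $'\u16A0' "$demo_file" || true)

echo "portable: $portable_hits, ansi-c: $ansi_c_hits"
rm -f "$demo_file"
```

If the second count comes out as 0, the shell or locale most likely did not expand \u16A0, and the printf form is the safer choice.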

answered Jun 06 '19 at 14:52 by pLumo · edited Nov 07 '22 at 15:58 by Flimm


Note that it's not ANSI C: the C language standard does not specify the functionality of shells. It was invented by David Korn for the Korn shell. https://unix.stackexchange.com/a/65819/5132 – JdeBP Jun 06 '19 at 18:40

except this ruins the regex if there is a control character – törzsmókus Jul 07 '21 at 08:59
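One possible reading of this comment: inside $'...' the shell rewrites every backslash escape it knows, not just \u, so a regex escape such as \b or \\ written in the same quotes would be changed before grep sees it. A hedged sketch of one workaround is to build the character separately and keep the rest of the regex in ordinary single quotes. The file contents and variable names below are invented for the demo:

```shell
# Demo file (hypothetical contents): both lines contain U+16A0,
# but only the first is followed by " v1.0".
demo_file=$(mktemp)
printf 'rune \341\232\240 v1.0\nrune \341\232\240 v1x0\n' > "$demo_file"

# Build the character on its own; octal \341\232\240 is U+16A0 in UTF-8.
# In Bash or Zsh with a UTF-8 locale, rune=$'\u16A0' should give the same bytes.
rune=$(printf '\341\232\240')

# The regex part stays in plain single quotes, so its backslashes reach
# grep untouched; only the double-quoted "$rune" is expanded by the shell.
spliced_hits=$(grep -E -c -- "$rune"' v1\.0' "$demo_file")

echo "lines matching: $spliced_hits"
rm -f "$demo_file"
```

Adjacent quoted strings are concatenated by the shell into a single argument, so grep still receives one pattern.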

score 6 · Answer 2 · answered Jan 13 '20 at 21:04

You could use ugrep as a drop-in replacement for grep to match the Unicode code point U+16A0:

ugrep '\x{16A0}' test.txt

It takes the same options as grep but offers vastly more features, such as:

ugrep searches UTF-8/16/32 input and other formats. Option --encoding (spelled -Q in older releases; newer ones use -Q for the interactive query mode) permits many other file formats to be searched, such as ISO-8859-1 to 16, EBCDIC, code pages 437, 850, 858, 1250 to 1258, MacRoman, and KOI8.

ugrep matches Unicode patterns by default (disabled with option -U). The regular expression pattern syntax is POSIX ERE compliant, extended with PCRE-like syntax. Option -P may also be used for Perl matching with Unicode patterns.

See ugrep on GitHub for details.
