How can I find the common name for a particular glyph?

Question

Sometimes, I'd like to know the name of a glyph. For example, if I see −, I may want to know if it's a hyphen -, an en-dash –, an em-dash —, or a minus symbol −. Is there a way that I can copy-paste this into a terminal to see what it is?

I am unsure if my system knows the common names for these glyphs, but there is certainly some (partial) information available, such as in /usr/share/X11/locale/en_US.UTF-8/Compose. For example,

<Multi_key> <exclam> <question>         : "‽"   U203D # INTERROBANG

Another example glyph: 🐄.

score 30 · Accepted Answer · edited Apr 27 '15 at 13:30


Try the unicode utility:

$ unicode ‽
U+203D INTERROBANG
UTF-8: e2 80 bd  UTF-16BE: 203d  Decimal: &#8253;
‽
Category: Po (Punctuation, Other)
Bidi: ON (Other Neutrals)

Or the uconv utility from the ICU package:

$ printf %s ‽ | uconv -x any-name
\N{INTERROBANG}
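uconv prints each name as a \N{...} sequence with no separator. With input of several characters, the names run together. Below is a minimal sketch of a filter that puts one name per line. It assumes that no character name contains a `}`, which holds for Unicode names since they use only capital letters, digits, spaces and hyphens. The function name is made up for this example and is not part of ICU:

```shell
# names_per_line: hypothetical helper, not part of ICU.
# Reads uconv's "\N{NAME1}\N{NAME2}..." output on stdin and
# prints one NAME per line.
names_per_line() {
  # End each entry at its closing brace, strip the leading \N{,
  # and drop any empty lines (e.g. from a trailing newline).
  tr '}' '\n' | sed -e 's/^\\N{//' -e '/^$/d'
}
```

For example, `printf 'e\u0301' | uconv -x any-name | names_per_line` should print LATIN SMALL LETTER E and COMBINING ACUTE ACCENT on separate lines.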

You can also get information via the recode utility:

$ printf %s ‽ | recode ..dump
UCS2   Mne   Description

203D         point exclarrogatif

Or with Perl:

$ printf %s ‽ | perl -CLS -Mcharnames=:full -lne 'print charnames::viacode(ord) for /./g'
INTERROBANG

Note that those give information on the characters that make up that glyph, not on the glyph as a whole. For instance, for é (e with combining acute accent):

$ printf 'e\u0301' | uconv -x any-name
\N{LATIN SMALL LETTER E}\N{COMBINING ACUTE ACCENT}

Different from the standalone é character:

$ printf é | uconv -x any-name
\N{LATIN SMALL LETTER E WITH ACUTE}

You can ask uconv to recombine those (for those that have a combined form):

$ printf 'e\u0301b\u0301' | uconv -x '::nfc;::name;'
\N{LATIN SMALL LETTER E WITH ACUTE}\N{LATIN SMALL LETTER B}\N{COMBINING ACUTE ACCENT}

(é has a combined form, but not b́).
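The two forms of é look identical once pasted, so it can help to look at the raw bytes too. Here is a minimal sketch that uses only printf and POSIX od, assuming the input is UTF-8 encoded. The helper name utf8bytes is hypothetical:

```shell
# utf8bytes: hypothetical helper that prints the bytes of its argument in
# hex on one line, e.g. "c3 a9" for a precomposed é (U+00E9) and
# "65 cc 81" for e followed by U+0301 COMBINING ACUTE ACCENT.
utf8bytes() {
  printf %s "$1" |
    od -An -v -tx1 |        # hex dump, no offset column, no '*' folding
    tr -s ' \n' '  ' |      # join od's output lines with single spaces
    sed 's/^ //; s/ $//'    # trim the leading and trailing space
}
```

Running `utf8bytes é` on a pasted glyph should tell you which of the two forms you have.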

edited Apr 27 '15 at 13:30

Gilles 'SO- stop being evil'


answered Apr 27 '15 at 12:08

Stéphane Chazelas


What is `unicode`? I don't appear to have that installed (and can't find it in the Arch Linux repos). Also, what on earth is `exclarrogatif`? [EDIT: I get that here too, although my system is not French.] – Sparhawk Apr 27 '15 at 12:13

@Sparhawk, contraction of `exclamatif` and `interrogatif`. `recode` was written by a French-Canadian guy in the early 80s. – Stéphane Chazelas Apr 27 '15 at 12:17
@StéphaneChazelas: Is the `L` in `-CLS` needed? It makes the answer wrong if something like `LC_ALL` is set to a non-UTF-8 locale. – cuonglm Apr 27 '15 at 12:20
@cuonglm, if you have LC_ALL=C, you have no business entering characters other than ASCII ones. If your locale is for instance `LC_ALL=fr_FR.iso885915@euro` and you enter `echo é | perl...`, that é will be written as 0xe9, not in its UTF-8 encoding. And you want perl to tell you about that é, not about UTF-8 characters that won't be found in the input since the locale doesn't use that charset. Try `printf '\xe9' | LC_ALL=fr_FR.iso885915@euro perl...` for instance. – Stéphane Chazelas Apr 27 '15 at 12:24
@StéphaneChazelas: I think that in the OP's question, he knows the symbol and wants to get from the symbol to its name: just paste the symbol into the terminal and get the name back. When pasting `‽`, he wants `INTERROBANG`, not `LATIN SMALL LETTER A WITH CIRCUMFLEX` (which is what you get when LC_ALL=C is set). – cuonglm Apr 27 '15 at 12:33
@cuonglm, and what I'm saying is that if his locale is `LC_ALL=fr_FR.iso885915@euro` and he pastes `é` (0xe9 in that locale), `-CS` would give him the wrong answer. – Stéphane Chazelas Apr 27 '15 at 13:03
Ah, got it. But `-CLS` doesn't always give the right answer, as I showed in my comment above, right? Is there any workaround? – cuonglm Apr 27 '15 at 13:05
@cuonglm, -CLS with LC_ALL=C should give you the right answer for all the valid characters in the C locale. é and ‽ are usually not present in the C locale, there's no way you can express them there. – Stéphane Chazelas Apr 27 '15 at 13:09
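The point of this exchange is that the same glyph arrives as different bytes depending on the locale's charset. Below is a small sketch to check that yourself. It assumes UTF-8 input and an iconv that knows ISO-8859-15 (glibc's does). The name hexbytes_in is made up for the example:

```shell
# hexbytes_in CHARSET TEXT: hypothetical helper printing, in hex, the bytes
# that TEXT (assumed to be UTF-8) becomes once converted to CHARSET.
hexbytes_in() {
  printf %s "$2" | iconv -f UTF-8 -t "$1" | od -An -v -tx1
}
```

`hexbytes_in UTF-8 é` should print c3 a9. `hexbytes_in ISO-8859-15 é` should print e9, the single byte a terminal in the fr_FR.iso885915@euro locale would send for é.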

@Sparhawk http://kassiopeia.juls.savba.sk/~garabik/software/unicode/ — available as the `unicode` package on Debian, no idea about packaging on Arch. – Gilles 'SO- stop being evil' Apr 27 '15 at 13:31
Why printf instead of simply echo in some of the first examples? – Paŭlo Ebermann Apr 27 '15 at 22:21

@PaŭloEbermann [Why is printf better than echo?](http://unix.stackexchange.com/q/65803). Now that you asked, you're expected to read the whole answer. There will be a test. – terdon Apr 27 '15 at 22:44
@terdon thanks for the link, I did read it all. – Paŭlo Ebermann Apr 27 '15 at 23:29
Slightly off-topic, but @StéphaneChazelas, what does the `%s` represent in some `printf` statements? – Sparhawk Apr 28 '15 at 00:51

@Sparhawk `%s` is like a placeholder, called a format specifier (or conversion specifier). printf will replace it with the succeeding arguments, treating it as a string (as opposed to a number, for example) (generally how you would expect with C's `printf()` function). See the docs (http://pubs.opengroup.org/onlinepubs/9699919799//basedefs/V1_chap05.html). – muru Apr 28 '15 at 05:57
@StéphaneChazelas: Well, perl6 seems to be better in this case. It can look at a string by its graphemes, its code points, its encoding's code units, or the bytes that make up the encoding. – cuonglm Dec 07 '15 at 03:59
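To make muru's explanation of `%s` concrete: with `printf %s` the pasted text is only ever data. If you pass it as the first argument instead, printf parses it as a format. Below is a small sketch with an input that happens to contain a conversion specifier. The variable name is arbitrary:

```shell
glyph='%d'

# The input is the argument to %s, so it is printed literally.
printf '%s\n' "$glyph"    # prints %d

# The input is the format itself: %d has no argument, so printf
# substitutes zero, as POSIX specifies for missing arguments.
printf "$glyph"; echo     # prints 0
```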

cuonglm · Answer 2 · 2015-12-07T04:39:30.273

You can use the viacode function from Perl's charnames module:

$ printf ‽ | perl -Mcharnames=:full -CLS -nle 'print charnames::viacode(ord)'
INTERROBANG
$ printf 🐄 | perl -Mcharnames=:full -CLS -nle 'print charnames::viacode(ord)'
COW
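With -n, the one-liner above applies ord to the whole line, and ord only looks at the first character. As a result, only the first character of each line gets named. Below is a sketch of a variant that loops over every character and prints the code point next to the name. glyphinfo is a hypothetical name. -CS assumes UTF-8 input regardless of the locale; see the discussion of -CLS under the accepted answer:

```shell
# glyphinfo: hypothetical helper. Prints "U+XXXX NAME" for every character
# of its arguments, similar to the first line of the `unicode` utility.
# -CS decodes stdin/stdout as UTF-8 whatever the locale says.
glyphinfo() {
  printf %s "$@" |
    perl -CS -Mcharnames=:full -ne \
      'printf "U+%04X %s\n", ord($_), charnames::viacode(ord($_)) for /./g'
}
```

With this, `glyphinfo ‽🐄` should print U+203D INTERROBANG and U+1F404 COW on separate lines.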

charnames was first released with perl v5.6.0.

Since Perl 6 will be production-ready this Christmas, it's worth mentioning here: it has the best support for Unicode characters I have ever seen. You only need to call the uniname method/routine:

$ printf ‽ | perl6 -ne 'say .uniname'
INTERROBANG

Both é (e with combining acute accent) and the standalone é character give you:

# e with combining acute accent
$ printf 'e\u0301' | perl6 -ne 'say .uniname'
LATIN SMALL LETTER E WITH ACUTE

# standalone é
$ printf é | perl6 -ne 'say .uniname'
LATIN SMALL LETTER E WITH ACUTE

(.uniname is shorthand for $_.uniname.)

score 5 · Answer 3 · answered Apr 27 '15 at 12:01

The best way I know is through Perl's uniprops. It comes with Perl's Unicode::Tussle module. You can install it with

sudo perl -MCPAN -e 'install Unicode::Tussle'

You can then run it on any glyph you want to test:

$ uniprops  ‽
U+203D ‹‽› \N{INTERROBANG}
    \pP \p{Po}
    All Any Assigned InPunctuation Punct Is_Punctuation Common Zyyy Po P
       General_Punctuation Gr_Base Grapheme_Base Graph GrBase Other_Punctuation
       Pat_Syn Pattern_Syntax PatSyn Print Punctuation STerm Term
       Terminal_Punctuation Unicode X_POSIX_Graph X_POSIX_Print X_POSIX_Punct

$ uniprops 🐄
U+1F404 ‹🐄› \N{COW}
    \pS \p{So}
    All Any Assigned InMiscPictographs Common Zyyy So S Gr_Base Grapheme_Base Graph
       GrBase Misc_Pictographs Miscellaneous_Symbols_And_Pictographs Other_Symbol
       Print Symbol Unicode X_POSIX_Graph X_POSIX_Print

@cuonglm yes, but the Tussle module includes all sorts of fancy tools and `uniprops` is far, far easier to type than explicitly calling the module. It also provides more info than just the name. — terdon, Apr 27 '15 at 12:12

score 4 · Answer 4 · answered Apr 27 '15 at 12:10


You can use unicode, which also outputs some more information than just the name:

# unicode –
U+2013 EN DASH
UTF-8: e2 80 93  UTF-16BE: 2013  Decimal: &#8211;
–
Category: Pd (Punctuation, Dash)
Bidi: ON (Other Neutrals)

answered Apr 27 '15 at 12:10

Marco


What is `unicode`? I don't appear to have that installed (and can't find it in the Arch Linux repos). – Sparhawk Apr 27 '15 at 12:14

@Sparhawk on my Debian, it's just a Python script installed by the `unicode` package. You should be able to get it by downloading the source package from the [Debian repos](https://packages.debian.org/jessie/unicode). – terdon Apr 27 '15 at 12:19

score 1 · Answer 5 · edited Apr 27 '15 at 12:04


Create a bash script with this:

#!/bin/bash
awk -F ":" '{print $2}' /usr/share/X11/locale/en_US.UTF-8/Compose | grep "$1" | awk -F "#" '{print $2}'

Name it whatever you want, for example namechar, and make it executable (chmod +x namechar).

Now, you can call for example:

./namechar @

and the result will be:

COMMERCIAL AT

edited Apr 27 '15 at 12:04

terdon


answered Apr 27 '15 at 12:02

jcbermu


This is good but only matches a subset of characters, not full Unicode. For example, it fails on `🐄`, and produces repeated results for `€`. The last could be fixed by piping through `| sort -u`. – terdon Apr 27 '15 at 12:08
Yes, @terdon is correct. (That's why I said "partial" in the question.) This file only contains glyphs mapped to the `Compose` key. – Sparhawk Apr 27 '15 at 12:15
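Folding in terdon's `sort -u` suggestion, a variant of the script as a bash function might look like the sketch below. It matches the quoted glyph as a fixed string, because with the original `grep "$1"` a pattern such as `.` matches every line. COMPOSE_FILE is a hypothetical override, added only so the path can be changed. The function still only knows characters that have a Compose sequence:

```shell
# namechar GLYPH: hypothetical reworking of the script above.
# Prints the name(s) recorded after '#' on Compose lines whose result is "GLYPH".
# COMPOSE_FILE is a hypothetical override; the default is the path from the question.
namechar() {
  local compose=${COMPOSE_FILE:-/usr/share/X11/locale/en_US.UTF-8/Compose}
  # -F: take the glyph literally, so '.' or '*' are not treated as regexes.
  # The surrounding quotes restrict the match to the result column.
  grep -F -- "\"$1\"" "$compose" |
    sed -n 's/.*#[[:space:]]*//p' |   # keep the text after the last '#'
    sort -u                           # drop duplicates, e.g. for €
}
```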

How can I find the common name for a particular glyph?

5 Answers5

Linked