21

Sometimes, I'd like to know the name of a glyph. For example, if I see a dash-like glyph, I may want to know whether it's a hyphen -, an en-dash –, an em-dash —, or a minus sign −. Is there a way that I can copy-paste it into a terminal to see what it is?

I am unsure if my system knows the common names for these glyphs, but there is certainly some (partial) information available, such as in /usr/share/X11/locale/en_US.UTF-8/Compose. For example,

<Multi_key> <exclam> <question>         : "‽"   U203D # INTERROBANG

Another example glyph: 🐄.

Sparhawk

5 Answers

30

Try the unicode utility:

$ unicode ‽
U+203D INTERROBANG
UTF-8: e2 80 bd  UTF-16BE: 203d  Decimal: &#8253;
‽
Category: Po (Punctuation, Other)
Bidi: ON (Other Neutrals)

Or the uconv utility from the ICU package:

$ printf %s ‽ | uconv -x any-name
\N{INTERROBANG}

You can also get information via the recode utility:

$ printf %s ‽ | recode ..dump
UCS2   Mne   Description

203D         point exclarrogatif

Or with Perl:

$ printf %s ‽ | perl -CLS -Mcharnames=:full -lne 'print charnames::viacode(ord) for /./g'
INTERROBANG

Note that these give information on the characters that make up the glyph, not on the glyph as a whole. For instance, for é written as e followed by a combining acute accent:

$ printf 'e\u0301' | uconv -x any-name
\N{LATIN SMALL LETTER E}\N{COMBINING ACUTE ACCENT}

This is different from the standalone (precomposed) é character:

$ printf '\u00e9' | uconv -x any-name
\N{LATIN SMALL LETTER E WITH ACUTE}

You can ask uconv to recombine those (for those that have a combined form):

$ printf 'e\u0301b\u0301' | uconv -x '::nfc;::name;'
\N{LATIN SMALL LETTER E WITH ACUTE}\N{LATIN SMALL LETTER B}\N{COMBINING ACUTE ACCENT}

(é has a combined form, but not b́).

Gilles 'SO- stop being evil'
Stéphane Chazelas
  • What is `unicode`? I don't appear to have that installed (and can't find it in the Arch Linux repos). Also, what on earth is `exclarrogatif`? [EDIT: I get that here too, although my system is not French.] – Sparhawk Apr 27 '15 at 12:13
  • 2
    @Sparhawk, contraction of `exclamatif` and `interrogatif`. `recode` was written by a French-Canadian guy in the early 80s. – Stéphane Chazelas Apr 27 '15 at 12:17
  • @StéphaneChazelas: Is the `L` in `-CLS` needed? It makes the answer wrong if something like `LC_ALL` is set to a non-UTF-8 locale. – cuonglm Apr 27 '15 at 12:20
  • @cuonglm, if you have LC_ALL=C, you have no business entering characters other than ASCII ones. If your locale is for instance `LC_ALL=fr_FR.iso885915@euro` and you enter `echo é | perl...`, that é will be written as 0xe9, not as its UTF-8 encoding. And you want perl to tell you about that é, not about UTF-8 characters that won't be found in the input since the locale doesn't use that charset. Try `printf '\xe9' | LC_ALL=fr_FR.iso885915@euro perl...` for instance. – Stéphane Chazelas Apr 27 '15 at 12:24
  • @StéphaneChazelas: I think that in the OP's question, he knows the symbol and wants to get from the symbol to its name: just paste the symbol into the terminal and get the name back. When pasting `‽`, he wants `INTERROBANG` instead of `LATIN SMALL LETTER A WITH CIRCUMFLEX` (when LC_ALL=C is set). – cuonglm Apr 27 '15 at 12:33
  • @cuonglm, and what I'm saying is that if his locale is `LC_ALL=fr_FR.iso885915@euro` and he pastes `é` (0xe9 in that locale), `-CS` would give him the wrong answer. – Stéphane Chazelas Apr 27 '15 at 13:03
  • Ah, got it. But `-CLS` doesn't always give the right answer, as I showed in my comment above, right? Is there any workaround? – cuonglm Apr 27 '15 at 13:05
  • @cuonglm, -CLS with LC_ALL=C should give you the right answer for all the valid characters in the C locale. é and ‽ are usually not present in the C locale, there's no way you can express them there. – Stéphane Chazelas Apr 27 '15 at 13:09
  • 2
    @Sparhawk http://kassiopeia.juls.savba.sk/~garabik/software/unicode/ — available as the `unicode` package on Debian, no idea about packaging on Arch. – Gilles 'SO- stop being evil' Apr 27 '15 at 13:31
  • Why printf instead of simply echo in the first few examples? – Paŭlo Ebermann Apr 27 '15 at 22:21
  • 1
    @PaŭloEbermann [Why is printf better than echo?](http://unix.stackexchange.com/q/65803). Now that you asked, you're expected to read the whole answer. There will be a test. – terdon Apr 27 '15 at 22:44
  • @terdon thanks for the link, I did read it all. – Paŭlo Ebermann Apr 27 '15 at 23:29
  • Slightly off-topic, but @StéphaneChazelas, what does the `%s` represent in some `printf` statements? – Sparhawk Apr 28 '15 at 00:51
  • 1
    @Sparhawk `%s` is like a placeholder, called a format specifier (or conversion specifier). printf will replace it with the succeeding arguments, treating it as a string (as opposed to a number, for example) (generally how you would expect with C's `printf()` function). See the docs (http://pubs.opengroup.org/onlinepubs/9699919799//basedefs/V1_chap05.html). – muru Apr 28 '15 at 05:57
  • @StéphaneChazelas: Well, perl6 seems to be better in this case. It can work with Unicode characters by their graphemes, their codepoints, their encoding's code units, or the bytes that make up the encoding. – cuonglm Dec 07 '15 at 03:59
6

You can use Perl's viacode function from the charnames module:

$ printf ‽ | perl -Mcharnames=:full -CLS -nle 'print charnames::viacode(ord)'
INTERROBANG
$ printf 🐄 | perl -Mcharnames=:full -CLS -nle 'print charnames::viacode(ord)'
COW

charnames was first released with perl v5.6.0


With Perl 6 due to be production-ready this Christmas, it's worth mentioning here, since it has the best support for Unicode characters I have ever seen. You only need to call the uniname method/routine:

$ printf ‽ | perl6 -ne 'say .uniname'
INTERROBANG

Both é written as e with a combining acute accent and the standalone é character give you:

# e with combining acute accent
$ printf 'e\u0301' | perl6 -ne 'say .uniname'
LATIN SMALL LETTER E WITH ACUTE

# standalone é
$ printf '\u00e9' | perl6 -ne 'say .uniname'
LATIN SMALL LETTER E WITH ACUTE

(.uniname is the shorthand for $_.uniname)

cuonglm
5

The best way I know is through Perl's uniprops. It comes with Perl's Unicode::Tussle module. You can install it with

sudo perl -MCPAN -e 'install Unicode::Tussle'

You can then run it on any glyph you want to test:

$ uniprops  ‽
U+203D ‹‽› \N{INTERROBANG}
    \pP \p{Po}
    All Any Assigned InPunctuation Punct Is_Punctuation Common Zyyy Po P
       General_Punctuation Gr_Base Grapheme_Base Graph GrBase Other_Punctuation
       Pat_Syn Pattern_Syntax PatSyn Print Punctuation STerm Term
       Terminal_Punctuation Unicode X_POSIX_Graph X_POSIX_Print X_POSIX_Punct

$ uniprops 🐄
U+1F404 ‹🐄› \N{COW}
    \pS \p{So}
    All Any Assigned InMiscPictographs Common Zyyy So S Gr_Base Grapheme_Base Graph
       GrBase Misc_Pictographs Miscellaneous_Symbols_And_Pictographs Other_Symbol
       Print Symbol Unicode X_POSIX_Graph X_POSIX_Print
terdon
  • `uniprops` also uses charnames::viacode internally. – cuonglm Apr 27 '15 at 12:10
  • @cuonglm yes, but the Tussle module includes all sorts of fancy tools and `uniprops` is far, far easier to type than explicitly calling the module. It also provides more info than just the name. – terdon Apr 27 '15 at 12:12
4

You can use unicode, which also outputs some more information than just the name:

# unicode –
U+2013 EN DASH
UTF-8: e2 80 93  UTF-16BE: 2013  Decimal: &#8211;
–
Category: Pd (Punctuation, Dash)
Bidi: ON (Other Neutrals)
Marco
  • What is `unicode`? I don't appear to have that installed (and can't find it in the Arch Linux repos). – Sparhawk Apr 27 '15 at 12:14
  • 3
    @Sparhawk on my Debian, it's just a Python script installed by the `unicode` package. You should be able to get it by downloading the source package from the [Debian repos](https://packages.debian.org/jessie/unicode). – terdon Apr 27 '15 at 12:19
1

Create a bash script with this:

#!/bin/bash
awk -F ":" '{print $2}' /usr/share/X11/locale/en_US.UTF-8/Compose | grep "$1" | awk -F "#" '{print $2}'

Name it whatever you like, for example namechar, and make it executable.

Now you can call it, for example:

./namechar @

and the result will be:

COMMERCIAL AT
terdon
jcbermu
  • This is good but only matches a subset of characters, not full Unicode. For example, it fails on `🐄`, and produces repeated results for `€`. The latter could be fixed by piping through `| sort -u`. – terdon Apr 27 '15 at 12:08
  • Yes, @terdon is correct. (That's why I said "partial" in the question.) This file only contains glyphs mapped to the `Compose` key. – Sparhawk Apr 27 '15 at 12:15
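
Taking the comments above into account, the script could be reworked along the following lines. This is only a sketch: the function name `namechar_compose` and the optional second argument are hypothetical, and it still covers only the glyphs that have a Compose entry.

```shell
#!/bin/bash
# namechar_compose CHAR [COMPOSE_FILE] (hypothetical helper)
# Print the comment text of every Compose entry that produces CHAR,
# with duplicates removed. Prints nothing if CHAR has no Compose entry.
namechar_compose() {
    local char=$1
    local file=${2:-/usr/share/X11/locale/en_US.UTF-8/Compose}
    # -F: match "CHAR" (including the quotes) as a fixed string, so that
    # characters such as . or * are not treated as regex operators.
    grep -F -- "\"$char\"" "$file" |
        sed -n 's/.*# *//p' |
        sort -u
}

# Usage: namechar_compose '€'
```

Matching the quoted character as a fixed string avoids false hits on regex metacharacters, and `sort -u` collapses the repeated results noted for `€`.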