Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems.
Questions tagged [unicode]
476 questions
149
votes
11 answers
How can I remove the BOM from a UTF-8 file?
I have a UTF-8 file with a BOM and want to remove the BOM. Are there any Linux command-line tools that can remove the BOM from the file?
$ file test.xml
test.xml: XML 1.0 document, UTF-8 Unicode (with BOM) text, with very long lines
m13r
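One possible approach, sketched here in Python rather than with a dedicated command-line tool: the UTF-8 BOM is the three bytes EF BB BF at the very start of the file, so it can be dropped by checking for that prefix. The function name `strip_utf8_bom` is made up for this illustration.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    # Hypothetical sample input standing in for the contents of a file.
    sample = codecs.BOM_UTF8 + b'<?xml version="1.0"?>'
    print(strip_utf8_bom(sample))
```

Working on bytes rather than decoded text keeps the rest of the file untouched, whatever it contains.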
94
votes
3 answers
Awesome symbols and characters in a bash prompt
I just ran across a screenshot of someone's terminal:
Is there a list of all of the characters which can be used in a Bash prompt, or can someone get me the character for the star and the right arrow?
Naftuli Kay
71
votes
2 answers
How can I set Vim's default encoding to UTF-8?
I'd like to contribute to an open source project by providing translated strings. One of their requirements is that contributors must use UTF-8 as the encoding for the PO files.
I'm using Vim 7.3 on Linux. How can I be sure that Vim's encoding is…
Paolo
61
votes
3 answers
Why is printf "shrinking" umlaut?
If I execute the following simple script:
#!/bin/bash
printf "%-20s %s\n" "Früchte und Gemüse" "foo"
printf "%-20s %s\n" "Milchprodukte" "bar"
printf "%-20s %s\n" "12345678901234567890" "baz"
It prints:
Früchte und Gemüse foo
Milchprodukte…
René Nyffenegger
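A likely explanation, assuming the shell's printf measures the field width in bytes rather than characters: in UTF-8, "ü" takes two bytes, so the first string is already 20 bytes long and receives no padding. The Python sketch below shows the two lengths and pads by characters instead; `pad_chars` is a hypothetical helper name.

```python
def pad_chars(text: str, width: int) -> str:
    """Left-justify text in a field of `width` characters (code points, not bytes)."""
    return text.ljust(width)


# "ü" is written as the escape \u00fc so the precomposed form is guaranteed.
s = "Fr\u00fcchte und Gem\u00fcse"
char_count = len(s)                  # number of code points
byte_count = len(s.encode("utf-8"))  # number of bytes in UTF-8

if __name__ == "__main__":
    print(char_count, byte_count)
    print(pad_chars(s, 20) + " foo")
    print(pad_chars("Milchprodukte", 20) + " bar")
```

Note that `str.ljust` counts code points, not display columns, so text with combining marks or double-width characters could still misalign.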
59
votes
6 answers
Filtering invalid utf8
I have a text file in an unknown or mixed encoding. I want to see the lines that contain a byte sequence that is not valid UTF-8 (by piping the text file into some program). Equivalently, I want to filter out the lines that are valid UTF-8. In other…
Gilles 'SO- stop being evil'
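A minimal sketch of one way to do this, assuming the input fits in memory and is split on line breaks: try to decode each line strictly as UTF-8 and keep the lines that fail. The name `invalid_utf8_lines` is an assumption made for the example, not an existing tool.

```python
from typing import List


def invalid_utf8_lines(data: bytes) -> List[bytes]:
    """Return the lines of data that are not valid UTF-8."""
    bad = []
    for line in data.splitlines():
        try:
            line.decode("utf-8")  # strict decoding raises on invalid sequences
        except UnicodeDecodeError:
            bad.append(line)
    return bad


if __name__ == "__main__":
    # Hypothetical mixed input: the second line contains bytes invalid in UTF-8.
    sample = b"ok\n\xff\xfe bad\ncaf\xc3\xa9\n"
    for line in invalid_utf8_lines(sample):
        print(line)
```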
46
votes
5 answers
Updated my Arch Linux server and now I get tmux: need UTF-8 locale (LC_CTYPE) but have ANSI_X3.4-1968
I recently updated my Arch Linux server and during that process tmux got updated. I was using tmux while the upgrade was going on and used it afterwards, but all during the same SSH session.
Now, however, whenever I try to issue any tmux command I…
RPiAwesomeness
46
votes
2 answers
What fonts are good for Unicode glyphs?
So I was looking at this answer on Stack Overflow and realized that my fonts don't cover much of the Unicode range (I get lots of squares). Does anyone know a font that will cover all of that post?
xenoterracide
43
votes
7 answers
Is there an alternative to sed that supports Unicode?
For example:
sed 's/\u0091//g' file1
Right now, I have to run hexdump to get the hex bytes and put them into sed as follows:
$ echo -ne '\u9991' | hexdump -C
00000000 e9 a6 91 |...|
00000003
And then:
$ sed…
A-letubby
42
votes
2 answers
How to make tr aware of non-ASCII (Unicode) characters?
I'm trying to remove some characters from a UTF-8 file. I'm using tr for this purpose:
tr -cs '[[:alpha:][:space:]]' ' '
MatthewRock
40
votes
4 answers
How to specify characters using hexadecimal codes in `grep`?
I am using the following command to grep for the character range from hexadecimal code 0900 (instead of अ) to 097F (instead of व). How can I use hexadecimal codes in place of अ and व?
bzcat archive.bz2 | grep -v '<[अ-व]*\s' | tr '[:punct:][:blank:][:digit:]'…
Dhrubo Bhattacharjee
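As an illustration of specifying a range by code point rather than by literal character, here is a small Python sketch (not grep, whose escape syntax differs). It assumes the range of interest is U+0900 to U+097F, as stated in the question; `contains_devanagari` is a hypothetical helper name.

```python
import re

# Character class given by hexadecimal code points instead of literal characters.
DEVANAGARI = re.compile("[\u0900-\u097F]")


def contains_devanagari(text: str) -> bool:
    """Return True if text has at least one character in U+0900..U+097F."""
    return DEVANAGARI.search(text) is not None


if __name__ == "__main__":
    print(contains_devanagari("\u0905"))  # U+0905 lies inside the range
    print(contains_devanagari("abc"))
```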
39
votes
4 answers
gitk crashes when viewing commit containing emoji: X Error of failed request: BadLength (poly request too large or internal Xlib length error)
I'm able to open gitk, but it crashes as soon as I open a commit whose changes contain an emoji (not the commit message).
Error
❯ gitk --all
X Error of failed request: BadLength (poly request too large or internal Xlib length error)
Major opcode…
Édouard Lopez
38
votes
7 answers
Convert between Unicode Normalization Forms on the Unix command line
In Unicode, some character combinations have more than one representation.
For example, the character ä can be represented as
"ä", that is the codepoint U+00E4 (two bytes c3 a4 in UTF-8 encoding), or as
"ä", that is the two codepoints U+0061…
glts
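The two representations can be illustrated with Python's standard `unicodedata` module. This is only a sketch of the concept, not a command-line answer: NFD decomposes U+00E4 into "a" followed by the combining diaeresis U+0308, and NFC composes the pair back into a single code point.

```python
import unicodedata

composed = "\u00e4"  # "ä" as a single code point
decomposed = unicodedata.normalize("NFD", composed)    # "a" + U+0308
recomposed = unicodedata.normalize("NFC", decomposed)  # back to U+00E4

if __name__ == "__main__":
    for label, text in (("NFC", composed), ("NFD", decomposed)):
        code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
        print(label, code_points, text.encode("utf-8").hex(" "))
```

Both strings render the same, but they compare unequal until they are normalized to the same form.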
37
votes
1 answer
Should we use UTF-8 characters like ⏰ in bash/shell script?
The simple code here works as expected on my machine when launched with bash:
function ⏰(){
date
}
⏰
Could there be a problem for other people using this, or is it universal?
I'm wondering because I've never seen anything like this in other…
bob dylan
37
votes
8 answers
How can I correctly decompress a ZIP archive of files with Hebrew names?
Someone sent me a ZIP file containing files with Hebrew names (and created on Windows, not sure with which tool). I use LXDE on Debian Stretch. The Gnome archive manager manages to unzip the file, but the Hebrew characters are garbled. I think I'm…
einpoklum
33
votes
4 answers
Find the best font for rendering a codepoint
How can I find an appropriate font for rendering Unicode code points?
gnome-terminal finds that characters like «⼼» can be rendered with fonts like Symbola rather than my terminal font or the codepoint-in-square fallback. How?
Nope