Questions tagged [character-encoding]

Questions that deal with the various representations of characters and character sets, such as ASCII, UTF-8, and EBCDIC. Encoding problems are often encountered when moving files between operating systems that end lines differently: with a carriage return (CR), a line feed (LF), or both (CRLF).
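As a small illustration, here is a minimal Python sketch that encodes the same short string under several encodings so the byte-level differences are visible. `cp037` is Python's codec for one EBCDIC code page (US/Canada); other EBCDIC variants differ for some characters, so treat it as a representative example only. The function name `encoded_forms` is hypothetical.

```python
def encoded_forms(text: str, codecs=("ascii", "latin-1", "utf-8", "cp037")) -> dict:
    """Map each codec name to the bytes of `text`, or None if it can't be encoded."""
    forms = {}
    for codec in codecs:
        try:
            forms[codec] = text.encode(codec)
        except UnicodeEncodeError:
            forms[codec] = None  # e.g. ASCII has no "é"
    return forms


if __name__ == "__main__":
    for codec, data in encoded_forms("Aé\n").items():
        shown = data.hex(" ") if data is not None else "(not representable)"
        print(f"{codec:8} {shown}")
```

For `"Aé\n"`, Latin-1 gives `41 e9 0a` and UTF-8 gives `41 c3 a9 0a`: the letter A is the same byte in both, while é takes one byte in Latin-1 and two in UTF-8. In `cp037` the letter A is byte `c1` rather than `41`.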

Use this tag when you know that you are dealing with characters or character sets whose representation differs between systems or programs.

A frequent issue occurs when a file (particularly one meant to be executed as a shell script) is saved on a Microsoft Windows platform and then transferred to a Unix platform: its lines end in CR LF, and Unix tools treat the CR as part of each line's content.
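A file saved with Windows line endings keeps its CR LF pairs on Unix, and the stray CR on a script's `#!` line becomes part of the interpreter name (or of its last argument), so the script typically fails to start. A minimal Python sketch of the usual fix, assuming the file uses an ASCII-compatible encoding such as UTF-8 or Latin-1 (in UTF-16, CR and LF take two bytes each and this byte-level replacement would not apply). `crlf_to_lf` and `fix_line_endings` are hypothetical helper names.

```python
import sys
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    # Replace each CR LF pair with a bare LF; lone CRs are left untouched.
    return data.replace(b"\r\n", b"\n")


def fix_line_endings(path: str) -> None:
    # Rewrite the file in place. Working on bytes avoids having to decode
    # the file, which is fine for ASCII-compatible encodings.
    p = Path(path)
    p.write_bytes(crlf_to_lf(p.read_bytes()))


if __name__ == "__main__":
    # Usage: python3 fix_endings.py script.sh [more files...]
    for name in sys.argv[1:]:
        fix_line_endings(name)
```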

echo bytes to a file

I'm trying to connect my Raspberry Pi to a display using the I2C bus. To get started, I wanted to manually write data, bytes in particular, to a file. How do you write specific bytes to a file? I already read that one and I figured my problem…

asked Mar 05 '14 at 13:03

Mark

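Writing exact byte values comes down to opening the target in binary mode, so that no text encoding or newline translation is applied. A minimal Python sketch, with hypothetical helper names and a temporary file standing in for the real target; for an actual I2C device node under Linux's i2c-dev interface, the slave address typically has to be selected with an ioctl before writing, which this sketch does not cover.

```python
import os
import tempfile


def write_bytes(path: str, values: list) -> None:
    """Write the given byte values (0..255) to `path`, replacing its contents."""
    # bytes() raises ValueError for anything outside 0..255, and binary
    # mode ("wb") means no text encoding or newline translation happens.
    with open(path, "wb") as f:
        f.write(bytes(values))


def read_back(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    # Demo on a throwaway temp file; a real target path would replace it.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    write_bytes(path, [0x00, 0x41, 0xFF])
    print(read_back(path).hex(" "))  # prints "00 41 ff"
    os.remove(path)
```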

4 answers

How can I test the encoding of a text file... Is it valid, and what is it?

I have several .htm files that open in Gedit without any warning or error, but when I open the same files in jEdit, it warns me of invalid UTF-8 encoding... The HTML meta tag states "charset=ISO-8859-1". jEdit allows a list of fallback encodings and…

text-processing utilities character-encoding

asked Apr 19 '11 at 07:16

Peter.O

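UTF-8 has enough structure that invalid data can be detected reliably, but "what encoding is it?" has no certain answer: ISO-8859-1 assigns a character to every byte value, so decoding with it never fails. A minimal Python sketch (helper names are hypothetical) that reports where a byte string first stops being valid UTF-8, and then emulates a fallback list like the one described in the question, with ISO-8859-1 last because it always succeeds:

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start  # index where the bad sequence starts
    return None


def decode_with_fallback(data: bytes, encodings=("utf-8", "iso-8859-1")):
    """Try each encoding in order and return (text, encoding_used)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    raise ValueError("none of the encodings could decode the data")
```

The encoding returned is only the first one that did not raise an error, not necessarily the one the file's author used.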

2 answers

How can I set Vim's default encoding to UTF-8?

I'd like to contribute to an open source project by providing translated strings. One of their requirements is that contributors must use UTF-8 as the encoding for the PO files. I'm using Vim 7.3 on Linux. How can I be sure that Vim's encoding is…

vim character-encoding unicode

asked Oct 27 '11 at 11:16

Paolo

4 answers

What is the ^M character called?

TeXpad is creating it. I know that it is under some dead key; I just cannot remember its name. It shows up as a blue character, and I want to mass-remove these characters from my document. How can you type it?

character-encoding text

asked Jun 05 '14 at 17:16

Léo Léopold Hertz 준영

6 answers

Filtering invalid utf8

I have a text file in an unknown or mixed encoding. I want to see the lines that contain a byte sequence that is not valid UTF-8 (by piping the text file into some program). Equivalently, I want to filter out the lines that are valid UTF-8. In other…

command-line text-processing character-encoding unicode

asked Jan 27 '11 at 00:13

Gilles 'SO- stop being evil'

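Because byte 0x0A never occurs inside a multi-byte UTF-8 sequence, splitting on newlines and testing each line separately is safe. A minimal Python sketch with hypothetical helper names, working on raw bytes so nothing is decoded before the check:

```python
from typing import BinaryIO, Iterable, Iterator, Tuple


def invalid_utf8_lines(lines: Iterable[bytes]) -> Iterator[Tuple[int, bytes]]:
    """Yield (line_number, raw_line) for each line that is not valid UTF-8."""
    for number, line in enumerate(lines, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            yield number, line


def filter_stream(src: BinaryIO, dst: BinaryIO) -> None:
    """Copy only the invalid lines from src to dst, byte for byte."""
    for _, line in invalid_utf8_lines(src):
        dst.write(line)
```

Calling `filter_stream(sys.stdin.buffer, sys.stdout.buffer)` from a script turns it into a filter that can sit at the end of a pipe.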

3 answers

What charset encoding is used for filenames and paths on Linux?

Does it depend on which file system I use? For example, ext2/ext3/ext4, but also what happens when I insert one of those Joliet CD-ROMs with ISO 9660? I've heard that POSIX contains some sort of spec for the charset encoding of…

filenames character-encoding locale

asked Sep 15 '10 at 16:47

martin
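On Linux, the kernel treats a filename on native file systems such as ext4 as a byte string: any byte except `/` and NUL is allowed, and the encoding is a convention applied by programs, usually according to the locale. Python on POSIX, for example, decodes filenames with the filesystem encoding and the `surrogateescape` error handler, so a name that is invalid in that encoding still maps back to its original bytes. A minimal sketch; `survives_round_trip` is a hypothetical helper name.

```python
import os


def survives_round_trip(name: bytes) -> bool:
    """Check that a raw filename maps to str and back without changing."""
    return os.fsencode(os.fsdecode(name)) == name


if __name__ == "__main__":
    latin1_name = "café.txt".encode("latin-1")  # b'caf\xe9.txt': not valid UTF-8
    # With a UTF-8 filesystem encoding, the undecodable byte becomes a lone
    # surrogate in the str, which os.fsencode turns back into the same byte.
    print(repr(os.fsdecode(latin1_name)))
    print(survives_round_trip(latin1_name))
```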

2 answers

tr complains of “Illegal byte sequence”

I'm brand new to UNIX and I am using Kirk McElhearn's "The Mac OS X Command Line" to teach myself some commands. I am attempting to use tr and grep so that I can search for text strings in a regular MS-Office Word Document. $ tr '\r' '\n' <…

text-processing grep character-encoding binary tr

asked Jul 08 '14 at 22:14

user74886
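On macOS this error usually means `tr` is running in a UTF-8 locale, so it decodes its input as multi-byte characters and stops at bytes that do not form valid UTF-8, which a binary Word file contains plenty of; running it as `LC_ALL=C tr ...` makes it handle single bytes instead. The same byte-level idea as a minimal Python sketch (helper names are hypothetical):

```python
def cr_to_lf(data: bytes) -> bytes:
    """Translate every CR byte to LF, like tr '\\r' '\\n' in the C locale."""
    return data.translate(bytes.maketrans(b"\r", b"\n"))


def grep_bytes(data: bytes, needle: bytes) -> list:
    """Return the lines (split on CR or LF) that contain `needle`."""
    return [line for line in cr_to_lf(data).split(b"\n") if needle in line]
```

Since neither function decodes anything, arbitrary bytes such as `0xff` pass through untouched.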

5 answers

Converting a UTF-8 file to ASCII (best-effort)

I have a file in UTF-8 that contains text in multiple languages. Much of it is people's names. I need to convert it to ASCII, and I need the result to look as decent as possible. There are many ways to approach converting from a wider encoding…

character-encoding text natural-language

asked Dec 06 '14 at 16:53

user7610

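One common best-effort approach, shown here as a minimal Python sketch rather than the only way: apply Unicode compatibility decomposition (NFKD), drop the combining marks it splits off, and substitute a placeholder for anything still outside ASCII. `to_ascii` and the `"?"` replacement are assumptions for illustration.

```python
import unicodedata


def to_ascii(text: str, replacement: str = "?") -> str:
    """Best-effort ASCII: strip accents, replace what can't be reduced."""
    # NFKD splits "ř" into "r" + COMBINING CARON and expands compatibility
    # characters such as the "ﬁ" ligature into "fi".
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if ord(ch) < 128:
            out.append(ch)
        elif unicodedata.combining(ch):
            continue  # drop the accent marks split off above
        else:
            out.append(replacement)  # e.g. "Ł" has no decomposition
    return "".join(out)
```

Letters such as `Ł`, which Unicode does not decompose into a base letter plus a mark, still need a separate transliteration table if the placeholder is not acceptable.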

4 answers

How to specify characters using hexadecimal codes in `grep`?

I am using the following command to grep a character range from hexadecimal code 0900 (instead of अ) to 097F (instead of व). How can I use the hexadecimal codes in place of अ and व? bzcat archive.bz2 | grep -v '<[अ-व]*\s' | tr '[:punct:][:blank:][:digit:]'…

shell grep character-encoding unicode

asked Aug 26 '11 at 06:03

Dhrubo Bhattacharjee
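U+0900 to U+097F is the Unicode Devanagari block. A minimal Python sketch, assuming the text has already been decoded to `str`: the range is written with `\u` escapes instead of the literal characters, and `keep_line` (a hypothetical helper) mimics the `grep -v` filter from the question.

```python
import re

# U+0900..U+097F is the Unicode Devanagari block, written as escapes.
DEVANAGARI = "[\u0900-\u097F]"
# Same shape as the question's grep -v '<[अ-व]*\s'.
PATTERN = re.compile("<" + DEVANAGARI + r"*\s")


def keep_line(line: str) -> bool:
    """Mimic grep -v: keep lines where the pattern does NOT match."""
    return PATTERN.search(line) is None
```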

4 answers

How to change encoding from Non-ISO extended-ASCII text, with CRLF line terminators to UTF-8?

I have a txt file: $ file -i x.txt x.txt: text/plain; charset=unknown-8bit $ file x.txt x.txt: Non-ISO extended-ASCII text, with CRLF line terminators And there are some characters that are incorrectly encoded: trwa³y, sta³y, usuwaæ How can…

character-encoding text

asked Jan 07 '14 at 19:17

Patryk

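The garbled sample is a useful clue: byte 0xB3 is `³` in Latin-1 but `ł` in both Windows-1250 and ISO-8859-2, and 0xE6 is `æ` in Latin-1 but `ć` in both Central European code pages. Together with the CRLF terminators, that suggests, but does not prove, Windows-1250 Polish text. A minimal Python sketch under that assumption; `convert_to_utf8` is a hypothetical helper name.

```python
def convert_to_utf8(data: bytes, source_encoding: str = "cp1250") -> bytes:
    """Re-encode legacy 8-bit text as UTF-8 with Unix line endings."""
    text = data.decode(source_encoding)  # bytes -> str using the assumed code page
    text = text.replace("\r\n", "\n")     # CRLF -> LF
    return text.encode("utf-8")
```

Applied to a file, this would be `Path("x.txt").write_bytes(convert_to_utf8(Path("x.txt").read_bytes()))` with `pathlib.Path`; passing `"iso-8859-2"` as `source_encoding` is the alternative if the file did not come from Windows.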

4 answers

identify files with non-ASCII or non-printable characters in file name

In an 80 GB directory with approximately 700,000 files, some file names contain non-English characters. Other than laboriously trawling through the file list, is there: An easy way to list or otherwise identify these file…

bash shell find filenames character-encoding

asked Jan 17 '14 at 10:29

suspectus

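A minimal Python sketch that walks a tree and flags names containing any byte outside printable ASCII (0x20 through 0x7E), which covers both non-ASCII and control characters. Passing a bytes root to `os.walk` makes it return raw byte names, so nothing is lost to decoding. The helper names are hypothetical.

```python
import os


def has_unusual_byte(name: bytes) -> bool:
    """True if any byte falls outside printable ASCII (0x20 through 0x7E)."""
    return any(b < 0x20 or b > 0x7E for b in name)


def find_unusual_names(root: bytes = b"."):
    """Yield paths under `root` whose final component has an unusual byte."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if has_unusual_byte(name):
                yield os.path.join(dirpath, name)
```

Printing each result with `repr()` shows the offending bytes as escapes such as `\xc3\xa9`, which keeps the output readable even when the terminal cannot display the names.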

8 answers

How can I correctly decompress a ZIP archive of files with Hebrew names?

Someone sent me a ZIP file containing files with Hebrew names (and created on Windows, not sure with which tool). I use LXDE on Debian Stretch. The Gnome archive manager manages to unzip the file, but the Hebrew characters are garbled. I think I'm…

character-encoding zip unicode file-format

asked Dec 28 '15 at 17:47

einpoklum

7 answers

Why do some characters show as squares in Chrome?

For example, squares show up in the dev tools. Some of these squares are at the end of lines; initially I thought they were carriage returns, but it turns out they aren't. Also, squares appear after = or > in many places where there is no…

arch-linux fonts character-encoding chrome

asked Apr 12 '12 at 09:16

Mat

2 answers

find(1): how is the star wildcard implemented for it to fail on some filenames?

In a file system where filenames are in UTF-8, I have a file with a faulty name; it is displayed as: D�sinstaller, actual name according to zsh: D$'\351'sinstaller, Latin1 for Désinstaller, itself a French barbarism for "uninstall." Zsh would not…

shell find filenames wildcards character-encoding

asked Apr 09 '15 at 16:52

Michaël

10 answers

How to print all printable ASCII chars in CLI?

How can I list all the printable ASCII characters in the terminal?

shell character-encoding

asked Jun 17 '11 at 06:37

LanceBaynes

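Printable ASCII runs from 0x20 (space) to 0x7E (tilde), 95 characters in all. A minimal Python sketch that builds that range and lists each character with its decimal and hexadecimal code; `PRINTABLE_ASCII` is a hypothetical name. Note that Python's `string.printable` is not the same set, since it also includes whitespace such as tab and newline.

```python
# Printable ASCII: 0x20 (space) through 0x7E (tilde), 95 characters.
PRINTABLE_ASCII = "".join(chr(code) for code in range(0x20, 0x7F))

if __name__ == "__main__":
    # One character per line with its decimal and hex code.
    for ch in PRINTABLE_ASCII:
        print(f"{ord(ch):3d} 0x{ord(ch):02x} {ch}")
```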
