I have some HTML like this:

    &#x250c&#x2500&#x2500&#x2500&#x2500&#x2500&#x2500&#x2500&#x2500
    &#x2500&#x2500&#x2500&#x2500&#x2500&#x2500&#x2500&#x2500&#x2510<br>
    &#x2502testtesttesttest&#x2502<br>
    &#x2514&#x2500&#x2500&#x2500&#x2500&#x2500&#x2500&#x2500&#x2500
    &#x2500&#x2500&#x2500&#x2500&#x2500&#x2500&#x2500&#x2500&#x2518<br>

It shows up in Chrome with a solid box as I would expect (can't get it to display in SE right either!):

    ┌────────────────┐
    │testtesttesttest│
    └────────────────┘

and I was hoping the text browsers could do this too, but on Lynx I get

    +----------------+
    |testtesttesttest|
    +----------------+

On w3m it's

    ??????????????????
    ?testtesttesttest?
    ??????????????????

and finally on links2 I get

    +----------------+
    &#x2502|testtesttesttest|
    +----------------+

Any chance of configuring one of the text browsers to show this stuff with the pretty solid lines like the graphical browsers? I am using PuTTY set to UTF-8 with "use unicode line drawing" enabled, connecting to Ubuntu 12.04.

Cory J
  • Always end [entities](http://en.wikipedia.org/wiki/Unicode_and_HTML#Numeric_character_references) with a semicolon, e.g. `&#x2502;`. – michas Nov 24 '13 at 02:04
  • @michas thanks! I probably don't have to tell you that I'm not really a web guy. – Cory J Nov 24 '13 at 04:40
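
Following the comment above, here is a minimal sketch of the same box with every numeric character reference terminated by a semicolon. The file name `box-test.html` and its location under `${TMPDIR:-/tmp}` are assumptions for illustration; the markup otherwise mirrors the question.

```shell
# Write a test page in which every numeric character reference ends with ";".
# The output path is an assumption for illustration; any writable file works.
page="${TMPDIR:-/tmp}/box-test.html"

cat > "$page" <<'EOF'
&#x250c;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2510;<br>
&#x2502;testtesttesttest&#x2502;<br>
&#x2514;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2518;<br>
EOF

echo "wrote $page"
```

The page can then be opened in any of the browsers, for example with `lynx "$page"`. Whether the box appears with solid lines still depends on the locale, as the answers discuss.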

5 Answers

elinks

If I understand your question then I believe elinks supports this feature. Using the UTF-8-demo.txt that @michas provided in his answer, here's a screenshot of elinks viewing that page.

Example

Invoking elinks like so:

    $ elinks http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt

Here's a screenshot of the terminal running elinks:

[screenshot of elinks]

w3m

As an alternative to elinks you can also use w3m.

Example

You can invoke w3m like so:

    $ w3m http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt

Here's a screenshot of the terminal running w3m:

[screenshot of w3m]

lynx

Lynx also supports this capability. You can invoke it like so:

    $ lynx http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt

Here's a screenshot of the terminal running lynx:

[screenshot of lynx]

Locale

All the terminal-based browsers I know of render these characters just fine. My locale is set as follows:

    $ locale
    LANG=en_US.utf8
    LC_CTYPE="en_US.utf8"
    LC_NUMERIC="en_US.utf8"
    LC_TIME="en_US.utf8"
    LC_COLLATE="en_US.utf8"
    LC_MONETARY="en_US.utf8"
    LC_MESSAGES="en_US.utf8"
    LC_PAPER="en_US.utf8"
    LC_NAME="en_US.utf8"
    LC_ADDRESS="en_US.utf8"
    LC_TELEPHONE="en_US.utf8"
    LC_MEASUREMENT="en_US.utf8"
    LC_IDENTIFICATION="en_US.utf8"
    LC_ALL=

If your locale is not a UTF-8 one, that is likely the issue.

slm

To manually test the capabilities of your terminal you can use a file like UTF-8-demo.txt.

Is your terminal able to display your boxes?

If it is able, does your browser know that your terminal is able to do so?

Otherwise the browser will take the safe option and emulate boxes using ASCII characters.

What is the output of `locale` and `echo $TERM`? Most probably your browser evaluates those to determine the capabilities of your terminal.
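
The two questions above can be checked from the shell. This is a minimal sketch: `locale charmap` prints the character set of the current locale and `$TERM` names the terminal type. The helper name `is_utf8_charmap` is hypothetical, introduced here only for illustration.

```shell
# Return success if the given charmap name denotes UTF-8.
# (Hypothetical helper; accepts the common spellings of the name.)
is_utf8_charmap() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Ask the locale system which character set is in effect.
charmap=$(locale charmap 2>/dev/null || true)

printf 'TERM=%s\n' "${TERM:-unset}"
printf 'charmap=%s\n' "${charmap:-unknown}"

if is_utf8_charmap "$charmap"; then
    echo "The locale advertises UTF-8."
else
    echo "The locale does not advertise UTF-8; expect ASCII fallbacks."
fi
```

When everything is set to POSIX, glibc systems typically report `ANSI_X3.4-1968` here, which would explain the ASCII `+` and `-` boxes.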

michas
  • Cool UTF demo but how does this actually answer the question? What _should_ `locale` and `$TERM` be? How do you tell a browser that the terminal can display UTF8? – terdon Nov 24 '13 at 02:51
  • I tried to point out that setting up your putty correctly is only half the way. That UTF8 file is just to make sure your terminal is able to display UTF8 characters correctly. - The second part is telling your browser that UTF-8 is available, which is likely determined by the given settings. – michas Nov 24 '13 at 05:16

After getting some hints here, it seems that the answer is that lynx, elinks, and w3m all work if the locale is configured correctly. Running

    locale

revealed that everything was set to "POSIX". Setting

    export LC_ALL="en_US.UTF-8"

fixed the problem. I added it to ~/.bashrc so that the change persists.

Thanks folks!

Cory J
  • Also for anyone else like me using PuTTY/etc. on Windows, use the DejaVu Sans Mono font for best results. – Cory J Nov 24 '13 at 04:51
  • The locale category you need to set is `LC_CTYPE`, you shouldn't mess with `LC_ALL`. You shouldn't put this in `.bashrc`, it'll mess up when you use a non-UTF-8 terminal. Usually `LC_CTYPE` is set automatically, a common reason for it not working is that some user script blindly overrides it. If you're using PuTTY, make sure it's set up to use UTF-8 and declare it properly. – Gilles 'SO- stop being evil' Nov 24 '13 at 21:02
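
A hedged sketch of the narrower approach suggested in the last comment: set only `LC_CTYPE`, and only when nothing else has chosen a character type. The locale name `en_US.UTF-8` comes from the answer above. The variable name `want` and the idea of placing this in a login-time file such as `~/.profile` are assumptions, since the comment only says not to use `.bashrc`. It also assumes the terminal really is in UTF-8 mode.

```shell
# Assumption: the terminal is in UTF-8 mode (e.g. PuTTY configured for UTF-8).
want=en_US.UTF-8   # locale name taken from the answer; adjust to an installed one

# Leave LC_ALL alone, and do not override an LC_CTYPE that is already set.
if [ -z "${LC_ALL:-}" ] && [ -z "${LC_CTYPE:-}" ]; then
    LC_CTYPE=$want
    export LC_CTYPE
fi

# Print the resulting character set so the effect can be checked.
locale charmap 2>/dev/null || true
```

Because `LC_ALL` overrides every other category, leaving it unset keeps the remaining locale settings under the control of `LANG` and the individual `LC_*` variables.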

The given answers are close: locale settings are the basis for each of the programs (lynx, w3m, elinks) to decide how to render things.

There are a few points of disagreement though:

The lynx behavior depends also on the locale_charset setting, e.g., in the lynx.cfg file:

Description

LOCALE_CHARSET overrides CHARACTER_SET if true, using the current locale to lookup a MIME name that corresponds, and use that as the display charset.

It also modifies the default value for ASSUMED_CHARSET; it does not override that setting.

Note that while nl_langinfo(CODESET) itself is standardized, the return values and their relationship to the locale value is not. GNU libiconv happens to give useful values, but other implementations are not guaranteed to do this.

Default value

LOCALE_CHARSET:FALSE

Packagers customize this setting; the original sources are more conservative with regard to default settings.
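
To try the setting without editing the packaged file, one option is a small personal configuration file. This sketch assumes lynx's `-cfg` option and the `INCLUDE:` directive of lynx.cfg behave as described in the lynx documentation; the system path `/etc/lynx/lynx.cfg` and the output path are assumptions that differ between distributions.

```shell
# Paths are assumptions for illustration; adjust them to the local system.
cfg="${TMPDIR:-/tmp}/lynx-utf8.cfg"
sys_cfg=/etc/lynx/lynx.cfg

{
    # Pull in the packaged settings first, if that file exists here.
    [ -r "$sys_cfg" ] && printf 'INCLUDE:%s\n' "$sys_cfg"
    # Then let the current locale decide the display charset.
    printf 'LOCALE_CHARSET:TRUE\n'
} > "$cfg"

echo "wrote $cfg"
# To try it:
#   lynx -cfg="$cfg" http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt
```

This only helps when the locale itself names a UTF-8 character set, so it complements the locale settings rather than replacing them.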

The display of the given examples with lynx and w3m is very similar. On the other hand, if one scrolls down in the file with elinks, a problem with the combining characters used in the Thai example is immediately apparent. It seems that elinks does not handle that case:

[screenshot of elinks]

Compare with w3m:

[screenshot of w3m]

and with lynx (using ncursesw, of course):

[screenshot of lynx]

These are all current executables from Debian/testing.

In short, while the different browsers can render Unicode box-characters, as asked, their capabilities (and configurability) differ.

Thomas Dickey

It's a character set issue. If your terminal is in UTF-8 mode and lynx knows it, it just works. Verified 30 seconds ago.

hildred