Re: Re: question about Unicode

lua-l archive


Subject: Re: Re: question about Unicode
From: "Ken Smith" <kgsmith@...>
Date: Thu, 7 Dec 2006 08:55:32 -0800

On 12/7/06, Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:

> If I understand correctly, even Asian languages use ASCII punctuation
> (dots, spaces, newlines, commas, etc.), which takes 1 byte in UTF-8 but 2
> in UTF-16. So, even for these languages, UTF-8 is not as much less
> compact as it seems.


I don't know about other Asian languages, but Japanese has its own
punctuation characters.  There is even a wide character for space.
Here are some of them with their ASCII equivalents; I hope your mail
reader groks them.

. = 。
, = 、
" " = 「　」 (note the wide space within the Japanese-style quotes)

I believe newline is the same in Japanese character sets as it is in
ASCII, and I presume this carries over to UTF-8.
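
For anyone who wants to check the sizes, here is a rough Python sketch
that encodes each of the characters above and counts the bytes.  The
code points are my reading of the characters listed (the wide space is
assumed to be U+3000 IDEOGRAPHIC SPACE), and the names PAIRS,
encoded_size and size_table are made up for this example.

```python
# Sketch: byte sizes of Japanese punctuation versus the ASCII equivalents.
# Assumption: the code points below match the characters in the message;
# the wide space is taken to be U+3000 IDEOGRAPHIC SPACE.
PAIRS = [
    (".", "\u3002"),  # 。 ideographic full stop
    (",", "\u3001"),  # 、 ideographic comma
    ('"', "\u300c"),  # 「 left corner bracket
    ('"', "\u300d"),  # 」 right corner bracket
    (" ", "\u3000"),  # wide (ideographic) space
]


def encoded_size(text, encoding):
    """Return the number of bytes `text` occupies in `encoding`."""
    return len(text.encode(encoding))


def size_table(pairs=PAIRS):
    """Return (ascii, japanese, ascii_utf8, jp_utf8, jp_utf16) per pair."""
    rows = []
    for ascii_ch, jp_ch in pairs:
        rows.append((
            ascii_ch,
            jp_ch,
            encoded_size(ascii_ch, "utf-8"),
            encoded_size(jp_ch, "utf-8"),
            # "utf-16-le" is used so that no byte-order mark is counted.
            encoded_size(jp_ch, "utf-16-le"),
        ))
    return rows


if __name__ == "__main__":
    for row in size_table():
        print(row)
    # Newline is the plain ASCII control character in both cases.
    print("newline:", encoded_size("\n", "utf-8"), "byte in UTF-8,",
          encoded_size("\n", "utf-16-le"), "bytes in UTF-16")
```

Each Japanese punctuation character here takes 3 bytes in UTF-8 and 2
in UTF-16, while the ASCII one takes 1 byte in UTF-8, so Japanese text
that uses its own punctuation does not get the 1-byte saving.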

However, as some of the other readers have pointed out, many of the
multibyte characters express denser ideas, so the number of ideas per
byte is probably not too different from European languages.  Here are
some characters the Japanese use frequently, with their English
equivalents.  I have chosen non-Sino characters to try to make my
point more relevant to the English-speaking readership.

☎ or ℡ = Tel (when listing telephone numbers)
a 〜 b = a to b or from a to b
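
To put rough numbers on the "ideas per byte" point, this small Python
sketch compares the UTF-8 size of each symbol above with its English
spelling.  The code points are assumptions on my part (in particular I
take the dash to be U+301C WAVE DASH), and utf8_len and COMPARISONS are
names invented for the illustration.

```python
# Sketch: UTF-8 byte counts for the symbols above versus their English
# equivalents.  Assumption: the code points are U+260E, U+2121 and U+301C.
COMPARISONS = [
    ("\u260e", "Tel"),             # ☎ black telephone
    ("\u2121", "Tel"),             # ℡ telephone sign
    ("a \u301c b", "a to b"),      # 〜 wave dash used for ranges
    ("a \u301c b", "from a to b"),
]


def utf8_len(text):
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    for japanese, english in COMPARISONS:
        print(f"{japanese!r}: {utf8_len(japanese)} bytes   "
              f"{english!r}: {utf8_len(english)} bytes")
```

The telephone symbols come out at 3 bytes each, the same as "Tel",
and "a 〜 b" is 7 bytes against 6 for "a to b" and 11 for
"from a to b".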

 Ken Smith
