On 12/7/06, Roberto Ierusalimschy wrote:
If I understand correctly, even asian languages use ascii punctuation
(dots, spaces, newlines, commas, etc.), which uses 1 byte in utf-8 but 2
in utf-16. So, even for these languages, utf-8 is not so much less
compact as it seems.

I don't know about other Asian languages but Japanese has special
punctuation characters.  There is even a wide character for space.
Here are some of them with their ASCII equivalents; I hope your mail
reader groks them.

. = 。
, = 、
" " = 「 」 (note the wide space within the Japanese-style quotes)

I believe newline is the same in Japanese character sets as it is in
ASCII and I presume this extends into UTF-8.
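
As a rough check on the sizes involved, here is a small Python sketch
that counts the bytes each of the characters above takes in UTF-8 and
in UTF-16.  The helper name encoded_sizes is made up for illustration,
and the wide space is assumed to be U+3000 (IDEOGRAPHIC SPACE).

```python
def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for a string."""
    # "utf-16-le" is used so that no byte-order mark is counted.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

samples = [
    ("ASCII full stop", "."),
    ("ASCII comma", ","),
    ("ASCII space", " "),
    ("newline", "\n"),
    ("ideographic full stop", "\u3002"),
    ("ideographic comma", "\u3001"),
    ("left corner bracket", "\u300c"),
    ("right corner bracket", "\u300d"),
    ("ideographic space (assumed U+3000)", "\u3000"),
]

for name, ch in samples:
    utf8, utf16 = encoded_sizes(ch)
    print(f"{name}: {utf8} byte(s) in UTF-8, {utf16} in UTF-16")
```

The ASCII marks and the newline come out at 1 byte in UTF-8 and 2 in
UTF-16, while each of the Japanese marks takes 3 bytes in UTF-8 and 2
in UTF-16.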

However, as some of the other readers have pointed out, many of the
multibyte characters express denser ideas, so the number of ideas per
byte is probably not much different from that of European languages.
Here are some characters the Japanese use frequently with their English
equivalents.  I have chosen non-sino characters to try to make my
point more relevant to the English-speaking readership.

☎ or ℡ = Tel (when listing telephone numbers)
a 〜 b = a to b or from a to b
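
To put numbers on these two examples, the Python sketch below compares
the UTF-8 length of each symbol with that of its English equivalent.
The helper name utf8_len is made up for illustration, and the wave
character is assumed to be U+301C (WAVE DASH).

```python
def utf8_len(text):
    """Return the number of bytes text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

pairs = [
    ("\u260e", "Tel"),          # BLACK TELEPHONE
    ("\u2121", "Tel"),          # TELEPHONE SIGN
    ("a \u301c b", "a to b"),   # WAVE DASH (assumed code point)
]

for symbol, english in pairs:
    print(f"{symbol!r}: {utf8_len(symbol)} bytes, "
          f"{english!r}: {utf8_len(english)} bytes")
```

Each telephone symbol takes 3 bytes, the same as "Tel", and the wave
dash form takes 7 bytes against 6 for "a to b", so in UTF-8 these
particular pairs cost about the same.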

 Ken Smith