- Subject: Re: lua for unicode
- From: Björn De Meyer <bjorn.demeyer@...>
- Date: Sat, 30 Nov 2002 18:35:56 +0100
John Belmonte wrote:
>
> This seems like misinformation. From what I understand, unicode has a
> 31 bit space, and only about 21 bits of that is required to cover all
> characters in use today. As for using unicode for Japanese, certainly
> it is possible and works well, as I've personally deployed unicode with
> UTF-8 encoding at Japanese websites.
>
> Perhaps your experience is with some naive *encoding* of unicode that
> tries to stuff 21 bits into 16? ;-)
>
Yes, I do see that point. However, Unicode started out as
being just that 16-bit-wide encoding that Microsoft still uses
these days. So, historically speaking, Unicode is laden with the
stigma of being too restrictive.
The other, more important problem which I mention here is CJK
unification. If I am not mistaken, even in the Unicode of these days,
many Chinese, Japanese and Korean ideographs have not been
included on grounds of being historical forms, being
"too similar" to other ideographs, being
uncommon in usage, or being writable with
other characters.
Think about it. That's like saying that the q is not needed
in the roman alphabet because it's so similar to an o (just
one extra line), not used very commonly, and besides we could
replace all q's by, for instance "kw". The kwestion is of
course whether we would agree to such an intrusion into our
Western languages. And people named Quinten might object
to having to write their name as Kwinten.
I have heard from some Japanese people that they find Unicode
culturally unacceptable exactly for these reasons. Maybe something
has changed at the Unicode Consortium, but I can't forget it's
still Microsoft's brainchild, so I am wary of it.
95000 characters are now in Unicode, but by my estimate,
there are probably a million different characters
in all human languages of the past and the present. Lacking
are historical forms, rare scripts, regional variants, etc.
Maybe I have been misinformed. Maybe someone will be able to
reassure me with regards to my doubts.
More importantly, I am still interested in doing a
small utf8-lib for Lua.
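
To give an idea of what the core of such a lib would have to do, here is
a rough sketch of the encoding step. It is in Python rather than Lua, purely
for illustration, and the name utf8_encode is hypothetical, not part of any
existing library. It assumes the range U+0000 to U+10FFFF with the surrogates
excluded, so at most four bytes per character; a lib meant to cover the full
31-bit space mentioned above would need five- and six-byte forms as well.

```python
def utf8_encode(cp):
    """Encode one code point (an int) as UTF-8 bytes.

    Hypothetical helper for illustration only. Assumes the range
    U+0000..U+10FFFF, surrogates (U+D800..U+DFFF) excluded.
    """
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("cannot encode code point: %r" % (cp,))
    if cp < 0x80:
        # 7 bits: one byte, identical to ASCII
        return bytes([cp])
    if cp < 0x800:
        # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

A character outside the 16-bit range simply takes four bytes here, so
nothing has to be stuffed into 16 bits. Decoding, length counting and
validation would be the other pieces of the lib.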
--
"No one knows true heroes, for they speak not of their greatness." --
Daniel Remar.
Björn De Meyer
bjorn.demeyer@pandora.be