- Subject: Re: lua for unicode
- From: lua+Steven.Murdoch@...
- Date: Tue, 03 Dec 2002 11:52:36 +0000
> > Initially Unicode was limited to 2^16 positions (65,536), but this was found
> > to be inadequate. The first 2^16 characters of Unicode are known as the Basic
> > Multilingual Plane (BMP), which is intended to be enough to represent all
> > living languages; however, as other messages have suggested, it does not contain
> > historical characters. This space is not yet full so there may be further
> > characters added in the future.
> I find it amazing that this space is not yet
> full, as it should already have been filled with those 80000 characters
> I mentioned before alone.
The unassigned codespace of the BMP is a very scarce resource and, as I
understand it, the proposals for its use far exceed its capacity. Given that
Unicode guarantees not to delete any characters once they are added, any
mistakes made could have very bad consequences and be impossible to rectify.
Also standards organizations move very, very slowly (sometimes this is a good
thing, other times it is not).
I don't think there is any conspiracy at work here. While you may think that
those characters are very important, there are other organizations which
believe others are more important, and everyone's views have to be considered.
The area outside of the BMP is quite sparsely populated, so there will be less
work trying to get characters added to this area. However, characters here
require more space to store (4 octets in both UTF-8 and UTF-16, as opposed to
1-3 octets in UTF-8 and 2 octets in UTF-16 for BMP characters), so there is a
desire to get the "more important" characters into the BMP.
> I still get the feeling that Microsoft
> wants to keep using its obsolete 16-bit encoding (which is AFAIK not
> UTF-16), and therefore is holding back many characters.
Microsoft has very little influence in this matter (the standards body is a
big group). Moreover, they would gain nothing by preventing useful characters
from being added to the BMP, since this is the only subset they support.