- Subject: Re: Unicode and UTF-8 the Lua way, mid-discussion (was Re: What do you miss most in Lua)
- From: Miles Bader <miles@...>
- Date: Thu, 09 Feb 2012 04:46:19 +0900
Tim Mensch <tim-lua-l@bitgems.com> writes:
>> But your item [2] really kills all of these ideas. If we can't have
>> ustr:match, we may as well compile Lua with 16-bit Unicode strings
>> if our locale is fundamentally non-ASCII.
>
> Yuck. I would suggest that 16-bit Unicode was NEVER a good idea. Not
> even counting combining characters, you can't even fit all of the
> Unicode code points in 16 bits (over 110,000 now [1]), so some of them
> take two words to store ("surrogate pairs"). This means that you can't
> reliably index a UTF-16 string using offsets, and direct indexing of
> characters is the only argument I've heard in favor of UTF-16.
Yup, UTF-16 is an awful, awful idea. It has basically no advantages
over UTF-8, and a fair number of significant disadvantages.
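The surrogate-pair point in the quoted text can be illustrated with a short sketch. It uses Python only for convenience, and the helper names `utf16_units` and `code_point_at` are hypothetical illustrations, not part of Lua or any library. It encodes a string with one character above U+FFFF (U+1F600) and shows that this character occupies two 16-bit code units, so code-unit offset 2 lands on a low surrogate rather than on "b". Finding the n-th code point then requires a linear scan, just as it does in UTF-8:

```python
# Hypothetical helpers illustrating UTF-16 surrogate pairs (not a Lua API).

def utf16_units(s: str) -> list[int]:
    """Return the 16-bit UTF-16 code units of s (encoded little-endian, then split)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def is_high_surrogate(unit: int) -> bool:
    """High (leading) surrogates occupy 0xD800..0xDBFF."""
    return 0xD800 <= unit <= 0xDBFF

def code_point_at(units: list[int], n: int) -> int:
    """Return the n-th code point by scanning from the start: O(n), not O(1)."""
    i = 0
    while True:
        unit = units[i]
        width = 2 if is_high_surrogate(unit) else 1
        if n == 0:
            if width == 2:
                low = units[i + 1]
                # Recombine the surrogate pair into one code point.
                return 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
            return unit
        n -= 1
        i += width

s = "a\U0001F600b"                    # 3 code points
units = utf16_units(s)
print(len(s), len(units))             # 3 4
print(hex(units[1]), hex(units[2]))   # 0xd83d 0xde00 (one surrogate pair)
print(units[2] == ord("b"))           # False: offset 2 is the low surrogate
print(hex(code_point_at(units, 2)))   # 0x62, found only by scanning
print(len(s.encode("utf-8")))         # 6: UTF-8 is variable-width too
```

Both encodings need a scan to reach the n-th code point once characters outside the Basic Multilingual Plane appear, which is the sense in which UTF-16 does not buy fixed-width indexing.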
-Miles
--
The secret to creativity is knowing how to hide your sources.
--Albert Einstein