Tim Mensch <tim-lua-l@bitgems.com> writes:
>> But your item [2] really kills all of these ideas.  If we can't have
>> ustr:match, we may as well compile Lua with 16-bit Unicode strings
>> if our locale is fundamentally non-ASCII.
>
> Yuck. I would suggest that 16-bit Unicode was NEVER a good idea. Not
> even counting combining characters, you can't even fit all of the
> Unicode code points in 16 bits (over 110,000 now [1]), so some of them
> take two words to store ("surrogate pairs"). This means that you can't
> reliably index a UTF-16 string using offsets, and direct indexing of
> characters is the only argument I've heard in favor of UTF-16.

Yup, UTF-16 is an awful, awful idea.  It has basically no advantages
over UTF-8, and a fair number of significant disadvantages.
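
To make the indexing point concrete, here is a minimal Python sketch.
The sample string and the helper name utf16_units are made up for
illustration, and Python is used only because it is handy for poking at
encodings. It shows that a code point outside the 16-bit range takes
two UTF-16 code units, so a code-unit offset no longer matches a
character offset.

```python
# Illustration only: compare code points with UTF-16 code units.
# The helper name and the sample string are hypothetical.

def utf16_units(s: str) -> int:
    """Return how many 16-bit code units s occupies in UTF-16."""
    return len(s.encode("utf-16-le")) // 2

# 'a', MUSICAL SYMBOL G CLEF (U+1D11E, outside the 16-bit range), 'b'
text = "a\U0001D11Eb"

code_points = len(text)            # Python strings index by code point
units = utf16_units(text)          # the G clef needs a surrogate pair
utf8_len = len(text.encode("utf-8"))

# Look at 16-bit unit number 2 of the UTF-16 form.
raw = text.encode("utf-16-le")
unit_at_2 = int.from_bytes(raw[4:6], "little")

print(code_points, units, utf8_len)
print(hex(unit_at_2), repr(text[2]))
```

Character 2 of the string is 'b', but 16-bit unit 2 is the second half
of a surrogate pair, which is not a character on its own. Offsets into
UTF-16 are therefore no more "direct" than offsets into UTF-8.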

-Miles

-- 
The secret to creativity is knowing how to hide your sources.
  --Albert Einstein