Re: newbie - Lua and unicode

lua-l archive


Subject: Re: newbie - Lua and unicode
From: William Ahern <wahern@...>
Date: Thu, 14 Sep 2006 13:56:52 -0700

On Thu, 2006-09-14 at 13:30 -0700, William Ahern wrote:
> Here's where the big gotcha comes with Unicode. A code point does not
> equal a "character". In unicode you can compose "characters" (aka
> graphemes), using multiple codepoint entities. An a+umlaut, even though
> it's a latin1 character in the older ISO standards, can be represented
> by one or three 16-bit codepoint values.
> 

Actually, there are three ways to represent this on screen, and their
equivalence depends on the application and usage. If I were scanning
logs visually and grepping for a+umlaut, I'd probably want my search key
to match all of these:

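1) U+00E4 (precomposed LATIN SMALL LETTER A WITH DIAERESIS)
2) U+0061 U+0308 (LATIN SMALL LETTER A, then COMBINING DIAERESIS)
3) U+0061 U+034F U+0308 (the same, with a COMBINING GRAPHEME JOINER
   in between)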

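All of these code points lie in the Basic Multilingual Plane, so each
one is a single 16-bit unit and the sequences are encoded identically in
UCS-2 and UTF-16.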
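
A minimal sketch of such a search key, written in Python only because
its standard library ships Unicode normalization (stock Lua does not).
The function name search_key and the decision to drop U+034F before
normalizing are assumptions for illustration, not an established
recipe: NFC alone folds forms 1 and 2 together, but the grapheme joiner
in form 3 blocks composition, so it has to be removed first if the
application treats it as ignorable.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER

def search_key(text: str) -> str:
    """Fold text to a loose matching key (hypothetical helper).

    Assumption: for log searching, CGJ can be ignored. It is dropped
    first, then NFC composes base letter + combining mark where a
    precomposed form exists.
    """
    return unicodedata.normalize("NFC", text.replace(CGJ, ""))

# The three representations of a+umlaut from the list above.
forms = ["\u00e4", "a\u0308", "a\u034f\u0308"]
keys = [search_key(s) for s in forms]

# NFC on its own leaves form 3 uncomposed, because CGJ sits between
# the base letter and the diaeresis.
nfc_only = unicodedata.normalize("NFC", forms[2])
```

With this key all three forms compare equal, whereas nfc_only still
differs from U+00E4. Whether that looser match is wanted depends, as
said, on the application.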

-- 
William Ahern <wahern@barracudanetworks.com>

