Steve Donovan wrote:
> BTW, anybody have experience with the Lua string library working with
> widechar strings? At least within the confines of the BMP they have a
> regular concept of 'character'.

Unfortunately not: a base character and the combining characters that
follow it mustn't be split. I don't know whether there's a maximum length
for a grapheme cluster, but, for example, most Korean Hangul syllables are
three code points long when written as conjoining jamo (as in NFD).
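
To make this concrete, here is a small sketch in Python rather than Lua, chosen only because its standard `unicodedata` module does normalization out of the box. The helper name `nfd_code_points` is made up for illustration. It shows that one precomposed BMP syllable becomes three code points under canonical decomposition, and that naive slicing separates a combining accent from its base letter.

```python
import unicodedata

def nfd_code_points(s: str) -> list:
    """Hypothetical helper: code points of s after NFD, formatted as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", s)]

# A precomposed Hangul syllable is a single BMP code point...
han = "\uD55C"  # HANGUL SYLLABLE HAN
print(len(han), nfd_code_points(han))
# -> 1 ['U+1112', 'U+1161', 'U+11AB']  (leading consonant, vowel, trailing consonant)

# "e" followed by U+0301 COMBINING ACUTE ACCENT is one user-perceived character,
# but slicing by code point keeps the "e" and drops the accent.
e_acute = "e\u0301"
print(len(e_acute), repr(e_acute[:1]))
# -> 2 'e'
```

Syllables without a trailing consonant (such as U+AC00) decompose to only two jamo, which is why the three-code-point case is "most" rather than all.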

Doesn't somebody have a Lua library that will decompose a UTF-8 string
into an array of grapheme clusters? The rules for doing so are well
defined (Unicode Standard Annex #29, "Unicode Text Segmentation"), and such
a library would probably be the simplest approach if you want to deal with
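
For reference, here is a rough sketch of such a splitter, again in Python. The function names are hypothetical and this is not an existing library. It only approximates the extended grapheme cluster rules of UAX #29. It keeps CR LF together, breaks around other control characters, attaches combining marks (general categories Mn, Mc, Me) and U+200D ZERO WIDTH JOINER to the preceding character, and applies the Hangul jamo rules. It ignores emoji sequences, regional-indicator pairs, prepended characters, and the exact Extend/SpacingMark property tables.

```python
from __future__ import annotations
import unicodedata

def _hangul_type(c: str) -> str | None:
    """Approximate Hangul_Syllable_Type for BMP code points (None if not Hangul)."""
    cp = ord(c)
    if 0x1100 <= cp <= 0x115F:
        return "L"    # leading consonant jamo
    if 0x1160 <= cp <= 0x11A7:
        return "V"    # vowel jamo
    if 0x11A8 <= cp <= 0x11FF:
        return "T"    # trailing consonant jamo
    if 0xAC00 <= cp <= 0xD7A3:
        # Precomposed syllables: every 28th one has no trailing consonant.
        return "LV" if (cp - 0xAC00) % 28 == 0 else "LVT"
    return None

def _no_break(prev: str, cur: str) -> bool:
    """True if there is no grapheme cluster boundary between prev and cur."""
    if prev == "\r" and cur == "\n":                      # GB3: CR x LF
        return True
    if unicodedata.category(prev) == "Cc":                # GB4 (approximate)
        return False
    if unicodedata.category(cur) == "Cc":                 # GB5 (approximate)
        return False
    if unicodedata.category(cur) in ("Mn", "Me", "Mc") or cur == "\u200d":
        return True                                       # GB9/GB9a (approximate)
    p, q = _hangul_type(prev), _hangul_type(cur)
    if p == "L" and q in ("L", "V", "LV", "LVT"):         # GB6
        return True
    if p in ("LV", "V") and q in ("V", "T"):              # GB7
        return True
    if p in ("LVT", "T") and q == "T":                    # GB8
        return True
    return False                                          # otherwise break

def grapheme_clusters(s: str) -> list[str]:
    """Split s into (approximate) grapheme clusters."""
    clusters: list[str] = []
    for c in s:
        if clusters and _no_break(clusters[-1][-1], c):
            clusters[-1] += c      # extend the current cluster
        else:
            clusters.append(c)     # start a new cluster
    return clusters

decomposed = unicodedata.normalize("NFD", "\uD55C\uAD6D\uC5B4")  # "Korean" in Hangul
print(len(decomposed), len(grapheme_clusters(decomposed)))
# -> 8 3
```

A Lua version would walk the UTF-8 string one code point at a time in the same way, but it would need its own tables of Unicode character properties, since the standard Lua libraries don't provide any.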

