steve donovan wrote:
[...]
> BTW, anybody have experience with the Lua string library working with
> widechar strings? At least within the confines of the BMP they have a
> regular concept of 'character'.

Unfortunately not: even within the BMP, sequences of combining
characters mustn't be split. I don't know if there's a maximum length
for a grapheme cluster, but, e.g., a Korean Hangul syllable written with
conjoining jamo is two or three code points long.

Doesn't somebody have a Lua library that will decompose a UTF-8 string
into an array of grapheme clusters? The rules to do so are well-defined,
and would probably be the simplest approach if you want to deal with
'characters'.
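
To give an idea of what such a library would do, here is a minimal
sketch in plain Lua. It is only an approximation of the real rules
(Unicode UAX #29): it assumes well-formed UTF-8 input, and it only knows
about a few blocks of combining marks and about conjoining Hangul jamo.
The names decode, joins and grapheme_clusters are made up for this
example and don't come from any existing library.

```lua
-- Simplified sketch, NOT the full UAX #29 algorithm.
-- Assumes the input is well-formed UTF-8.

-- Decode one UTF-8 encoded character (a string of 1-4 bytes) to a code point.
local function decode(ch)
  local b1 = ch:byte(1)
  if b1 < 0x80 then return b1 end
  local cp
  if b1 < 0xE0 then cp = b1 % 0x20      -- 2-byte sequence
  elseif b1 < 0xF0 then cp = b1 % 0x10  -- 3-byte sequence
  else cp = b1 % 0x08 end               -- 4-byte sequence
  for i = 2, #ch do
    cp = cp * 64 + ch:byte(i) % 64      -- append 6 bits per continuation byte
  end
  return cp
end

local function in_range(cp, lo, hi) return cp >= lo and cp <= hi end

-- Assumption: only these combining-mark blocks are recognised.
local function is_combining(cp)
  return in_range(cp, 0x0300, 0x036F)   -- Combining Diacritical Marks
      or in_range(cp, 0x1AB0, 0x1AFF)   -- ... Extended
      or in_range(cp, 0x1DC0, 0x1DFF)   -- ... Supplement
      or in_range(cp, 0x20D0, 0x20FF)   -- ... for Symbols
      or in_range(cp, 0xFE20, 0xFE2F)   -- Combining Half Marks
end

-- Conjoining Hangul jamo: leading consonant, vowel, trailing consonant.
local function is_L(cp) return in_range(cp, 0x1100, 0x115F) end
local function is_V(cp) return in_range(cp, 0x1160, 0x11A7) end
local function is_T(cp) return in_range(cp, 0x11A8, 0x11FF) end

-- Should code point cp stay in the same cluster as the previous one?
local function joins(prev, cp)
  if prev == nil then return false end
  if is_combining(cp) then return true end
  if is_L(prev) then return is_L(cp) or is_V(cp) end
  if is_V(prev) then return is_V(cp) or is_T(cp) end
  if is_T(prev) then return is_T(cp) end
  return false
end

-- Split a UTF-8 string into an array of (approximate) grapheme clusters.
local function grapheme_clusters(s)
  local clusters, prev = {}, nil
  -- One lead (non-continuation) byte followed by its continuation bytes.
  for ch in s:gmatch("[^\128-\191][\128-\191]*") do
    local cp = decode(ch)
    if joins(prev, cp) then
      clusters[#clusters] = clusters[#clusters] .. ch
    else
      clusters[#clusters + 1] = ch
    end
    prev = cp
  end
  return clusters
end

-- "e" + U+0301, then "a", then the jamo U+1112 U+1161 U+11AB
local hangul = "\225\132\146\225\133\161\225\134\171"
local clusters = grapheme_clusters("e\204\129" .. "a" .. hangul)
print(#clusters)
```

For that input the array has three entries: the accented e, the plain
a, and the three jamo kept together as one syllable. Once you have the
array, 'character' indexing, counting and reversing are ordinary table
operations; a real implementation would drive the join decision from the
Unicode grapheme break property tables rather than hard-coded ranges.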

-- 
┌─── dg@cowlark.com ───── http://www.cowlark.com ─────
│ "I have always wished for my computer to be as easy to use as my
│ telephone; my wish has come true because I can no longer figure out
│ how to use my telephone." --- Bjarne Stroustrup
