lua-users home
lua-l archive

On 8 February 2012 17:18, Jay Carlson <nop@nop.com> wrote:

[1]: Why yes, if UTF-8 processing is how we do Unicode processing, and
we don't have the character property tables, we've reduced this to a
trivial case of the whole "strings have types; will your language help
you?" question. It's just a very simple language.

[2]: Patterns look very difficult to fix up on the Lua side though.

I think we are all agreed that some sort of UTF8 support in Lua is desirable if not essential.  The question is: how?

(1) Additional functions in the "string" library, e.g. str:usub(3,6) extracts UTF8 characters 3 to 6 and throws an error if str is not valid UTF8.  Pro: simplest.  Con: requires a change in 'official' Lua, can't genuinely start mid-string.
(2) Another standard library, say "ustring", with functions like those in "string" but with UTF8 semantics, say ustring.sub(str,3,6).  Pro: can be implemented as a third-party library with no change to 'official' Lua.  Con: same as (1), and also no object-oriented calls.
(3) Another standard library, say "utf8", but operating on userdata, e.g. ustr:sub(3,6).  ustr:type() is 'utf8'.  Creates a private code point address list.  Pro: avoids the cons of (1) and (2).  Con: requires conversion to and from string.
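
To make option (2) concrete, here is a minimal pure-Lua sketch of what ustring.sub might look like.  The module name "ustring" comes from the proposal above; the helper char_starts, the error messages and the restriction to positive indices are my own assumptions, and the validity check is simplified (it does not reject overlong forms or surrogates).

```lua
-- Hypothetical "ustring" module sketching option (2); not an existing library.
local ustring = {}

-- Return a table of the byte offsets at which each UTF8 character of s
-- starts, plus a final sentinel entry one past the last byte.
-- Raises an error on a bad lead byte, bad continuation byte or truncation.
local function char_starts(s)
  local starts, i, n = {}, 1, #s
  while i <= n do
    local b = s:byte(i)
    local len
    if b < 0x80 then len = 1
    elseif b >= 0xC2 and b <= 0xDF then len = 2
    elseif b >= 0xE0 and b <= 0xEF then len = 3
    elseif b >= 0xF0 and b <= 0xF4 then len = 4
    else error("invalid UTF8 lead byte at position " .. i) end
    if i + len - 1 > n then
      error("truncated UTF8 sequence at position " .. i)
    end
    for k = i + 1, i + len - 1 do
      local c = s:byte(k)
      if c < 0x80 or c > 0xBF then
        error("invalid UTF8 continuation byte at position " .. k)
      end
    end
    starts[#starts + 1] = i
    i = i + len
  end
  starts[#starts + 1] = n + 1
  return starts
end

-- Like string.sub, but i and j count characters instead of bytes.
-- Simplification: only positive indices are handled.
function ustring.sub(s, i, j)
  local starts = char_starts(s)
  local count = #starts - 1
  j = j or count
  if i < 1 then i = 1 end
  if j > count then j = count end
  if i > j then return "" end
  return s:sub(starts[i], starts[j + 1] - 1)
end

-- "naive cafe" with i-diaeresis and e-acute, written as byte escapes.
local s = "na\195\175ve caf\195\169"
print(ustring.sub(s, 3, 6))   -- characters 3 to 6
print(s:sub(3, 6))            -- bytes 3 to 6, which is a different slice
```

Note that the table built by char_starts is rebuilt on every call here.  Keeping it alongside the string is essentially the "private code point address list" of option (3), which is where the conversion cost comes from.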

But your item [2] really kills all of these ideas.  If we can't have ustr:match, we may as well compile Lua with 16-bit Unicode strings if our locale is fundamentally non-ASCII.
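
For anyone who has not run into item [2] in practice, the sketch below shows the problem with stock Lua patterns, which work byte by byte.  The names UCHAR and uchars are only illustrative, and the byte-range pattern is a common workaround rather than a fix: it splits a string that is already valid UTF8 into characters (NUL bytes excluded), but it does nothing for character classes or repetition inside a larger pattern.

```lua
local e_acute = "\195\169"   -- UTF8 encoding of U+00E9
local n_tilde = "\195\177"   -- UTF8 encoding of U+00F1

-- "." matches a single byte, so it cannot match one two-byte character.
assert(#e_acute == 2)
assert(e_acute:match("^.$") == nil)

-- A set containing e-acute is really a set of the two bytes 195 and 169,
-- so it wrongly matches the first byte of n-tilde.
assert(n_tilde:find("[" .. e_acute .. "]") == 1)

-- Workaround at the byte level: one lead byte (ASCII or 194..244)
-- followed by any number of continuation bytes (128..191).
local UCHAR = "[\1-\127\194-\244][\128-\191]*"

-- Split a string assumed to be valid UTF8 into a table of characters.
local function uchars(s)
  local t = {}
  for ch in s:gmatch(UCHAR) do
    t[#t + 1] = ch
  end
  return t
end

local chars = uchars("caf" .. e_acute)
print(#chars)   -- number of characters, not bytes
```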