- Subject: Re: Unicode and UTF-8 the Lua way, mid-discussion (was Re: What do you miss most in Lua)
- From: Dirk Laurie <dirk.laurie@...>
- Date: Wed, 8 Feb 2012 20:01:12 +0200
On 8 February 2012 17:18, Jay Carlson <nop@nop.com> wrote:
> [1]: Why yes, if UTF-8 processing is how we do Unicode processing, and
> we don't have the character property tables, we've reduced this to a
> trivial case of the whole "strings have types; will your language help
> you?" question. It's just a very simple language.
> [2]: Patterns look very difficult to fix up on the Lua side though.
I think we all agree that some sort of UTF-8 support in Lua is desirable, if not essential. The question is: how?
(1) Additional functions in the "string" library, e.g. str:usub(3,6), which extracts UTF-8 characters 3 to 6 and throws an error if str is not valid UTF-8. Pro: simplest. Con: requires a change in 'official' Lua; can't genuinely start mid-string.
(2) Another standard library, say "ustring", with functions like those in "string" but with UTF-8 semantics, e.g. ustring.sub(str,3,6). Pro: can be implemented as a third-party library with no change to 'official' Lua. Con: like (1), can't genuinely start mid-string; also no object-oriented calls.
(3) Another standard library, say "utf8", but operating on userdata, e.g. ustr:sub(3,6), where ustr:type() is 'utf8'. It creates a private list of code point addresses. Pro: avoids the cons of (1) and (2). Con: requires conversion to and from string.
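To make options (2) and (3) concrete, here is a minimal pure-Lua sketch. All the names in it (offsets, slice, ustring, UStr, new) are hypothetical, the validity check is deliberately simplified (it does not reject overlong forms or surrogates), and a table with a metatable stands in for the userdata of option (3), since plain Lua cannot create userdata. Only positive indices are handled.

```lua
-- Hypothetical helper: byte offsets at which each UTF-8 character starts,
-- plus a sentinel one past the last byte. Errors on malformed input.
local function offsets(s)
  local list, i, n = {}, 1, #s
  while i <= n do
    local b = s:byte(i)
    local len
    if b < 0x80 then len = 1
    elseif b >= 0xC2 and b < 0xE0 then len = 2
    elseif b >= 0xE0 and b < 0xF0 then len = 3
    elseif b >= 0xF0 and b < 0xF5 then len = 4
    else error("invalid UTF-8 lead byte at position " .. i) end
    if i + len - 1 > n then
      error("truncated UTF-8 sequence at position " .. i)
    end
    for j = i + 1, i + len - 1 do
      local c = s:byte(j)
      if c < 0x80 or c > 0xBF then
        error("invalid UTF-8 continuation byte at position " .. j)
      end
    end
    list[#list + 1] = i
    i = i + len
  end
  list[#list + 1] = n + 1
  return list
end

-- Characters i..j (1-based, inclusive), given a precomputed offset list.
local function slice(str, list, i, j)
  local count = #list - 1
  if i < 1 then i = 1 end
  if j > count then j = count end
  if i > j then return "" end
  return str:sub(list[i], list[j + 1] - 1)
end

-- Option (2): a separate library of plain functions; rescans on every call.
local ustring = {}
function ustring.sub(str, i, j)
  return slice(str, offsets(str), i, j)
end

-- Option (3): an object that builds its private offset list once.
local UStr = {}
UStr.__index = UStr
local function new(str)
  return setmetatable({ str = str, list = offsets(str) }, UStr)
end
function UStr:sub(i, j) return slice(self.str, self.list, i, j) end
function UStr:type() return "utf8" end

-- Greek alpha..eta, two bytes each, written as decimal escapes.
local greek = "\206\177\206\178\206\179\206\180\206\181\206\182\206\183"
local by_bytes = greek:sub(3, 6)          -- bytes 3..6: beta, gamma
local by_chars = ustring.sub(greek, 3, 6) -- characters 3..6: gamma..zeta
local ustr = new(greek)
print(#by_bytes, #by_chars, ustr:sub(3, 6) == by_chars)
```

The sketch shows the trade-off in the list above: ustring.sub has to rescan the string on each call, while the object version pays for the conversion once and then indexes its offset list directly.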
But your item [2] really kills all of these ideas. If we can't have ustr:match, then those of us whose locale is fundamentally non-ASCII may as well compile Lua with 16-bit Unicode strings.
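A small illustration of why patterns are the sticking point, assuming the stock byte-oriented string library. The variable names are only for illustration. The last part uses a hand-written byte class to step through characters; it assumes the input is already valid UTF-8 and does nothing for character classes or sets.

```lua
local e_acute = "\195\169"  -- U+00E9 (e with acute) as two UTF-8 bytes
local a_acute = "\195\161"  -- U+00E1 (a with acute) as two UTF-8 bytes

-- The length operator counts bytes, not characters.
print(#e_acute)                       --> 2

-- "." matches exactly one byte, so one two-byte character is not "one char".
local single = e_acute:match("^.$")
print(single)                         --> nil

-- A bracket class is a set of bytes: [e_acute] is really {195, 169},
-- so it matches the first half of a different character.
local half = a_acute:match("[" .. e_acute .. "]")
print(half:byte())                    --> 195

-- Partial workaround: a non-continuation byte followed by any
-- continuation bytes (128..191) is one character of valid UTF-8.
local count = 0
for ch in ("caf" .. e_acute):gmatch("[^\128-\191][\128-\191]*") do
  count = count + 1
end
print(count)                          --> 4
```

The workaround only replaces "."; anything like %a, [set] or quantifiers applied to a multi-byte character still operates on bytes.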