lua-l archive



Roberto Ierusalimschy wrote:
>> Do you plan to add real UTF-8 support for all string.* functions?
>> For example, so that ("é"):match("%g+") works?
> 
> I believe you are asking for Unicode support, not UTF-8 support. (UTF-8
> has nothing to do with what characters mean, only with their binary
> representation.) We have no current plans for Unicode support in the
> standard libraries.

While character classes like "%g" would require the entire Unicode
tables, what about patterns like this:

    utf8.match("\u{e4}", "[\u{e4}-\u{e6}\u{f3}-\u{f5}]")

It wouldn't require Unicode tables, "just" UTF-8 support in the matching
functions, so that ranges are compared by code point rather than by byte.
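For context, here is a small illustration, assuming stock Lua 5.3 or later, where string.match works byte by byte. The bracket class above is read as raw bytes, so it does not mean what it appears to mean:

```lua
-- Assumes Lua 5.3+ (for "\u{XXX}" escapes). string.match is byte-based.
-- "[\u{e4}-\u{e6}]" is the byte sequence [ C3 A4 - C3 A6 ], which Lua reads as
-- the single byte C3, the byte range A4..C3, and the single byte A6.
local class = "[\u{e4}-\u{e6}]"

-- Only the lead byte of U+00E4 is returned, not the whole character.
print(("\u{e4}"):match(class) == "\xC3")          --> true

-- U+00E7 (bytes C3 A7) lies outside U+00E4..U+00E6, yet both of its
-- bytes fall in A4..C3, so it matches anyway.
print(("\u{e7}"):match(class .. "+") == "\u{e7}") --> true
```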

Would that be possible without adding too much bloat? In the past, when I
needed to match code point ranges, I had to combine several patterns, each
covering one range of UTF-8 byte encodings.
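A sketch of the two workarounds available today, assuming Lua 5.3+: a hand-written byte-level pattern, which works here only because all four code points share the lead byte 0xC3, and a code-point comparison using utf8.codes. The names cp_in_ranges and all_in_ranges are hypothetical helpers for illustration, not part of Lua or any library.

```lua
-- Sketch only; cp_in_ranges/all_in_ranges are hypothetical helpers.
-- Requires Lua 5.3+ (utf8 library and "\u{XXX}" escapes).

-- Byte-level workaround: U+00E4..U+00E6 (C3 A4..C3 A6) and
-- U+00F3..U+00F5 (C3 B3..C3 B5) share lead byte C3, so one pattern suffices.
local byte_pattern = "\xC3[\xA4-\xA6\xB3-\xB5]"

-- Code-point alternative: decode the string and compare numbers.
local function cp_in_ranges(cp, ranges)
  for _, r in ipairs(ranges) do
    if cp >= r[1] and cp <= r[2] then return true end
  end
  return false
end

-- True if every code point in s falls in one of the ranges.
-- utf8.codes raises an error on invalid UTF-8.
local function all_in_ranges(s, ranges)
  for _, cp in utf8.codes(s) do
    if not cp_in_ranges(cp, ranges) then return false end
  end
  return true
end

local ranges = { {0xE4, 0xE6}, {0xF3, 0xF5} }
print(("\u{e4}"):match(byte_pattern) == "\u{e4}")  --> true
print(("\u{e7}"):match(byte_pattern))              --> nil
print(all_in_ranges("\u{e4}\u{f5}", ranges))       --> true
print(all_in_ranges("\u{e7}", ranges))             --> false
```

A range such as U+00E4..U+0100 already spans two lead bytes (0xC3 and 0xC4), and since Lua patterns have no alternation, the byte-level approach then needs one pattern per lead byte, which is the multiple-pattern problem described above.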

Best regards,

David Kolf