Roberto Ierusalimschy wrote:
> The problem here is that pattern modifiers (`*', `+', etc.) in Lua work
> only over a single char. If someone writes "ã*", she wants the "whole"
> ã to repeat (and not only the last byte in the representation of ã), so
> pattern matching must be `UTF-8 aware' (and `UTF-8-able'...)

Oops! Thanks for pointing this out!

This problem is related to the fact that currently, in Lua, 
ã (a-tilde) is a "Latin" character that lies outside the 7-bit 
ASCII range. Such characters are not directly compatible with UTF-8 
and need to be converted to a 2-byte representation. 

But, and please correct me if I am wrong, don't Lua regexps 
support multi-byte patterns? Can't you write (abc)+ to match 
"abcabcabc"? If that is (made) possible, the modification to gsub and
substr should be relatively simple.
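
To make the quoted problem concrete, here is a minimal sketch. It assumes a stock Lua 5.x interpreter (string.match is not available in older versions) and spells "ã" as its two UTF-8 bytes with decimal escapes, so the result does not depend on how the source file is saved. The helper match_repeated is a hypothetical name used only for illustration; it is not part of Lua's string library.

```lua
-- "ã" (U+00E3) in UTF-8 is the two bytes 0xC3 0xA3.
local a_tilde = "\195\163"
local subject = a_tilde .. a_tilde .. a_tilde   -- "ããã", 6 bytes

-- Byte-level matching: "*" binds only to the last byte (0xA3),
-- so the pattern means "0xC3 followed by zero or more 0xA3".
local naive = string.match(subject, a_tilde .. "*")
print(#naive)   -- 2: only the first "ã" is matched, not all three

-- In stock Lua 5.x a quantifier after ")" is not a repetition of the
-- group; the "+" is taken as a literal plus sign, so this finds nothing.
print(string.find("abcabcabc", "(abc)+"))   -- nil

-- Hypothetical helper: count how often a literal multi-byte unit
-- repeats starting at position init (default 1).
-- Returns the repeat count and the index of the last byte consumed.
local function match_repeated(s, unit, init)
  assert(#unit > 0, "unit must not be empty")
  local pos = init or 1
  local count = 0
  while string.sub(s, pos, pos + #unit - 1) == unit do
    pos = pos + #unit
    count = count + 1
  end
  return count, pos - 1
end

print(match_repeated(subject, a_tilde))   -- 3   6
```

The helper only handles a fixed literal sequence; a UTF-8 aware gsub would need the same "whole character" stepping inside the matcher itself.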

"No one knows true heroes, for they speak not of their greatness." -- 
Daniel Remar.
Björn De Meyer