Jens Alfke wrote:
> I agree, mostly; the one area I've run into problems has been with  
> regexp/pattern libraries. Any pattern that relies on "alphanumeric  
> characters" or "word boundaries" assumes a fair bit of Unicode  
> knowledge behind the scenes.
> [...]
> but there should be some kind  
> of standard extension for Unicode strings, hopefully one that cleanly  
> extends the built-in string objects using something like ICU.

Well, did you even take a look at Klaus' library?

$ lua -l unicode -e 'print(unicode.utf8.find("xäy", ".(%w)."))'
1       4       ä

You can see that I typed the umlaut in a UTF-8 locale because the
second number (the end of the match, counted in bytes) is 4, and not 3
as it would be in ISO-8859-1 or -15, where ä is a single byte.
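
To make the 4-versus-3 point concrete, here is a minimal sketch that uses
only the stock string library, so it runs without the unicode module. The
byte escapes and the variable names are my own spelling of "xäy" for
illustration and do not come from Klaus' library; writing the bytes out
explicitly keeps the result independent of the terminal encoding.

```lua
-- "xäy" spelled out byte by byte (decimal escapes work in Lua 5.1 and later).
local utf8_str   = "x\195\164y"  -- ä as UTF-8: bytes 0xC3 0xA4 (two bytes)
local latin1_str = "x\228y"      -- ä as ISO-8859-1: byte 0xE4 (one byte)

-- The stock length operator counts bytes, not characters.
print(#utf8_str)    -- 4
print(#latin1_str)  -- 3

-- Byte position of the final "y" (plain search, no pattern magic).
-- This is where a match spanning the whole string ends.
print(utf8_str:find("y", 1, true))    -- 4  4
print(latin1_str:find("y", 1, true))  -- 3  3
```

The same three characters end at byte 4 or byte 3 depending only on how ä
is encoded, which is why the end index in the output above gives away the
locale.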

BTW to Klaus: 5.1-beta renamed MAX_CAPTURES to LUA_MAXCAPTURES.