lua-l archive



The first thing is: %w (the "word", or alphanumeric, character class) is defined in terms of the C standard library's isalnum(), which in turn uses the current locale to decide what counts as a letter. Your locale is probably the "C" locale, in which no accented characters count as letters. You might be able to get é recognized by switching to a pt-BR locale with os.setlocale.
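A minimal sketch of the locale dependence. It assumes Lua 5.1 or later, and that é is stored as the single ISO-8859-1 byte 233. The locale name `pt_BR.ISO-8859-1` is also an assumption: locale names differ between systems, and `os.setlocale` returns nil when the requested one is not installed. `trailing_word` is a hypothetical helper.

```lua
-- "é" as a single ISO-8859-1 (Latin-1) byte.
local latin1_e = "\233"
local text = "Walter " .. latin1_e

-- Hypothetical helper: the run of %w characters at the end of s, or nil.
local function trailing_word(s)
  return s:match("%w+$")
end

-- In the "C" locale isalnum(233) is false, so no word ends the string.
os.setlocale("C", "ctype")
print("C locale:", trailing_word(text))

-- Assumed locale name; os.setlocale returns nil if it is not installed.
if os.setlocale("pt_BR.ISO-8859-1", "ctype") then
  -- In a Latin-1 locale, byte 233 is normally classified as a letter.
  print("pt_BR locale matches e-acute:", trailing_word(text) == latin1_e)
else
  print("pt_BR.ISO-8859-1 locale not available")
end

-- Restore the default so later code is unaffected.
os.setlocale("C", "ctype")
```

Only the "ctype" category is changed, since character classification (and therefore isalnum()) is governed by LC_CTYPE.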

The other thing is: if your text editor stores files in UTF-8, the é actually occupies two bytes. In that case %w cannot match the whole é in any locale, because %w matches a single byte, and Lua strings are essentially byte vectors (or, dare I say, immutable byte tuples :).
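To see the two-byte case, the sketch below writes é as its UTF-8 encoding (bytes 195 and 169, i.e. 0xC3 0xA9) using decimal escapes. The `utf8_words` helper is hypothetical and only one possible workaround, not part of Lua: it treats every byte from 128 to 255 as part of a word. That keeps multibyte UTF-8 sequences intact, but it also accepts non-letters such as the no-break space (0xC2 0xA0).

```lua
-- "é" (U+00E9) encoded in UTF-8 is two bytes.
local utf8_e = "\195\169"
print(#utf8_e, utf8_e:byte(1, -1))  --> 2  195  169

-- Hypothetical helper: split s into runs of ASCII alphanumerics and
-- non-ASCII bytes (128..255), so UTF-8 sequences are never cut in half.
local function utf8_words(s)
  local words = {}
  for w in string.gmatch(s, "[%w\128-\255]+") do
    words[#words + 1] = w
  end
  return words
end

os.setlocale("C", "ctype")
for i, w in ipairs(utf8_words("Walter " .. utf8_e)) do
  print(i, w)
end
```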

Either way, deciding what a "word" character is can be quite challenging. The C library function on which %w is based is meant to find characters that can "normally be part of an identifier", that is, an identifier in a programming language. Human identifiers (that is, names) can be quite a bit more complex. For example, O'Reilly and Dell'omo are fairly common surnames in some places; the apostrophe would also throw off %w.
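One way to let apostrophes through, sketched with a hypothetical `name_words` helper: a word must start with an ASCII alphanumeric or a non-ASCII byte, and may continue with more of either or with apostrophes. This is only an assumption about what counts as a name; it keeps a trailing apostrophe attached (students'), still treats every non-ASCII byte as a letter, and does nothing for hyphenated names.

```lua
-- Hypothetical helper: a "name" starts with an ASCII alphanumeric or a
-- non-ASCII byte and may continue with more of those or apostrophes.
local function name_words(s)
  local words = {}
  for w in string.gmatch(s, "[%w\128-\255][%w\128-\255']*") do
    words[#words + 1] = w
  end
  return words
end

os.setlocale("C", "ctype")
local sample = "O'Reilly and Dell'omo"

-- Plain %w+ splits each surname at the apostrophe.
local plain = 0
for _ in string.gmatch(sample, "%w+") do plain = plain + 1 end

local names = name_words(sample)
print(plain, #names)              --> 5  3
print(table.concat(names, "|"))   --> O'Reilly|and|Dell'omo
```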

R.

On 4-Nov-05, at 7:56 AM, Walter Cruz wrote:

Hi all. I'm using the lua package from Debian (unstable).

 But there's something strange. This little script:

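 _______
 -- string.gfind is the Lua 5.0 name; Lua 5.1 and later use string.gmatch.
 local x = "Walter é"
 print(x)
 local t = {}
 for word in string.gmatch(x, "%w+") do
     table.insert(t, word)
 end

 -- table.foreach is deprecated; this loop prints the same index/value pairs.
 for i, word in ipairs(t) do print(i, word) end
 ___________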

 prints:
 _____
 Walter é
 1       Walter
 _______

The accented character is lost. I downloaded Lua 5.1 beta and compiled it, but the behaviour is the same.

 I don't know what is causing that :(

 []'
 - Walter