- Subject: Re: strange behaviour with accented chars on debian unstable
- From: Rici Lake <lua@...>
- Date: Fri, 4 Nov 2005 08:45:31 -0500
The first thing is: %w ("word" character) is defined by the C standard
library, which in turn uses the current locale to decide what a
character is. Your locale is probably the "C" locale, in which word
characters have no accents. You might be able to get é recognized by
changing to the pt-BR locale.
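
A minimal sketch of the locale approach, assuming Lua 5.1 or later (where string.gmatch replaces string.gfind). The locale name "pt_BR.ISO-8859-1" is a guess; locale names vary from system to system, and os.setlocale returns nil when the requested locale is not installed, so the sketch checks for that.

```lua
-- Collect every run of %w characters in s.
local function words(s)
  local out = {}
  for w in string.gmatch(s, "%w+") do
    out[#out + 1] = w
  end
  return out
end

local text = "Walter \233"  -- "\233" is é stored as a single ISO-8859-1 byte

-- In the "C" locale, only "Walter" counts as a word.
os.setlocale("C", "ctype")
print(#words(text))

-- Assumed locale name; adjust to whatever your system provides.
local name = os.setlocale("pt_BR.ISO-8859-1", "ctype")
if name then
  print(#words(text))  -- é may now be classified as a word character
else
  print("locale not available")
end
```

Only the "ctype" category is changed here, so number formatting and collation are left alone.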
The other thing is: if your text editor is storing files in UTF-8, the
é might actually be occupying two bytes. In this case, %w won't work
regardless of locale, because %w only matches one byte, since Lua
strings are essentially byte-vectors. (Or dare I say immutable byte
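
A small sketch of the byte-level view, assuming Lua 5.1 or later. The é is written with decimal escapes so the example does not depend on how this file itself is encoded.

```lua
local utf8_e   = "\195\169"  -- é in UTF-8: bytes 0xC3 0xA9
local latin1_e = "\233"      -- é in ISO-8859-1: single byte 0xE9

-- The length operator counts bytes, not characters.
print(#utf8_e, #latin1_e)

-- Show the individual byte values of the UTF-8 form.
print(string.byte(utf8_e, 1, -1))

-- %w matches exactly one byte, so it can never match the whole
-- two-byte sequence, whatever the locale is.
print(string.find(utf8_e, "^%w$"))
```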
Either way, figuring out what a "word" character is can be quite
challenging. The C standard library call on which %w is based is used
to find letters which can be "normally parts of an identifier", that is
an identifier in a programming language. Human identifiers (that is,
names) can be quite a bit more complex. For example, O'Reilly and
Dell'omo are pretty common surnames in some places; the ' character
would also throw off %w.
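
One possible workaround, sketched under assumptions: instead of %w, use a hand-written character set. The set below is purely illustrative (the helper name tokens and the pattern are made up for this example); it accepts ASCII letters, the apostrophe, and every byte from 128 to 255, which is a crude way of letting accented characters through and will also accept bytes that are not letters at all.

```lua
-- Collect every match of pattern in s.
local function tokens(s, pattern)
  local out = {}
  for t in string.gmatch(s, pattern) do
    out[#out + 1] = t
  end
  return out
end

local text = "O'Reilly and Dell'omo"

-- Hypothetical "name" set: letters, apostrophe, and any high byte.
local NAME = "[%a'\128-\255]+"

-- %w+ splits each surname at the apostrophe.
print(table.concat(tokens(text, "%w+"), "|"))

-- The custom set keeps each surname in one piece.
print(table.concat(tokens(text, NAME), "|"))
```

Whether the apostrophe belongs in the set depends on the text: it also shows up as a quotation mark, so this is a trade-off rather than a fix.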
On 4-Nov-05, at 7:56 AM, Walter Cruz wrote:
Hi all. I'm using lua package from Debian, (I'm using unstable)
But there's something strange. That little script:
x = "Walter é"
for word in string.gfind(x, "%w+") do
  print(word)
end
The accented char is lost. I have downloaded lua 5.1 beta and
compiled it, but the behaviour is the same.
I don't know what is causing that :(