- Subject: Re: The World According to Lua: How To?
- From: PA <petite.abeille@...>
- Date: Fri, 18 Feb 2005 19:14:59 +0100
On Feb 18, 2005, at 16:16, David Given wrote:
Basically, Lua doesn't know about encodings. Lua strings are streams of
bytes, and it assumes that one character is one byte. Collation is done
using [...]. This means that you can put any kind of data in a string ---
but it's your responsibility to manipulate it correctly and do any conversion.
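
To make the "one character is one byte" point concrete, here is a small sketch. It assumes the string holds valid UTF-8, and it spells "fóö" with decimal byte escapes so the result does not depend on how the script file itself is encoded.

```lua
-- "fóö" as UTF-8 bytes: ó = 195 179, ö = 195 182.
local s = "f\195\179\195\182"

-- Lua counts bytes, so the three visible characters report a length of 5.
print(string.len(s))      -- 5

-- Byte 2 is only the first half of "ó", not a character in its own right.
print(string.byte(s, 2))  -- 195
```

Nothing here is specific to UTF-8 support: Lua is just reporting the bytes it was handed.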
Ah... this is the major catch.
For example, if you're storing UTF8 in a Lua string (which is the usual
way of doing Unicode in Lua), then you can't assume that you can read
character n by looking at byte n. *However*, string substitutions and
matching will still work in a limited way. The regular expression
will still match any string containing 'fnord', regardless of whether there
are multibyte characters in the string; likewise, the pattern ".*©.*" will
work; but "©*" won't work, because the * will bind to the last byte of the
multibyte character. The collation functions will still work on ASCII
characters but will sort multibyte characters oddly. And so on.
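
A rough illustration of the three cases above, as a sketch rather than a definitive test. The sample strings are made up for the example, the call to os.setlocale("C") is an assumption added so that the sort order is repeatable, and "x©*y" is used instead of a bare "©*" because it shows the failure more clearly.

```lua
-- Assumption: force the "C" locale so comparison is plain byte order.
os.setlocale("C")

local c = "\194\169"                  -- UTF-8 bytes of "©"
local s = "f\195\179\195\182 fnord"   -- "fóö fnord"

-- Literal search still works, but the positions are byte offsets:
-- 'fnord' is the 5th character and starts at the 7th byte.
local i, j = string.find(s, "fnord")
print(i, j)                           -- 7   11

-- ".*©.*" works: the two bytes of "©" are matched as a literal sequence.
print(string.find("a" .. c .. "b", ".*" .. c .. ".*"))      -- 1   4

-- "x©*y" does not match "x©©y": the * applies to the byte 169 alone,
-- so the second 194 byte is never consumed.
print(string.find("x" .. c .. c .. "y", "x" .. c .. "*y"))  -- nil

-- Comparison goes byte by byte here, so "été" (first byte 195)
-- sorts after "zoo" (first byte 122).
local words = { "\195\169t\195\169", "zoo", "apple" }
table.sort(words)
print(words[1], words[2])             -- apple   zoo
```

Plain substring search is safe because a valid UTF-8 sequence cannot begin in the middle of another character; it is the quantifiers, character classes and positions that need care.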
So basically, UTF-8 renders most Lua core functionality useless as soon
as one ventures beyond US-ASCII, broadly speaking?
If you're writing a web server, then your best bet is to emit UTF8, avoid
I can do that.
doing any string slicing;
I need to do that. A major feature of the app is search.
if you write your Lua scripts in UTF8, then you can
trivially include UTF8 sequences in constant strings:
local s = "fóö"
Since HTTP can be driven entirely with US-ASCII, then this probably won't
cause you any problems.
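
Where slicing cannot be avoided, one possible approach is to slice by character rather than by byte. The helper below is a hypothetical sketch (utf8_sub is a name invented for this example, not part of Lua), and it assumes the input is valid UTF-8. It relies only on the fact that UTF-8 continuation bytes fall in the range 128 to 191.

```lua
-- Hypothetical helper: substring by character index instead of byte index.
-- Assumes s is valid UTF-8; i and j are 1-based character positions.
local function utf8_sub(s, i, j)
  -- Record the byte offset at which each character starts.
  local starts, n = {}, 0
  for pos = 1, string.len(s) do
    local b = string.byte(s, pos)
    if b < 128 or b >= 192 then   -- anything but a continuation byte
      n = n + 1
      starts[n] = pos
    end
  end

  -- Clamp the requested range to the characters that exist.
  if j == nil or j > n then j = n end
  if i < 1 then i = 1 end
  if i > j then return "" end

  -- The slice ends just before the next character starts.
  local last = string.len(s)
  if j < n then last = starts[j + 1] - 1 end
  return string.sub(s, starts[i], last)
end

local s = "f\195\179\195\182"           -- "fóö"
print(utf8_sub(s, 2, 2) == "\195\179")  -- true: the whole "ó"
print(string.sub(s, 2, 2) == "\195")    -- true: a byte slice cuts "ó" in half
```

This rescans the string on every call, which is fine for short strings; a search-heavy application would probably want to cache the offsets or work with byte positions throughout.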
Thanks for the explanations :)
PA, Onnay Equitursay