- Subject: Re: The World According to Lua: How To?
- From: PA <petite.abeille@...>
- Date: Fri, 18 Feb 2005 19:14:59 +0100
On Feb 18, 2005, at 16:16, David Given wrote:
Basically, Lua doesn't know about encodings. Lua strings are streams of
bytes, and it assumes that one character is one byte. Collation is done
using the byte value.
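A rough illustration of the byte-versus-character point, as a sketch rather than anything from the original message. `utf8_len` is a hypothetical helper, not a Lua library function, and the comparison result assumes the default C locale, since Lua compares strings with `strcoll`:

```lua
-- "fóö" encoded as UTF-8, written with decimal byte escapes so the
-- example does not depend on the encoding of this file.
local s = "f\195\179\195\182"

-- string.len counts bytes, not characters.
local byte_len = string.len(s)   -- 5

-- Hypothetical helper: count characters by counting every byte that is
-- not a UTF-8 continuation byte (128 to 191, i.e. 0x80 to 0xBF).
local function utf8_len(str)
  local n = 0
  for i = 1, string.len(str) do
    local b = string.byte(str, i)
    if b < 128 or b > 191 then
      n = n + 1
    end
  end
  return n
end

local char_len = utf8_len(s)     -- 3

-- Comparison goes byte by byte (in the C locale): "z" is byte 122 and
-- "é" starts with byte 195, so "z" sorts before "é".
local z_before_e_acute = "z" < "\195\169"   -- true
```

The helper assumes the input is already valid UTF-8; it does no validation.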
Ok.
This means that you can put any kind of data in a string --- but it's
your responsibility to manipulate it correctly and do any conversion.
Ah... this is the major catch.
For example, if you're storing UTF8 in a Lua string (which is the
recommended way of doing Unicode in Lua), then you can't assume that you
can read character n by looking at byte n. *However*, string
substitutions and pattern matching will still work in a limited way. The
regular expression ".*fnord.*" will still match any string containing
'fnord', regardless of whether there are multibyte characters in the
string; likewise, the pattern ".*©.*" will work; but "©*" won't work,
because the * will bind to the last byte of the multibyte character. The
collation functions will still work on single-byte characters but will
sort multibyte characters oddly. And so on.
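The pattern behaviour described above can be checked with a short sketch. It is only an illustration: `match_run` is a hypothetical helper showing one possible way around the "©*" limitation, and it assumes the input is valid UTF-8.

```lua
-- "©" in UTF-8 is the two bytes 0xC2 0xA9, written as decimal escapes.
local copy = "\194\169"
local s = "a" .. copy .. copy .. copy .. "b"   -- "a©©©b", 8 bytes

-- Substring-style patterns work: the bytes of "©" simply match themselves.
local whole_s, whole_e = string.find(s, ".*" .. copy .. ".*")   -- 1, 8

-- "©*" does not mean "zero or more ©": the * applies only to the final
-- byte (\169), so the match stops after the first "©".
local star_s, star_e = string.find(s, copy .. "*")   -- 2, 3

-- Hypothetical helper: match a run of one multibyte character starting
-- at byte position init by comparing whole byte sequences.
local function match_run(str, ch, init)
  local len = string.len(ch)
  local pos = init
  while string.sub(str, pos, pos + len - 1) == ch do
    pos = pos + len
  end
  if pos == init then
    return nil
  end
  return init, pos - 1
end

local run_s, run_e = match_run(s, copy, 2)   -- 2, 7
```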
So basically, UTF-8 renders most Lua core functionality useless as soon
as one ventures beyond US-ASCII, broadly speaking?
If you're writing a web server, then your best bet is to emit UTF8,
I can do that.
and avoid doing any string slicing;
I need to do that. A major feature of the app is search.
if you write your Lua scripts in UTF8, then you can
trivially include UTF8 sequences in constant strings:
local s = "fóö"
Since HTTP can be driven entirely with US-ASCII, then this probably
won't cause you any problems.
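On the search requirement mentioned earlier in this message: plain substring search is one operation that does remain safe on UTF-8 data, because a valid UTF-8 sequence cannot match starting in the middle of another character. A minimal sketch, assuming both strings are valid UTF-8 (`find_plain` is a hypothetical wrapper name; the fourth argument to `string.find` turns off pattern matching):

```lua
-- Hypothetical wrapper: plain (non-pattern) substring search.
local function find_plain(haystack, needle, init)
  return string.find(haystack, needle, init or 1, true)
end

-- "café © 2005" in UTF-8, written with decimal byte escapes.
local text = "caf\195\169 \194\169 2005"

-- "©" is \194\169. The byte \169 also occurs inside "é" (\195\169), but
-- the search only matches the full two-byte sequence.
local found_s, found_e = find_plain(text, "\194\169")   -- 7, 8

-- The returned positions are byte offsets, not character indices.
local hit = string.sub(text, found_s, found_e)
```

This is an exact byte match only; case-insensitive or accent-insensitive search would need real Unicode handling outside the core string library.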
Thanks for the explanations :)
Cheers
--
PA, Onnay Equitursay
http://alt.textdrive.com/