- Subject: Re: The World According to Lua: How To?
- From: David Given <dg@...>
- Date: Fri, 18 Feb 2005 15:16:53 +0000
On Friday 18 February 2005 14:35, PA wrote:
> On Feb 18, 2005, at 12:36, Glenn Maynard wrote:
> > A tip: using a real name on technical lists will tend to get you
> > a better response.
> Right... this is email... there is no such a thing as a "read name" :)
I beg to differ; everyone has a real name, and it's usually considered good
etiquette to use it on a technical list (as opposed to a social list)...
not an important point, however.
> Ok. So Lua's encoding reflects the OS encoding?
Basically, Lua doesn't know about encodings. Lua strings are streams of bytes,
and it assumes that one character is one byte. Collation is done using the
C library's strcoll(), so ordering follows the current locale.
This means that you can put any kind of data in a string --- but it's your
responsibility to manipulate it correctly and do any conversion.
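
A minimal sketch of what "one character is one byte" means in practice,
assuming Lua 5.1 or later (for the # operator). The string is "fóö" written
out as explicit UTF-8 byte escapes, which is just an illustrative choice so
that the result doesn't depend on how the file is saved:

```lua
-- "fóö" as explicit UTF-8 bytes: ó = C3 B3 (195 179), ö = C3 B6 (195 182).
local s = "f\195\179\195\182"

print(#s)                 -- 5: the length is counted in bytes, not characters
print(string.byte(s, 2))  -- 195: the lead byte of ó, not a whole character
local cut = s:sub(1, 2)   -- "f" plus a dangling lead byte; ó has been cut in half
print(#cut)               -- 2
```

Nothing here is an error as far as Lua is concerned; it just hands back the
bytes you asked for.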
For example, if you're storing UTF8 in a Lua string (which is the recommended
way of doing Unicode in Lua), then you can't assume that you can read
character n by looking at byte n. *However*, string substitutions and pattern
matching will still work in a limited way. The Lua pattern ".*fnord.*"
will still match any string containing 'fnord', regardless of whether there
are multibyte characters in the string; likewise, the pattern ".*©.*" will
work; but "©*" won't work, because the * will bind to the last byte of the
multibyte character. The collation functions will still work on single-byte
characters but will sort multibyte characters oddly. And so on.
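
The pattern behaviour described above can be tried out with something like
the following sketch (again assuming Lua 5.1 or later, with © spelled as its
UTF-8 bytes C2 A9). The last two lines are one possible workaround, not
anything built into Lua: they count occurrences and characters with gsub
instead of relying on "*".

```lua
local copy = "\194\169"   -- © (U+00A9) as UTF-8: bytes C2 A9
local s = "x " .. copy .. copy .. copy .. " fnord"

-- Fixed byte sequences match fine, multibyte or not.
print(s:find(".*fnord.*") ~= nil)           -- true
print(s:find(".*" .. copy .. ".*") ~= nil)  -- true

-- "©*" is really "\194\169*": one \194 byte, then zero or more \169 bytes.
-- So it stops after the first ©, although three of them are adjacent.
local first = s:match(copy .. "*")
print(#first)                               -- 2 (one ©, not three)

-- Counting occurrences of the whole sequence works, since gsub treats it
-- as two literal bytes.
local _, copies = s:gsub(copy, "")
print(copies)                               -- 3

-- Counting characters: every byte that is not a UTF-8 continuation byte
-- (80..BF) starts a new character.
local _, chars = s:gsub("[^\128-\191]", "")
print(chars, #s)                            -- 11 characters, 14 bytes
```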
If you use fixed-length encodings such as UCS2 or UCS4 then of course the
pattern matching functions become useless to you.
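
To see why, here is a small sketch assuming Lua 5.1 or later. The helper
to_ucs2be is hypothetical and only handles ASCII input; it pads each byte
with a leading zero byte, which is what big-endian UCS2 looks like for that
range:

```lua
-- Hypothetical helper: encode an ASCII-only string as big-endian UCS2.
local function to_ucs2be(ascii)
  return (ascii:gsub(".", function(c) return "\0" .. c end))
end

local wide = to_ucs2be("fnord")
print(#wide)                        -- 10 bytes for 5 characters

-- A plain byte search for "fnord" fails: the zero bytes are in the way.
print(wide:find("fnord", 1, true))  -- nil

-- "." still means one byte, so ".." covers one character, not two.
local one = wide:match("^..")
print(one == "\0f")                 -- true
```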
Anything from the ISO8859 family is trivial, of course.
If you're writing a web server, then your best bet is to emit UTF8, and avoid
doing any string slicing; if you write your Lua scripts in UTF8, then you can
trivially include UTF8 sequences in constant strings:
local s = "fóö"
Since HTTP can be driven entirely with US-ASCII, this probably won't
cause you any problems.
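
As a rough sketch of that approach (Lua 5.1 or later assumed): the function
name build_response and the page content are made up for illustration, and
a real server would need far more than this. The headers are plain
US-ASCII, the body is passed through untouched as UTF8, and the length is
taken in bytes, which is exactly what # gives you.

```lua
-- Hypothetical helper: wrap a UTF-8 body in a minimal HTTP response.
local function build_response(body)
  local headers = {
    "HTTP/1.1 200 OK",
    "Content-Type: text/html; charset=utf-8",
    "Content-Length: " .. #body,   -- # counts bytes, as HTTP requires
  }
  return table.concat(headers, "\r\n") .. "\r\n\r\n" .. body
end

-- "fóö" inside a paragraph, written as explicit UTF-8 byte escapes.
local page = "<p>f\195\179\195\182</p>"
local response = build_response(page)
print(response)
```

Note that the body is never sliced or indexed, only measured and
concatenated, so the multibyte characters can't be damaged on the way out.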
+- David Given --McQ-+ "There is // One art // No more // No less // To
| firstname.lastname@example.org | do // All things // With art // Lessness." --- Piet
| (email@example.com) | Hein
+- www.cowlark.com --+