lua-l archive

Hi,

Chris Marrin wrote:
> I see that you can say "en-us.utf-8", but does it 
> REQUIRE a language code? And is this cross-platform?

Yes, you need a language code. But it only matters for things like
the currency symbol or the collation order. IMHO it's best to set
only the "ctype" category (the LC_CTYPE environment variable) to
avoid other NLS pitfalls (e.g. the decimal dot vs. comma problem
when printing or parsing numbers).
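
A minimal sketch of the "ctype only" approach. It is written in Python rather than Lua or C because Python's locale module is a thin wrapper around the C library's setlocale(). The helper names are made up for illustration. Only LC_CTYPE is taken from the environment. LC_NUMERIC stays at the "C" locale, so the decimal separator remains a dot. On glibc systems, locale names usually look like en_US.UTF-8. If the environment names a locale that is not installed, setlocale() fails, which is why the sketch falls back to "C".

```python
import locale


def use_ctype_from_environment():
    """Hypothetical helper: adopt the user's LC_CTYPE setting only.

    Passing "" asks the C library to read LC_ALL, LC_CTYPE or LANG from
    the environment. All other categories (LC_NUMERIC, LC_MONETARY,
    LC_COLLATE, ...) are left untouched.
    """
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The locale named in the environment may not be installed.
        locale.setlocale(locale.LC_CTYPE, "C")
    # Calling setlocale() without a value only queries the current setting.
    return locale.setlocale(locale.LC_CTYPE)


def format_number(x):
    """Hypothetical helper: format a float using the LC_NUMERIC rules."""
    # LC_NUMERIC was never changed, so the decimal point is still ".".
    return locale.str(x)


if __name__ == "__main__":
    print("LC_CTYPE:", use_ctype_from_environment())
    print("decimal point:", locale.localeconv()["decimal_point"])
    print("3.5 ->", format_number(3.5))
```

By contrast, calling setlocale(LC_ALL, "") in a German locale would also switch LC_NUMERIC and turn the dot into a comma.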

The Unicode FAQ for Unix/Linux explains this and many more things
that have been discussed in this thread:

  http://www.cl.cam.ac.uk/~mgk25/unicode.html

And here are some opinions on the UTF-8 vs. UTF-16/UTF-32/wchar_t
debate:

  http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF
  http://www.tbray.org/ongoing/When/200x/2003/04/30/JavaStrings

My personal opinion: There is no point in using anything other than
UTF-8 (in memory and on disk). All the other encodings create more
problems than they solve. 'Characters' is a concept of the past.
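
The following Python sketch illustrates why the fixed-width alternatives gain little. It counts the code units needed for the same text in UTF-8, UTF-16 and UTF-32, and the function name is hypothetical. UTF-16 still needs two units (a surrogate pair) for characters outside the Basic Multilingual Plane. Even UTF-32 counts a base letter plus a combining accent as two units, although a reader sees a single character.

```python
def code_units(text):
    """Return how many code units `text` needs in each encoding.

    The -le variants are used so that no byte order mark is counted.
    """
    return {
        "utf-8": len(text.encode("utf-8")),            # 1 unit = 1 byte
        "utf-16": len(text.encode("utf-16-le")) // 2,  # 1 unit = 2 bytes
        "utf-32": len(text.encode("utf-32-le")) // 4,  # 1 unit = 4 bytes
    }


samples = {
    "ASCII": "abc",
    "precomposed e-acute (U+00E9)": "\u00e9",
    "e + combining acute (U+0301)": "e\u0301",
    "emoji outside the BMP (U+1F600)": "\U0001F600",
}

for name, text in samples.items():
    print(f"{name:34} {code_units(text)}")
```

Note also that ASCII text is byte-for-byte identical in UTF-8. This is one reason why existing byte-oriented code keeps working.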

Most apps can just treat strings as opaque byte streams. Lua is
very well suited for this, since its strings are 8-bit clean. It
can be augmented with Klaus' library for more demanding tasks:
http://luaforge.net/projects/sln/
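
Here is a rough Python sketch of the "opaque bytes" approach. It assumes the input is valid UTF-8, and the function names are hypothetical. Every byte of a multi-byte UTF-8 sequence is >= 0x80, so an ASCII delimiter can never match inside a character. Splitting on such a delimiter therefore needs no decoding. Likewise, counting code points only requires skipping the continuation bytes, which have the form 10xxxxxx. Lua's own string functions also operate on bytes, so the same reasoning carries over.

```python
def split_fields(data, sep=b","):
    """Split UTF-8 encoded bytes on a single ASCII separator byte."""
    if len(sep) != 1 or sep[0] >= 0x80:
        raise ValueError("separator must be a single ASCII byte")
    # Safe without decoding: multi-byte UTF-8 sequences only use bytes >= 0x80.
    return data.split(sep)


def count_code_points(data):
    """Count code points in valid UTF-8 by skipping continuation bytes."""
    # Continuation bytes have the bit pattern 10xxxxxx.
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


line = "Grüße,naïve,日本語".encode("utf-8")
fields = split_fields(line)
print(fields)                                # still raw bytes
print([f.decode("utf-8") for f in fields])   # decoded only for display
print(count_code_points(line), len(line))    # code points vs. bytes
```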

Only very few apps (e.g. word processors, GUI libraries) need to
deal with combining characters, glyphs, writing directions and so
on. Not surprisingly, none of these apps rely on the awful libc
NLS support. So let's just forget about wchar_t.
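
This Python sketch shows one practical consequence of combining characters, using the standard unicodedata module. The helper name is hypothetical. The same visible text can be spelled either with a precomposed character or with a base letter followed by a combining mark. Those spellings produce different code points and different bytes until both are normalized, for example to NFC. The few apps that really care about such details have to handle them themselves.

```python
import unicodedata


def canonically_equal(a, b):
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"   # 'é' as a single code point, U+00E9
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)                    # False: different code points
print(precomposed.encode() == decomposed.encode())  # False: different bytes
print(canonically_equal(precomposed, decomposed))   # True
print(unicodedata.combining("\u0301"))              # non-zero: a combining mark
```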

Bye,
     Mike