On Feb 18, 2005, at 11:30, Klaus Ripke wrote:

It means that it can pass around data in any charset you like.
No less and no more (than "passing around").

Hmmm... right. Not that helpful, is it?

Lua uses the locale sensitive single-byte C API.
You can find the details in the C89/C90 standard documents.
(e.g. http://danpop.home.cern.ch/danpop/ansi.c)

Ok.

There is no builtin way to usefully deal with multibyte charsets,
however, extensions like my UTF-8 stuff can use the builtin string type
without any trouble.

In other words, Lua itself is not Unicode safe one way or another?

There also is absolutely no recoding support builtin,
you will get just whatever character coding came in from your files
or across the wire. Yet, a recoding extension is scheduled,
and wrapping it around standard files and sockets to make them
appear magically recoded is no big deal.

This is futureware, right? What about today?

For a truly i18n app it's probably easiest to always use UTF-8
internally.

This is what I would like to do, yes. How do I achieve that today with the stock Lua distribution?

How do I convert things back and forth to it?
wait for the encoding extension

Hmmm... where is that fabled "extension"? Got a link?
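For the one conversion that comes up later in this thread, Latin-1 to UTF-8 and back, no extension is strictly needed. Here is a minimal sketch in plain Lua (assuming Lua 5.1 or later). The function names are made up for illustration, the code assumes the input really is in the stated encoding, and it makes no attempt to cover any other charset.

```lua
-- Hypothetical helpers, not part of any Lua distribution.

-- ISO-8859-1 to UTF-8: every Latin-1 byte equals its Unicode code point,
-- so each byte >= 128 becomes a two-byte UTF-8 sequence.
local function latin1_to_utf8(s)
  return (string.gsub(s, "[\128-\255]", function(c)
    local b = string.byte(c)
    return string.char(192 + math.floor(b / 64), 128 + b % 64)
  end))
end

-- UTF-8 to ISO-8859-1: only the two-byte sequences that encode
-- U+0080..U+00FF are folded back into one byte. Anything beyond
-- U+00FF has no Latin-1 equivalent and is left untouched.
local function utf8_to_latin1(s)
  return (string.gsub(s, "([\194-\195])([\128-\191])", function(lead, trail)
    return string.char((string.byte(lead) - 192) * 64 + string.byte(trail) - 128)
  end))
end
```

The extra parentheses around string.gsub drop its second return value (the substitution count), so each function returns just the converted string.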

When I do aFile:read( "*all" ), what do I get back as far as character
set goes?
whatever is in there

Not very helpful, is it?
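Since the read returns the raw bytes and nothing else, the best an application can do is inspect them. Here is a rough sketch (assuming Lua 5.1 or later) that reads a file in binary mode and checks whether the bytes are structurally plausible UTF-8. Both function names are hypothetical. The check is deliberately loose: it looks only at lead and continuation bytes and does not reject overlong three- or four-byte forms, surrogates, or values above U+10FFFF.

```lua
-- Hypothetical helper: read a whole file as raw bytes ("rb" avoids
-- newline translation on platforms that do it).
local function read_all(path)
  local f = assert(io.open(path, "rb"))
  local data = f:read("*a")
  f:close()
  return data
end

-- Hypothetical helper: true if s consists only of well-formed
-- lead byte + continuation byte groups (a structural check only).
local function is_utf8(s)
  local i, n = 1, #s
  while i <= n do
    local b = string.byte(s, i)
    local len
    if b < 128 then len = 1                      -- ASCII
    elseif b >= 194 and b <= 223 then len = 2    -- two-byte lead
    elseif b >= 224 and b <= 239 then len = 3    -- three-byte lead
    elseif b >= 240 and b <= 244 then len = 4    -- four-byte lead
    else return false end                        -- stray or invalid byte
    for j = i + 1, i + len - 1 do
      local c = string.byte(s, j)                -- nil past the end
      if not c or c < 128 or c > 191 then return false end
    end
    i = i + len
  end
  return true
end
```

A file that fails the check is certainly not UTF-8. One that passes is only probably UTF-8, since pure ASCII passes too.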

My OS does have a default character set encoding, but how do I
know about it?
It is ISO-8859-1 (Latin-1).

Always? ISO-8859-1? This is useless for three fourths of the world.

Bytes don't have no encoding.

What about sequences of bytes?

but... how precisely does setlocale relate to character
set encoding, if at all?
ctypes and collation

Ok. How? I would like my application to always deal with UTF-8 internally. And convert everything and anything coming its way to it. How?
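To make "ctypes and collation" concrete, here is a small sketch using only os.setlocale from the standard library (assuming Lua 5.1 or later). It pins everything to the "C" locale, which is the only locale name guaranteed to exist, and shows the two things the locale touches: character classification and case mapping, and string comparison. Which other locale names are available is platform dependent, so none are used here.

```lua
-- Passing nil as the locale only queries the current setting.
print(os.setlocale(nil, "ctype"), os.setlocale(nil, "collate"))

-- Switch every category to "C"; returns nil if the request fails.
assert(os.setlocale("C", "all"))

-- ctype: under "C" only ASCII letters are case-mapped, so the two
-- UTF-8 bytes of "é" (195, 169) pass through string.upper unchanged.
local upper = string.upper("caf\195\169")

-- collation: Lua compares strings with strcoll, which under "C"
-- orders by byte value, so uppercase ASCII sorts before lowercase.
local sorted = { "b", "a", "B" }
table.sort(sorted)
```

With a single-byte locale such as a Latin-1 one, string.upper would start touching bytes above 127, which is exactly what corrupts UTF-8 data. Staying in "C" keeps the string library byte-transparent.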

How do I tell Lua that everything I want to
deal with is UTF-8 encoded and that is it?!?!
you don't - Lua doesn't care. Use the extension.

You mean your recently mentioned UTF-8 library? This is the extension you are talking about? Is Lua itself ever going to support Unicode directly one way or another?

Then there is the issue of setlocale scope... does it impact the entire
VM?
yep

Bummer :/


How do I handle several locales concurrently?
you don't
well, you may switch back and forth,
but don't try to do this in a multithreaded app.

C API locale support is just braindead.
It was meant to enable NLS in existing applications which had been
written without being aware of these issues.
Since NLS is not that simple, it didn't work out.

Ok. Any alternatives? How do people deal with locales then? Just pretend they are not there?
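The "switch back and forth" option can at least be wrapped so that the previous setting is always restored. Here is a sketch under the assumptions already stated in the thread: the locale is global to the process, so this is only safe when nothing else runs concurrently. The name with_locale is made up for illustration, and only the first return value of the wrapped function is passed through.

```lua
-- Hypothetical helper: run fn(...) with the given locale category
-- temporarily changed, then restore whatever was set before.
local function with_locale(locale, category, fn, ...)
  local previous = os.setlocale(nil, category)   -- query only
  if not os.setlocale(locale, category) then
    return nil, "locale not available: " .. locale
  end
  local ok, result = pcall(fn, ...)              -- restore even on error
  os.setlocale(previous, category)
  if not ok then error(result, 0) end
  return result
end

-- Example: format a number with the "C" numeric rules regardless of
-- what the process-wide numeric locale currently is.
local text = with_locale("C", "numeric", tostring, 1.5)
```

This does nothing for the multithreaded case. It only keeps a single-threaded program from leaking a locale change.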


For instance, let's
assume that my application displays its data according to HTTP's
Accept-Language header. One request is in de_DE, the next one in fr_FR
and so on, while the application default language is en_US. How does
all this fit together?
It doesn't - you just don't care.

Hmmm... what if I do care?
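One way to care without touching setlocale at all is to treat the language as per-request data. Here is a rough sketch (assuming Lua 5.1 or later) that picks a language from an Accept-Language value. The function name, the shape of the supported table, and the idea of looking messages up in per-language tables afterwards are all assumptions for illustration. The parsing is simplified: it matches on the primary subtag only and ignores the "*" wildcard.

```lua
-- Hypothetical helper: choose the supported language with the highest
-- q-value from an Accept-Language header, else fall back to default.
-- supported is a set keyed by primary subtag, e.g. { en = true }.
local function pick_language(header, supported, default)
  local best, best_q = default, 0
  for range in string.gmatch(header or "", "[^,]+") do
    -- Language tag such as "de" or "fr-FR" at the start of the range.
    local tag = string.match(range, "^%s*([%a%*][%w%-]*)")
    -- Optional ";q=0.8" weight; a missing weight counts as 1.
    local q = tonumber(string.match(range, ";%s*q%s*=%s*([%d%.]+)")) or 1
    if tag then
      local primary = string.lower(string.match(tag, "^[^%-]+"))
      -- q = 0 means "not acceptable", so it never beats best_q = 0.
      if supported[primary] and q > best_q then
        best, best_q = primary, q
      end
    end
  end
  return best
end

local supported = { en = true, de = true, fr = true }
local lang = pick_language("fr-FR,fr;q=0.9,en;q=0.5", supported, "en")
```

The result is just a string, so each request can carry its own language while the process locale stays at "C" and all text stays UTF-8.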

First, all of these are using Latin-1 anyways.

Latin-1 doesn't work for my potential Japanese users.

Pick one of "ja" and "oui" and "yerpo".
Second, set the document's content type to "text/html; charset=ISO-8859-1"
in your webserver config and better also in the document's header.

My webserver is my application. There is no Deus ex Machina.

Use other charsets, including UTF-8, accordingly.
If you happen to have Russian text in KOI on your server,
you ought to know that.

Sorry, you totally lost me here.

http://www.i18nguy.com/

I'm not interested in i18n in general. Only specifics on how to deal with it in Lua.

In any case, what seems to emerge from all this mess is that Lua is simply not ready for prime time as far as i18n goes. Is that a fair assessment or did I miss something obvious as usual?

Thanks.

Cheers

--
PA, Onnay Equitursay
http://alt.textdrive.com/