- Subject: Re: Will Lua kernel use Unicode in the future?
- From: Chris Marrin <chris@...>
- Date: Thu, 29 Dec 2005 08:46:31 -0800
Klaus Ripke wrote:
On Thu, Dec 29, 2005 at 11:38:57AM +0000, Lisa Parratt wrote:
On 29 Dec 2005, at 11:06, Klaus Ripke wrote:
http://lua-users.org/wiki/LuaUnicode
A few observations, reading this page:
Lack of "\U+1234"-style Unicode character escapes: it strikes me
that the code to isolate such an escape, and then convert it to an
8-bit string, would only take a few lines of code. Is there a good
theological reason why this isn't supported?
It would require the parser to settle on a given encoding,
like UTF-8 or UCS-2 or UTF-16 or ...
OTOH, a preprocessing step, either at build time or
as a load hook, could do this and much more.
Personally I prefer to have my editor produce UTF-8.
But it would not be hard for Lua to support this form, in addition to
the other currently supported escape sequences. Translating "U+1234"
into a UTF-8 sequence is a very small snippet of code. This does not
technically lock Lua into the UTF-8 encoding. It merely translates this
escape sequence into a series of bytes that happen to be UTF-8.
Splitting hairs?
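To give a feel for how small that snippet is, here is a sketch in C (the language Lua's lexer is written in) that encodes a code point as UTF-8 and decodes one escape. The "\U+XXXX" syntax with exactly four hex digits, and the names utf8_encode and translate_u_escape, are assumptions for illustration only; Lua's lexer has no such helpers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Encode code point cp as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp is not a valid
   Unicode scalar value (a UTF-16 surrogate or above U+10FFFF). */
static size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    } else if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0; /* surrogates are not characters */
    } else if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    } else if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical lexer helper: if s starts with "\U+" followed by four
   hex digits, write the UTF-8 bytes to out, store their count in *nout
   and return the number of source characters consumed (7).
   Returns 0 if s does not start with a well-formed escape. */
static size_t translate_u_escape(const char *s, unsigned char *out, size_t *nout)
{
    unsigned long cp = 0;
    int i;

    assert(s != NULL && out != NULL && nout != NULL);
    if (s[0] != '\\' || s[1] != 'U' || s[2] != '+')
        return 0;
    for (i = 3; i < 7; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!isxdigit(c))
            return 0; /* also stops at the terminating '\0' */
        cp = cp * 16 + (unsigned long)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }
    *nout = utf8_encode(cp, out);
    return *nout ? 7 : 0;
}
```

For example, translating "\U+1234" yields the three bytes E1 88 B4. Rejecting surrogates keeps the output valid UTF-8; a real lexer would also have to decide how to report a malformed escape.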
...
Unicode string comparison and normalisation issues: I might be being
forgetful, but I was under the impression that C99 added
Unicode-compliant wide-character comparison functions. Perhaps these
should be used if present?
You might not want to use wide chars at all
(there are pros and cons compared to using UTF-8 internally).
For UTF-8, good old strcoll/strxfrm (and hence Lua) does the job,
given appropriate locale settings.
Anyway, many consider the "locale" mechanism broken,
and a full implementation of the Unicode Collation Algorithm
is bound to be quite expensive.
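A minimal sketch of the strcoll/strxfrm approach in standard C: strxfrm turns a string into a key whose plain strcmp order matches strcoll under the current LC_COLLATE setting, which pays off when the same strings are compared many times (e.g. when sorting). The helper names are hypothetical, and which UTF-8 locale names exist is platform-specific.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Switch LC_COLLATE to the native environment's setting (on POSIX
   systems, taken from LC_ALL, LC_COLLATE or LANG).
   Returns 1 on success, 0 if the C library rejects it. */
static int use_environment_collation(void)
{
    return setlocale(LC_COLLATE, "") != NULL;
}

/* Map v to -1, 0 or 1 so results from different functions can be compared. */
static int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/* Build a collation key for s under the current LC_COLLATE locale.
   Plain strcmp on two keys orders them as strcoll orders the originals.
   The caller must free the result; returns NULL if malloc fails. */
static char *collation_key(const char *s)
{
    size_t n = strxfrm(NULL, s, 0); /* key length, excluding the '\0' */
    char *key = malloc(n + 1);
    if (key != NULL)
        strxfrm(key, s, n + 1);
    return key;
}

/* Compare a and b with strcoll and again via their strxfrm keys,
   check that the two answers agree, and return -1, 0 or 1. */
static int compare_both_ways(const char *a, const char *b)
{
    char *ka = collation_key(a);
    char *kb = collation_key(b);
    int direct, via_keys;

    assert(ka != NULL && kb != NULL);
    direct = sign_of(strcoll(a, b));
    via_keys = sign_of(strcmp(ka, kb));
    free(ka);
    free(kb);
    assert(direct == via_keys);
    return direct;
}
```

In the "C" locale this ordering is plain byte order; once a UTF-8 locale is selected (its name, such as "en_US.UTF-8", varies between platforms), the same calls give that locale's ordering of UTF-8 text.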
But this is an area of the spec that needs to be tightened up. Currently
it says "strings are compared in the usual way". What does that mean? As
it turns out, Lua uses strcoll(), which is good. That means string
comparison follows the collation rules of the current locale (its
LC_COLLATE category). If you select a UTF-8 locale, comparisons will
follow that locale's ordering of UTF-8 text. Of course you would need
the \U escape as well!
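For the record, here is roughly what "the usual way" amounts to. Lua strings may contain embedded zero bytes, which strcoll() cannot see past, so the comparison routine in Lua's lvm.c calls strcoll() once per '\0'-separated segment. The C sketch below is modelled on that routine but is not a copy of it: Lua's internal string type is replaced by a hypothetical (pointer, length) pair, and, as with Lua's own strings, each buffer is assumed to have a '\0' at index len.

```c
#include <assert.h>
#include <string.h>

/* Compare byte strings l (length ll) and r (length lr) using strcoll,
   segment by segment. Each buffer must have a '\0' at index len and
   may contain further '\0' bytes before that. Returns <0, 0 or >0. */
static int collate_lstr(const char *l, size_t ll, const char *r, size_t lr)
{
    assert(l[ll] == '\0' && r[lr] == '\0');
    for (;;) {
        int temp = strcoll(l, r);
        if (temp != 0)
            return temp;
        else {
            /* strcoll found the segments equal; like Lua, assume they
               are identical, so the first '\0' is at the same index */
            size_t len = strlen(l);
            if (len == lr)        /* r is exhausted */
                return (len == ll) ? 0 : 1;
            else if (len == ll)   /* l is exhausted but r is not */
                return -1;
            /* both continue past this '\0': skip it, compare the rest */
            len++;
            l += len;
            ll -= len;
            r += len;
            lr -= len;
        }
    }
}
```

So "a\0b" sorts after "a", and two strings that differ only after an embedded zero still compare as different; everything else is delegated to the locale's collation.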
I think the biggest problem here is that setting the locale can't be
done from Lua, AFAIK. It would be nice to have a platform-independent
way to do that.
--
chris marrin
chris@marrin.com
"As a general rule, don't solve puzzles that open portals to Hell"