Klaus Ripke wrote:
On Thu, Dec 29, 2005 at 11:38:57AM +0000, Lisa Parratt wrote:

On 29 Dec 2005, at 11:06, Klaus Ripke wrote:

http://lua-users.org/wiki/LuaUnicode

A few observations, reading this page:

Lack of "\U+1234" style Unicode character escapes - it strikes me that the code to isolate such an escape and then convert it to an 8-bit string would only take a few lines. Is there a good theological reason why this isn't supported?

It would require the parser to settle on a given encoding
like UTF-8 or UCS-2 or UTF-16 or ...
OTOH a preprocessing step either at build time or
as a load hook could do this and much more.
Personally I prefer to have my editor produce UTF-8.

But it would not be hard for Lua to support this form, in addition to the other currently supported escape sequences. Translating "\U+1234" into a UTF-8 sequence is a very small snippet of code. This does not technically lock Lua into the UTF-8 encoding. It merely translates this escape sequence to a series of characters that happen to be UTF-8. Splitting hairs?
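To give a feel for how small that snippet is, here is a rough sketch in C. It is not code from the Lua sources. The names parse_u_escape and utf8_encode are made up for illustration, and accepting 1 to 6 hex digits after "\U+" is my assumption, since the exact escape syntax was never pinned down in this thread.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: parse an escape of the form \U+XXXX at s.
 * Accepts 1 to 6 hex digits (an assumption, not a Lua rule).
 * Returns the number of characters consumed, or 0 if s is not such an escape. */
size_t parse_u_escape(const char *s, unsigned long *cp)
{
    size_t i = 3;
    unsigned long value = 0;

    if (s[0] != '\\' || s[1] != 'U' || s[2] != '+')
        return 0;
    while (i < 9 && isxdigit((unsigned char)s[i])) {
        int c = (unsigned char)s[i];
        /* Assumes 'A'..'F' are contiguous, as in ASCII. */
        value = value * 16 + (unsigned long)(isdigit(c) ? c - '0' : toupper(c) - 'A' + 10);
        i++;
    }
    if (i == 3)
        return 0;               /* no hex digits after "\U+" */
    *cp = value;
    return i;
}

/* Hypothetical helper: encode one code point as UTF-8 into out (up to 4 bytes).
 * Returns the number of bytes written, or 0 for surrogates and values
 * above U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* surrogates are not valid scalar values */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

A lexer would call parse_u_escape when it sees a backslash, then append the bytes from utf8_encode to the string being built. Nothing else in the string handling would need to know the bytes are UTF-8.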

...
Unicode string comparison and normalisation issues - I might be being forgetful, but I was under the impression C99 added Unicode compliant wide character comparison functions - perhaps these should be used if present?

You might not want to use wide chars at all
(there are pros and cons compared to using UTF-8 internally).
For UTF-8 good old strcoll/strxfrm (hence Lua) does the job,
with appropriate locale settings.
Anyway many consider the "locale" mechanism broken,
and a full implementation of the unicode collation algorithm
has to be quite expensive.

But this is an area of the spec that needs to be tightened up. Currently it says "strings are compared in the usual way". What does that mean? As it turns out, Lua uses strcoll(), which is good. That means string comparison follows the collation rules of the currently set locale. If you set a UTF-8 locale, then UTF-8 strings will be ordered by that locale's rules. Of course you would need the \U escape as well!
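For anyone who has not used it, this is roughly what strcoll()-based ordering looks like on the C side. It is only a sketch, not Lua's own comparison code. The names use_collation and sort_by_collation are invented here, and the sketch ignores embedded zero bytes, which Lua strings may contain.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: select the collation locale by name.
 * Returns 1 if the C library accepted the name, 0 otherwise. */
int use_collation(const char *name)
{
    return setlocale(LC_COLLATE, name) != NULL;
}

/* qsort comparator: order two C strings by the current LC_COLLATE rules. */
static int by_collation(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    return strcoll(a, b);
}

/* Hypothetical helper: sort an array of C strings in locale order. */
void sort_by_collation(const char **strings, size_t count)
{
    qsort(strings, count, sizeof strings[0], by_collation);
}
```

In the "C" locale this gives plain byte order, the same as strcmp(). With a UTF-8 locale selected, the same call orders the strings by that locale's collation rules, provided the platform actually ships that locale.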

I think the biggest problem here is setting the locale. AFAIK os.setlocale() just hands a name to the C library's setlocale(), and locale names differ from platform to platform. It would be nice to have a platform independent way to do that.
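One way to paper over the naming differences, sketched in C under the assumption that the caller supplies its own list of candidate names, is to try them in order and keep the first one the C library accepts. The function name set_first_locale is hypothetical.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try each candidate locale name for LC_COLLATE.
 * Returns the index of the first accepted name, or -1 if none is accepted
 * (in which case the collation locale is left unchanged). */
int set_first_locale(const char *const names[], size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (setlocale(LC_COLLATE, names[i]) != NULL)
            return (int)i;
    }
    return -1;
}
```

The list might hold spellings such as "en_US.UTF-8" and "en_US.utf8", but those are only examples. Which names exist depends entirely on the platform, and some C libraries accept names they know nothing about, so a successful return is not proof that real collation data is loaded.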

--
chris marrin                ,""$,
chris@marrin.com          b`    $                             ,,.
                        mP     b'                            , 1$'
        ,.`           ,b`    ,`                              :$$'
     ,|`             mP    ,`                                       ,mm
   ,b"              b"   ,`            ,mm      m$$    ,m         ,`P$$
  m$`             ,b`  .` ,mm        ,'|$P   ,|"1$`  ,b$P       ,`  :$1
 b$`             ,$: :,`` |$$      ,`   $$` ,|` ,$$,,`"$$     .`    :$|
b$|            _m$`,:`    :$1   ,`     ,$Pm|`    `    :$$,..;"'     |$:
P$b,      _;b$$b$1"       |$$ ,`      ,$$"             ``'          $$
 ```"```'"    `"`         `""`        ""`                          ,P`
"As a general rule,don't solve puzzles that open portals to Hell"'