lua-l archive



David Given wrote:
On Thursday 13 October 2005 16:48, Jamie Webb wrote:
[...]

Lua's strings are similar to Java's, except that Java uses Unicode
characters rather than bytes, and doesn't do interning until told to
(via String.intern()). And it has the StringBuffer class for those
occasions when you want a string to be mutable.


Java uses 16-bit quantities for characters, because that's what Unicode was when Java was designed. When Unicode started allocating characters above 65535, Java was, basically, completely screwed: it was impossible to change the definition of char to be 32 bits wide because it was so crucial to the language. (People were using char as an unsigned 16-bit value and relying on its properties.)

As a result, Java strings are not simple arrays of characters. Instead, they're UTF-16-encoded strings: Unicode expressed as a stream of 16-bit values. So they've got all the overhead of using uncompressed Unicode, and string.charAt(i) *still* doesn't return the ith character! It's a horrific mess, and I confidently predict that it will cause them grief in the not-so-distant future.
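To make the indexing complaint concrete, here is a small sketch in Python, used only as a neutral modelling language. It views a string the way Java stores it, as a sequence of 16-bit UTF-16 code units, and shows that the unit at a given index need not be a whole character. The helper name `utf16_units` is hypothetical.

```python
def utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s, i.e. the values Java's charAt() would expose."""
    data = s.encode("utf-16-le")  # little-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

# 'a', U+1D11E MUSICAL SYMBOL G CLEF (outside the 16-bit range), 'b'
text = "a\U0001D11Eb"

units = utf16_units(text)
print(len(text))                # 3 characters (code points)
print(len(units))               # 4 UTF-16 code units
print([hex(u) for u in units])  # ['0x61', '0xd834', '0xdd1e', '0x62']
# Index 2 holds the second half of a surrogate pair, not a character,
# and the real 'b' has moved to index 3.
print(hex(units[2]))            # 0xdd1e
```

In Java terms, this string would have length() 4 but only 3 code points.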

Lua avoids all this by defining strings as a sequence of bytes. Complex encodings are, therefore, entirely an application problem.
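Because a Lua string is just bytes, `string.len` counts bytes, not characters. The sketch below models that byte-oriented view in Python. It assumes the application has chosen UTF-8 (Lua itself mandates no encoding), and the helper `utf8_char_count` is hypothetical, standing in for the decoding work that falls to the application.

```python
def utf8_char_count(raw: bytes) -> int:
    """Count characters in valid UTF-8 by skipping continuation bytes (0b10xxxxxx)."""
    return sum(1 for b in raw if (b & 0xC0) != 0x80)

raw = "naïve".encode("utf-8")  # a plain byte string, as Lua would hold it
print(len(raw))                # 6 bytes: 'ï' takes two bytes in UTF-8
print(utf8_char_count(raw))    # 5 characters, once the application decodes
# Slicing by byte offset can cut a multi-byte character in half:
print(raw[:3])                 # b'na\xc3'
```

The string type stays simple. Any code that cares about characters has to know the encoding and handle it explicitly.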


And, of course, to make matters worse, Win32 also uses 16-bit values for its wchar_t type and all its wide-character processing functions. Linux uses 32-bit values for its wide-character support, but I think it can be made to use 16-bit characters as well (GCC has a -fshort-wchar option for this).
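The practical consequence is that the same text occupies a different number of wchar_t elements depending on the platform. Here is a minimal Python sketch. It uses the standard `ctypes` module to report this machine's wchar_t size, which is typically 2 bytes on Windows and 4 on Linux with glibc. The function name `wchar_units` is hypothetical.

```python
import ctypes

def wchar_units(s: str, wchar_bytes: int) -> int:
    """Number of wchar_t elements needed to hold s (no terminator)."""
    if wchar_bytes == 2:
        # 16-bit wchar_t (e.g. Win32): UTF-16, so characters above U+FFFF take two units
        return len(s.encode("utf-16-le")) // 2
    if wchar_bytes == 4:
        # 32-bit wchar_t (e.g. glibc on Linux): one unit per code point
        return len(s.encode("utf-32-le")) // 4
    raise ValueError("unsupported wchar_t size")

text = "a\U0001D11Eb"
print(wchar_units(text, 2))           # 4 on a 16-bit wchar_t platform
print(wchar_units(text, 4))           # 3 on a 32-bit wchar_t platform
print(ctypes.sizeof(ctypes.c_wchar))  # this machine's wchar_t size: 2 or 4
```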

The good news is that characters above 65535 are rare. They are mostly rarely used Chinese characters. So most people just ignore them and still claim to be Unicode-compatible.
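For reference, UTF-16 splits each code point above 65535 (U+10000 to U+10FFFF) into a pair of 16-bit surrogates. Code that "just ignores it" is code that never expects to see these pairs. Below is a short Python sketch of that arithmetic, cross-checked against Python's own UTF-16 codec. The name `to_surrogates` is hypothetical.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point must be in U+10000..U+10FFFF")
    v = cp - 0x10000            # 20-bit offset
    high = 0xD800 + (v >> 10)   # top 10 bits
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits
    return high, low

high, low = to_surrogates(0x20000)  # first CJK Unified Ideographs Extension B character
print(hex(high), hex(low))          # 0xd840 0xdc00

# Cross-check with the standard codec (big-endian, no byte-order mark)
expected = int.from_bytes("\U00020000".encode("utf-16-be"), "big")
print(expected == ((high << 16) | low))  # True
```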

--
chris marrin
chris@marrin.com
"As a general rule, don't solve puzzles that open portals to Hell"