lua-l archive



On Thursday 13 October 2005 16:48, Jamie Webb wrote:
[...]
> Lua's strings are similar to Java's, except that Java uses unicode
> characters rather than bytes, and doesn't do interning until told to.
> And it has the StringBuffer class for those occasions when you want a
> string to be mutable.

Java uses 16-bit quantities for characters because that's how wide Unicode was 
when Java was designed. When Unicode started allocating characters above 
U+FFFF (65535), Java was, basically, completely screwed: it was impossible to 
widen char to 32 bits because the type was so central to the language. 
(People were using char as an unsigned 16-bit integer and relying on its 
properties.)

As a result, Java strings are not simple arrays of characters. Instead, 
they're UTF-16-encoded: Unicode expressed as a stream of 16-bit code units, 
where any character above U+FFFF takes two of them (a surrogate pair). So 
they carry all the overhead of uncompressed Unicode, and string.charAt(i) 
*still* doesn't return the ith character! It's a horrific mess, and I 
confidently predict that it will cause them grief in the not-too-distant 
future.
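A small illustrative sketch of the problem (not from the original thread): U+1D11E (MUSICAL SYMBOL G CLEF) lies above U+FFFF, so inside a Java String it occupies two chars. The helper names codeUnits, codePoints and nthCodePoint are hypothetical; the String methods they call (codePointCount, offsetByCodePoints, codePointAt) are standard, and were added in Java 5 to cope with exactly this situation.

```java
class Utf16Demo {
    // Hypothetical helper: String.length() counts UTF-16 code units, not characters.
    static int codeUnits(String s) {
        return s.length();
    }

    // Hypothetical helper: counts actual Unicode code points.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Hypothetical helper: returns the i-th code point (0-based), which is
    // what most people expect "the ith character" to mean.
    static int nthCodePoint(String s, int i) {
        int index = s.offsetByCodePoints(0, i); // char index where code point i starts
        return s.codePointAt(index);
    }

    public static void main(String[] args) {
        // 'a', then U+1D11E as the surrogate pair D834 DD1E, then 'b'
        String s = "a\uD834\uDD1Eb";

        System.out.println(codeUnits(s));                           // 4
        System.out.println(codePoints(s));                          // 3
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true: half a character
        System.out.println(s.charAt(2) == 'b');                     // false: index 2 is the low surrogate
        System.out.println((char) nthCodePoint(s, 2));              // b
    }
}
```

Getting at "the ith character" means walking the string with offsetByCodePoints, which is linear in i rather than a constant-time array lookup.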

Lua avoids all this by defining strings as plain sequences of bytes. Handling 
multi-byte encodings such as UTF-8 is, therefore, entirely the application's 
problem.
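A rough sketch of what "an application problem" can look like in practice, written in Java rather than Lua: counting the code points in a UTF-8 byte sequence. It assumes the input is well-formed UTF-8 and does no validation; countCodePoints is a hypothetical helper. For the same string, Lua's #s would report the byte count (6), not 3.

```java
import java.nio.charset.StandardCharsets;

class Utf8Bytes {
    // Hypothetical helper: counts code points in well-formed UTF-8 by
    // skipping continuation bytes, which always match the pattern 10xxxxxx.
    static int countCodePoints(byte[] utf8) {
        int count = 0;
        for (byte b : utf8) {
            // b is sign-extended to int, but masking with 0xC0 keeps only the top two bits
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a' (1 byte) + U+1D11E (4 bytes: F0 9D 84 9E) + 'b' (1 byte)
        byte[] bytes = "a\uD834\uDD1Eb".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);           // 6
        System.out.println(countCodePoints(bytes)); // 3
    }
}
```

The byte array is what a Lua string holds; anything that needs to know where characters begin has to interpret the encoding itself, as this loop does.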

-- 
+- David Given --McQ-+ "For is it not written, wheresoever two or three
|  dg@cowlark.com    | are gathered together, yea they will perform the
| (dg@tao-group.com) | Parrot Sketch?" --- _Not The 9 o'Clock News_
+- www.cowlark.com --+ 
