- Subject: Re: newbie question - strings and arrays
- From: David Given <dg@...>
- Date: Thu, 13 Oct 2005 17:56:18 +0100
On Thursday 13 October 2005 16:48, Jamie Webb wrote:
[...]
> Lua's strings are similar to Java's, except that Java uses unicode
> characters rather than bytes, and doesn't do interning until told to.
> And it has the StringBuffer class for those occasions when you want a
> string to be mutable.
Java uses 16-bit quantities for characters, because that's what Unicode was
when Java was designed. When Unicode started allocating characters above
65535 Java was, basically, completely screwed --- it was impossible to change
the definition of char to be 32 bits wide because it was so crucial to the
language. (People were using char as an unsigned 16-bit value and relying on
its properties.)
As a result, Java strings are not simple arrays of characters. Instead,
they're UTF-16-encoded strings: Unicode expressed as a stream of 16-bit
values, where any character above 65535 takes up two of them. So they've got
all the overhead of using uncompressed Unicode, and string.charAt(i) *still*
doesn't return the ith character! It's a horrific mess, and I confidently
predict that it will cause them grief in the not-so-near future.
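To make the indexing problem concrete, here is a minimal sketch. It assumes
Java 5 or later, which added the code point methods on String and Character;
the class name Utf16Demo and its helper methods are made up for illustration.

```java
class Utf16Demo {
    // U+1D11E (MUSICAL SYMBOL G CLEF) lies above 65535, so UTF-16
    // has to store it as a pair of 16-bit chars (a surrogate pair).
    static final String CLEF = new String(Character.toChars(0x1D11E));

    // Number of 16-bit code units; this is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points the string actually contains.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "a" + CLEF + "b";

        System.out.println(codeUnits(s));   // 4 code units
        System.out.println(codePoints(s));  // 3 code points

        // charAt(1) returns only the first half of the surrogate pair.
        System.out.println(Character.isHighSurrogate(s.charAt(1)));  // true

        // codePointAt(1) reassembles the pair into the real character.
        System.out.println(Integer.toHexString(s.codePointAt(1)));   // 1d11e
    }
}
```

Note that codePointAt still takes an index in 16-bit units, not in
characters, so finding the ith character means walking the string from the
start.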
Lua avoids all this by defining strings as a sequence of bytes. Complex
encodings are, therefore, entirely an application problem.
--
+- David Given --McQ-+ "For is it not written, wheresoever two or three
| dg@cowlark.com | are gathered together, yea they will perform the
| (dg@tao-group.com) | Parrot Sketch?" --- _Not The 9 o'Clock News_
+- www.cowlark.com --+