- Subject: Re: newbie question - strings and arrays
- From: Chris Marrin <chris@...>
- Date: Thu, 13 Oct 2005 11:31:24 -0700
David Given wrote:
On Thursday 13 October 2005 16:48, Jamie Webb wrote:
Lua's strings are similar to Java's, except that Java uses Unicode
characters rather than bytes, and doesn't do interning until told to.
And it has the StringBuffer class for those occasions when you want a
string to be mutable.
Java uses 16-bit quantities for characters, because that's what Unicode was
when Java was designed. When Unicode started allocating characters above
65535, Java was, basically, completely screwed: it was impossible to change
the definition of char to be 32 bits wide because it was so crucial to the
language. (People were using char as an unsigned 16-bit value and relying on
As a result, Java strings are not simple arrays of characters. Instead,
they're UTF-16-encoded strings; Unicode expressed as a stream of 16-bit
values. So they've got all the overhead of using uncompressed Unicode, and
string[i] *still* doesn't return the ith character! It's a horrific mess, and
I confidently predict that it will cause them grief in the not-so-near future.
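To make the indexing problem concrete, here is a minimal sketch. The class name Utf16Demo and its helper methods are made up for illustration; the String and Character methods they call are the standard ones added in Java 5. It shows that, for a string containing one character above 65535, the length and charAt() count 16-bit code units rather than characters.

```java
public class Utf16Demo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (surrogate pairs count once).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Index (in code units) of the n-th code point, counting from 0.
    static int indexOfCodePoint(String s, int n) {
        return s.offsetByCodePoints(0, n);
    }

    public static void main(String[] args) {
        // 'a', then U+1D11E (MUSICAL SYMBOL G CLEF) as a surrogate pair, then 'b'.
        String s = "a\uD834\uDD1Eb";

        System.out.println(codeUnits(s));   // 4
        System.out.println(codePoints(s));  // 3

        // charAt(1) is only half of the second character.
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true

        // codePointAt(1) reassembles the pair into the full code point.
        System.out.println(Integer.toHexString(s.codePointAt(1)));  // 1d11e

        // The third character ('b') lives at code-unit index 3, not 2.
        System.out.println(s.charAt(indexOfCodePoint(s, 2)));       // b
    }
}
```

Finding the ith character this way means walking the string from the start, so it is no longer a constant-time array lookup.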
Lua avoids all this by defining strings as a sequence of bytes. Complex
encodings are, therefore, entirely an application problem.
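As a rough illustration of the byte-sequence model, here is the same idea sketched in Java rather than Lua, using a byte array as a stand-in for a Lua string. The class ByteStringDemo and its methods are hypothetical, and the code-point counter assumes the bytes are valid UTF-8. The "string" only knows its length in bytes; anything to do with characters is code the application has to supply.

```java
import java.nio.charset.StandardCharsets;

public class ByteStringDemo {
    // Encode text as UTF-8; the result is just a sequence of bytes.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Application-level decoding: count code points in valid UTF-8
    // by skipping continuation bytes, which have the form 10xxxxxx.
    static int countCodePoints(byte[] utf8) {
        int n = 0;
        for (byte b : utf8) {
            if ((b & 0xC0) != 0x80) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] raw = toBytes("h\u00E9llo"); // e-acute takes two bytes in UTF-8

        System.out.println(raw.length);           // 6 bytes
        System.out.println(countCodePoints(raw)); // 5 characters
    }
}
```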
And, of course, to make matters worse, Win32 also uses 16-bit values for
its wchar_t type and all its wide-character processing functions. Linux
uses 32-bit values for its wide-character support, but I think it can be
made to use 16-bit characters as well.
The good news is that characters above 65535 are rare. They are
mostly for Chinese characters that are rarely used. So most people just
ignore them and still claim to be Unicode compatible.
chris marrin
"As a general rule,don't solve puzzles that open portals to Hell"'