- Subject: Re: Unicode?
- From: "chris.danx" <chris.danx@...>
- Date: Wed, 11 Jun 2003 19:21:11 +0100
Mark Hamburg wrote:
> I haven't pounded on it extensively, but I've wired my simple Lua
> environment (built in Cocoa on MacOS X) to work with UTF8 encoded strings
> for input and output. I expect this to be fine so long as I:
> * Don't want to disassemble strings into characters

I definitely want to do that. I need to compare parts of strings to
other strings, pull out bits of strings and stick strings together.

> * Use regular expressions that use things other than low-ASCII for matches
> * Perform comparisons on strings other than for equality

And possibly this as well.
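
To make "disassembling" concrete, here is a rough sketch of how a byte string could be cut at character boundaries by looking only at lead bytes. It is written in Python rather than Lua for brevity, the helper names `utf8_chars` and `utf8_sub` are made up for illustration, and it is not a full validator (it does not reject every ill-formed sequence).

```python
def utf8_chars(data):
    """Split UTF-8 bytes into a list with one bytes object per character.

    Hypothetical helper: checks lead and continuation bytes only.
    """
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            size = 1                      # 0xxxxxxx: plain ASCII
        elif 0xC2 <= lead <= 0xDF:
            size = 2                      # 110xxxxx
        elif 0xE0 <= lead <= 0xEF:
            size = 3                      # 1110xxxx
        elif 0xF0 <= lead <= 0xF4:
            size = 4                      # 11110xxx
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02x} at offset {i}")
        chunk = data[i:i + size]
        # Every byte after the lead must be a continuation byte (10xxxxxx).
        if len(chunk) != size or any(b & 0xC0 != 0x80 for b in chunk[1:]):
            raise ValueError(f"truncated or malformed sequence at offset {i}")
        chars.append(chunk)
        i += size
    return chars


def utf8_sub(data, start, stop):
    """Return characters [start, stop) of a UTF-8 byte string, as bytes."""
    return b"".join(utf8_chars(data)[start:stop])


text = "na\u00efve caf\u00e9".encode("utf-8")
print(len(text), len(utf8_chars(text)))       # byte count vs character count
print(utf8_sub(text, 6, 10).decode("utf-8"))  # pull out the second word
```

Sticking strings together needs no special handling: concatenating two well-formed UTF-8 byte strings always gives a well-formed one.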

> What this relies on is that:
> * Lua fully supports essentially any 8-bit character set but really only
> cares about those in the 7-bit ASCII set from a parsing standpoint
> * UTF-8 does all of its encoding using combinations of high 8-bit values --
> i.e., the bytes of a multibyte character can never be mistaken for ASCII

But two identical utf-8 characters can have different encodings, right?
So two strings can contain the same characters but different byte
sequences, and hence not be equal.
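
As far as I understand it, there are two cases here. Well-formed UTF-8 has exactly one byte sequence per code point, and "overlong" alternatives are ill-formed and rejected by strict decoders. The same visible character, though, can be written either as one precomposed code point or as a base letter plus a combining mark, and those do differ byte for byte. Below is a small sketch in Python (not Lua), with hypothetical helpers `is_valid_utf8` and `same_text`, that assumes NFC normalization is an acceptable notion of "the same characters".

```python
import unicodedata


def is_valid_utf8(data):
    """True if the bytes decode under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def same_text(a, b):
    """Compare two UTF-8 byte strings after NFC normalization."""
    na = unicodedata.normalize("NFC", a.decode("utf-8"))
    nb = unicodedata.normalize("NFC", b.decode("utf-8"))
    return na == nb


composed = "\u00e9".encode("utf-8")      # U+00E9, one code point
decomposed = "e\u0301".encode("utf-8")   # U+0065 + U+0301 combining acute

print(composed == decomposed)            # False: the byte sequences differ
print(same_text(composed, decomposed))   # True: equal once normalized
print(is_valid_utf8(b"\xc0\xaf"))        # False: overlong form of "/" is rejected
```

So plain byte equality is reliable only if both strings are known to use the same normalization form.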

I don't need full utf-8 support, like comparisons for every character
and string, but I do need some level of support that allows the use of
utf-8, even if the underlying system can't fully support it. Maybe that
didn't make sense? What I mean is that it allows strings to be in utf-8
and uses functions which support utf-8, even if only partially. If you
need more than the functions currently implement, you just implement it
in the function, recompile and test, and you don't need to modify
anything else.
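
One example of what "partial" support can already look like: a byte-oriented function that only searches for an ASCII byte works unchanged on UTF-8 data, because every byte of a multibyte character has its high bit set. A sketch in Python (not Lua), where `split_on_ascii` is a made-up name:

```python
def split_on_ascii(data, delimiter):
    """Split UTF-8 bytes on a single ASCII delimiter byte.

    Hypothetical helper: safe without decoding, since an ASCII byte can
    never appear inside a multibyte UTF-8 sequence.
    """
    if len(delimiter) != 1 or delimiter[0] >= 0x80:
        raise ValueError("delimiter must be a single ASCII byte")
    return data.split(delimiter)


line = "na\u00efve,caf\u00e9,\u65e5\u672c\u8a9e".encode("utf-8")
fields = split_on_ascii(line, b",")
print([f.decode("utf-8") for f in fields])   # each field is still valid UTF-8

# Every byte of the non-ASCII characters is 0x80 or above.
print(all(b >= 0x80 for b in "\u65e5\u672c\u8a9e".encode("utf-8")))  # True
```

Anything that has to count or index characters rather than bytes would still need a UTF-8-aware function.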