- Subject: Re: Testing LUA: verify the correctness of a UTF16 LUA port
- From: David Given <dg@...>
- Date: Thu, 15 Oct 2009 03:16:56 +0100
uri cohen wrote:
> My question is on how can I verify my port works? Other than toy scripts
> I created, I'm looking for a comprehensive set of tests I can run in
> order to verify all important language feature were not broken...
Unicode is harder than it looks... one reason Lua doesn't really use it
is that once you start dealing with Unicode you start finding places
where you get conflicting requirements.
For example, é can be represented either as U+0065 U+0301 (e followed by
a combining acute accent) or as the single code point U+00E9. Do these
compare equal? They are technically the same thing.
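To make that concrete, here is a minimal sketch. It is in Python rather than Lua, because stock Lua has no normalization support and Python's standard unicodedata module does; the variable names are just for illustration. The raw code point sequences differ, and they only compare equal once both sides are normalized to the same form:

```python
import unicodedata

decomposed = "\u0065\u0301"   # e followed by a combining acute accent
precomposed = "\u00e9"        # é as a single code point

# Comparing the raw code point sequences: they are different strings.
raw_equal = decomposed == precomposed

# Comparing after normalizing both to NFC (canonical composition).
nfc_equal = (unicodedata.normalize("NFC", decomposed)
             == unicodedata.normalize("NFC", precomposed))

print(raw_equal, nfc_equal)
```

So "do these compare equal?" depends entirely on whether your string comparison normalizes first, and a port has to pick one answer.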
What about sorting order? Does Ё (U+0401) sort before, after, or equal
to Ë (U+00CB)? For that matter, what about ഐ (U+0D10) and ᚔ (U+1694)?
What about E (U+0045), Ε (U+0395), Е (U+0415), ⋿ (U+22FF), ⴹ (U+2D39),
Ｅ (U+FF25), 𝐄 (U+1D404), 𝐸 (U+1D438), 𝑬 (U+1D46C), 𝔼 (U+1D53C), 𝖤
(U+1D5A4), 𝗘 (U+1D5D8), 𝘌 (U+1D60C), 𝙀 (U+1D640), 𝙴 (U+1D674), 𝚬
(U+1D6AC), 𝛦 (U+1D6E6), 𝜠 (U+1D720), or 𝝚 (U+1D75A)?
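As a rough illustration, again in Python and assuming only its standard unicodedata module (the fold helper is a made-up name): sorting by raw code point gives one answer for Ё versus Ë, which is not necessarily what any particular language's collation rules would say, and compatibility normalization (NFKC) folds some of the E lookalikes together while leaving others distinct.

```python
import unicodedata

# Plain code point order: Ë (U+00CB) sorts before Ё (U+0401) simply
# because 0xCB < 0x401. No locale or collation rules are involved.
by_code_point = sorted(["\u0401", "\u00cb"])

def fold(ch):
    """Hypothetical helper: apply NFKC compatibility normalization."""
    return unicodedata.normalize("NFKC", ch)

fullwidth_e = fold("\uff25")        # fullwidth E
math_bold_e = fold("\U0001d404")    # mathematical bold E
greek_epsilon = fold("\u0395")      # Greek capital epsilon
cyrillic_ie = fold("\u0415")        # Cyrillic capital IE

print(by_code_point, fullwidth_e, math_bold_e, greek_epsilon, cyrillic_ie)
```

The fullwidth and mathematical forms fold to a plain E; the Greek and Cyrillic letters stay as they are, so whether those should "equal" E is still a decision somebody has to make.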
Do you mean UTF-16 or UCS-2? UCS-2 can't handle anything outside the
Basic Multilingual Plane (above U+FFFF), which includes some of the
really freaky Unicode characters like 𝌆 (U+1D306) or 🀎 (U+1F00E) --- I
don't even have the font to display that last one!
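The difference shows up in the arithmetic. UTF-16 encodes a code point above U+FFFF as two 16-bit code units (a surrogate pair), which UCS-2 has no way to express. A small Python sketch of the standard calculation, with surrogate_pair being a name invented here:

```python
def surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)     # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits of the offset
    return high, low

pair = surrogate_pair(0x1D306)
# Cross-check against Python's own big-endian UTF-16 encoder.
encoded = "\U0001d306".encode("utf-16-be")

print([hex(unit) for unit in pair], encoded.hex())
```

Any code in the port that assumes one 16-bit unit equals one character (string length, indexing, pattern matching) will get these characters wrong.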
And, most importantly of all, can you still use Lua strings to represent
arbitrary binary data, or is the data forced into well-formed UTF-16?
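Here is a sketch of why that matters, using Python's strict UTF-16 decoder as a stand-in for "forced into well-formed UTF-16" (is_well_formed_utf16 is a hypothetical helper, not anything from Lua or the port). Perfectly ordinary binary data can contain byte sequences that are not valid UTF-16 at all:

```python
def is_well_formed_utf16(data):
    """Hypothetical check: True if data decodes as big-endian UTF-16."""
    try:
        data.decode("utf-16-be")
        return True
    except UnicodeDecodeError:
        return False

valid = is_well_formed_utf16(b"\xd8\x34\xdf\x06")       # proper surrogate pair
lone_high = is_well_formed_utf16(b"\xd8\x34\x00\x41")   # high surrogate, then 'A'
odd_length = is_well_formed_utf16(b"\x00\x41\x00")      # three bytes: not whole code units

print(valid, lone_high, odd_length)
```

If the port rejects or silently repairs the last two, then strings can no longer carry the contents of an arbitrary file or network packet, which plenty of existing Lua code relies on.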
One reason people tend to use UTF-8 in Lua is not that it solves all
these problems, but that it cleanly divides the problems into soluble
ones and non-soluble ones! And it turns out that most people don't care
about the non-soluble ones. Unfortunately, once you start trying to
*natively* support Unicode, you suddenly find yourself having to care
about these things...
┌─── ｄｇ＠ｃｏｗｌａｒｋ．ｃｏｍ ───── http://www.cowlark.com ─────
│ --- Conway's Game Of Life, in one line of APL