uri cohen wrote:
[...]
My question is on how can I verify my port works? Other than toy scripts I created, I'm looking for a comprehensive set of tests I can run in order to verify all important language features were not broken...
Unicode is harder than it looks... one reason Lua doesn't really use it 
is that once you start dealing with Unicode you start finding places 
where you get conflicting requirements.
For example, é can be represented either as U+0065 U+0301 or as U+00E9. Do 
these compare equal? They are canonically equivalent, so technically the 
same thing.
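
To make the question concrete, here is a minimal sketch in Python (chosen only because its standard library ships the normalization tables; stock Lua has nothing equivalent). The helper name canonically_equal is made up for illustration, and comparing after NFC is just one possible policy, not something Lua does:

```python
import unicodedata

decomposed = "e\u0301"   # U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE

def canonically_equal(a, b):
    """Hypothetical helper: compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

raw_equal = decomposed == precomposed                       # code point by code point
normalised_equal = canonically_equal(decomposed, precomposed)
print(raw_equal, normalised_equal)  # prints: False True
```

So the answer depends entirely on whether the comparison normalizes first, and a byte-wise string compare does not.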
What about sorting order? Does Ё (U+0401) sort before, after, or equal 
to Ë (U+00CB)? For that matter, what about ഐ (U+0D10) and ᚔ (U+1694)? 
What about E (U+0045), Ε (U+0395), Е (U+0415), ⋿ (U+22FF), ⴹ (U+2D39), 
Ｅ (U+FF25), 𝐄 (U+1D404), 𝐸 (U+1D438), 𝑬 (U+1D46C), 𝔼 (U+1D53C), 𝖤 
(U+1D5A4), 𝗘 (U+1D5D8), 𝘌 (U+1D60C), 𝙀 (U+1D640), 𝙴 (U+1D674), 𝚬 
(U+1D6AC), 𝛦 (U+1D6E6), 𝜠 (U+1D720), or 𝝚 (U+1D75A)?
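
As a rough illustration of why the lookalikes are awkward, the Python sketch below runs a handful of them through NFKC compatibility normalization. Picking NFKC is my assumption about what a "fold the variants together" policy might look like; it says nothing about sort order, which needs a proper collation algorithm. The name nfkc_fold is hypothetical.

```python
import unicodedata

# A few of the lookalikes from the list above, as code points.
lookalikes = [0x0045, 0x0395, 0x0415, 0xFF25, 0x1D404, 0x1D6AC]

def nfkc_fold(cp):
    """Return the code points that NFKC maps a single code point to."""
    return [ord(c) for c in unicodedata.normalize("NFKC", chr(cp))]

folded = {cp: nfkc_fold(cp) for cp in lookalikes}
for cp, target in folded.items():
    print(f"U+{cp:04X} -> " + " ".join(f"U+{t:04X}" for t in target))
```

The fullwidth and mathematical bold E collapse to plain U+0045 and the bold epsilon collapses to Greek U+0395, but Latin E, Greek Ε and Cyrillic Е stay three distinct characters, so even this only solves part of the problem.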
Do you mean UTF-16 or UCS-2? UCS-2 can't handle some of the really 
freaky Unicode characters like 𝌆 (U+1D306) or 🀎 (U+1F00E) --- I don't 
even have the font to display that last one!
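
The difference shows up as soon as a code point lies above U+FFFF: UTF-16 needs two 16-bit units (a surrogate pair) for it, while UCS-2 has only one unit per character and so cannot represent it. A small Python sketch of the arithmetic follows; utf16_units is a name I made up, and it does no validation of its input:

```python
def utf16_units(ch):
    """Return the UTF-16 code units for a single code point (no validation)."""
    cp = ord(ch)
    if cp < 0x10000:
        return [cp]                      # fits in one unit; UCS-2 can hold this
    cp -= 0x10000                        # 20 bits remain
    high = 0xD800 + (cp >> 10)           # top 10 bits -> high surrogate
    low = 0xDC00 + (cp & 0x3FF)          # bottom 10 bits -> low surrogate
    return [high, low]

for ch in ("E", "\U0001D306", "\U0001F00E"):
    print(f"U+{ord(ch):04X}", [f"{u:04X}" for u in utf16_units(ch)])
# prints:
# U+0045 ['0045']
# U+1D306 ['D834', 'DF06']
# U+1F00E ['D83C', 'DC0E']
```

Anything that indexes or measures such a string in 16-bit units will see two "characters" where the user sees one.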
And, most importantly of all, can you still use Lua strings to represent 
arbitrary binary data, or is the data forced into well-formed UTF-16?
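
To see why that matters, here is a hedged Python sketch that asks whether a blob of bytes is well-formed UTF-16 (big-endian, purely as an example). The function name is hypothetical; the point is only that plenty of perfectly ordinary binary data fails the check, so a string type that insists on well-formed UTF-16 cannot hold it:

```python
def is_well_formed_utf16(data):
    """Hypothetical check: does `data` decode cleanly as big-endian UTF-16?"""
    try:
        data.decode("utf-16-be")
    except UnicodeDecodeError:
        return False
    return True

valid_text = "\U0001D306".encode("utf-16-be")   # a complete surrogate pair
lone_high = bytes.fromhex("d834")               # high surrogate with no partner
odd_length = b"\x00\x45\x00"                    # three bytes: half a code unit left over

print(is_well_formed_utf16(valid_text))   # prints: True
print(is_well_formed_utf16(lone_high))    # prints: False
print(is_well_formed_utf16(odd_length))   # prints: False
```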
One reason people tend to use UTF-8 in Lua is not that it solves all 
these problems, but that it cleanly divides the problems into soluble 
ones and non-soluble ones! And it turns out that most people don't care 
about the non-soluble ones. Unfortunately, once you start trying to 
*natively* support Unicode, you suddenly find yourself having to care 
about these things...
(adjusts signature)

--
┌─── dg@cowlark.com ───── http://www.cowlark.com ─────
│
│ ⍎'⎕',∊N⍴⊂S←'←⎕←(3=T)⋎M⋏2=T←⊃+/(V⌽"⊂M),(V⊝"M),(V,⌽V)⌽"(V,V←1⎺1)⊝"⊂M)'
│ --- Conway's Game Of Life, in one line of APL