lua-users home
lua-l archive

Joshua Jensen wrote:
[...]
It does the equivalent of wcscmp(), only it doesn't rely on the C runtime to achieve this. That's because on some non-Visual C++ compilers, sizeof(wchar_t) != 2. sizeof(lua_WChar) is always 2.

Yes, in the Unix world it's usually 4 (wchar_t is typically a 32-bit int).
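
For illustration, a comparison that does not depend on the width of wchar_t might look roughly like the sketch below. The names wchar16 and wchar16_cmp are made up for this example and are not LuaPlus's actual API; the only assumption is a code unit type that is exactly 16 bits wide on every compiler.

```c
#include <stdint.h>

/* Hypothetical stand-in for lua_WChar: exactly 16 bits wide everywhere,
   unlike wchar_t, whose size varies between compilers. */
typedef uint16_t wchar16;

/* Compare two zero-terminated 16-bit strings code unit by code unit.
   Returns a negative value, zero, or a positive value, in the same spirit
   as wcscmp(), but without calling into the C runtime. */
static int wchar16_cmp(const wchar16 *a, const wchar16 *b)
{
    /* Advance while both strings agree and neither has ended. */
    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    /* Compare the first differing units without relying on int width. */
    return (*a > *b) - (*a < *b);
}
```

Note that this orders strings by raw 16-bit code unit, so it knows nothing about surrogate pairs or locale collation.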

It's easier in the console world --- you've got complete control over all the text on your system, so you can ensure you're not using any weird stuff like RTL, surrogates, unsupported combining characters, etc.

[...]
I would consider ditching the LuaPlus wide character support if there was a small library that supported UTF-8 and allowed easy embedding of UTF-8 string types in Lua source files.

Well, UTF-8 in Lua source files already Just Works. (They're treated by Lua as Bags of Bytes.) As far as libraries go, I wrote some very simple UTF-8 parsing code for WordGrinder:

http://wordgrinder.svn.sourceforge.net/viewvc/wordgrinder/wordgrinder/src/c/utils.c?view=markup

This will let you read and write raw code points from/to a string in a relatively simple manner.
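
To give a rough idea of what reading and writing code points involves, here is a minimal sketch in C. It is not the WordGrinder code linked above; the function names utf8_write and utf8_read are hypothetical, and the decoder is deliberately simple (it does not reject overlong encodings, for example).

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out (room for 4 bytes needed).
   Returns the number of bytes written, or 0 if cp is a surrogate or
   lies beyond U+10FFFF. */
static size_t utf8_write(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid code points in UTF-8 */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Decode one code point from the zero-terminated string at *s and advance
   *s past it. Malformed input yields U+FFFD and advances by one byte.
   Simplification: overlong encodings are not detected. */
static uint32_t utf8_read(const unsigned char **s)
{
    const unsigned char *p = *s;
    uint32_t cp;
    int extra, i;

    if (p[0] < 0x80)                { cp = p[0];        extra = 0; }
    else if ((p[0] & 0xE0) == 0xC0) { cp = p[0] & 0x1F; extra = 1; }
    else if ((p[0] & 0xF0) == 0xE0) { cp = p[0] & 0x0F; extra = 2; }
    else if ((p[0] & 0xF8) == 0xF0) { cp = p[0] & 0x07; extra = 3; }
    else { *s = p + 1; return 0xFFFD; } /* stray continuation or bad lead */

    for (i = 1; i <= extra; i++) {
        /* A terminating NUL is not a continuation byte, so we stop here
           before running off the end of the string. */
        if ((p[i] & 0xC0) != 0x80) { *s = p + 1; return 0xFFFD; }
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    *s = p + 1 + extra;
    return cp;
}
```

A caller would loop on utf8_read until it returns 0 (the terminator). Since Lua strings are just byte arrays, the same byte sequences can be passed to and from Lua unchanged.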

Thinking about this, a while back I did actually find that Unicode has real rules for splitting up a UTF-8 string into 'characters', each of which is an arbitrary-sized string representing a single drawable thing (I forget the exact term --- grapheme clusters?). So theoretically it ought to be possible to *truly* do random-access on a string. Maybe I should revisit this at some point.

--
┌─── dg@cowlark.com ───── http://www.cowlark.com ─────
│
│ ⍎'⎕',∊N⍴⊂S←'←⎕←(3=T)⋎M⋏2=T←⊃+/(V⌽"⊂M),(V⊝"M),(V,⌽V)⌽"(V,V←1⎺1)⊝"⊂M)'
│ --- Conway's Game Of Life, in one line of APL