On 29/06/2011 5.29, Tom N Harris wrote:
> On 06/28/2011 04:24 PM, Lorenzo Donati wrote:
>> Unicode escape sequences are platform independent. They are useful for
>> the same reasons why ASCII codes are useful, at least for people working
>> with Unicode.
>
> Technically, Lua doesn't even require ASCII,

I admit I cut the sentence short. I didn't mean that Lua supports ASCII (the manual expressly states that string.byte returns non-portable codes). I meant that, in general, if a language supports a specific character set (ASCII was just an example), it is useful to be able to specify character codes in a program instead of the characters themselves. And if that is useful for a given pre-Unicode charset, it is useful for Unicode too, for the same reasons.

> as the recent adventures
> with lctype.c have shown. Unicode is platform specific because not all
> platforms use the same encoding (UTF-8 vs UTF-16). And when Unicode
> isn't being used at all this will just be dead-weight in the parser.


Well, I'm not an expert, but aside from the different encodings (UTF-8, UTF-16, UTF-32 and their endianness variants), Unicode is standardized. So if you write a file in UTF-8, the byte sequence for, say, a smiley will be the same on any computer on Earth that claims to support UTF-8. There is no risk of "codepage hell". Of course there are lots of non-conforming or partially conforming applications and systems, but that's another matter.


> How about supporting escape sequences greater than 255 when
> sizeof(char)>1 ?


I don't understand exactly what you mean. Do you mean writing, for example, \G10fa1b instead of \x10\xfa\x1b (assuming a new \GXXXX... multibyte escape sequence, and using the new Lua 5.2 hexadecimal escape sequences for the second form)?

The power of dedicated Unicode escape sequences is that Lua would do the table lookup for you: it would translate a code point into the specific byte sequence for, say, the UTF-8 encoding.
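
To make the translation step concrete, here is a minimal sketch in C (the language of Lua's lexer) of how a code point could be turned into its UTF-8 byte sequence. The function name utf8_encode and its signature are my own assumptions for illustration; this is not code from Lua, and it uses bit arithmetic rather than a literal lookup table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Lua: encode one Unicode code point
 * as UTF-8. Writes up to 4 bytes into out and returns the number of
 * bytes written, or 0 if cp is a surrogate or is above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                        /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* outside the Unicode range */
}
```

With this sketch, the smiley U+263A comes out as the three bytes E2 98 BA, which is what one would otherwise have to spell by hand as \xe2\x98\xba, and the result does not depend on the platform.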

-- Lorenzo