Re: Plea for the support of unicode escape sequences

lua-l archive


Subject: Re: Plea for the support of unicode escape sequences
From: Lorenzo Donati <lorenzodonatibz@...>
Date: Wed, 29 Jun 2011 07:18:38 +0200

On 29/06/2011 5.29, Tom N Harris wrote:

> On 06/28/2011 04:24 PM, Lorenzo Donati wrote:
>
>> Unicode escape sequences are platform independent. They are useful for
>> the same reasons why ASCII codes are useful, at least for people working
>> with Unicode.
>
> Technically, Lua doesn't even require ASCII,

I admit I cut the sentence short, but I didn't mean that Lua supports ASCII (the manual expressly states that string.byte returns non-portable codes), but that, in general, if a language supports a specific character set (ASCII was an example), it is useful to specify character codes in a program instead of characters. And if it is useful for a given pre-Unicode charset, it is useful for Unicode too (for the same reasons).


> as the recent adventures
> with lctype.c have shown. Unicode is platform specific because not all
> platforms use the same encoding (UTF-8 vs UTF-16). And when Unicode
> isn't being used at all this will just be dead-weight in the parser.

Well, I'm not an expert, but aside from the different encodings (UTF-8, UTF-16, UTF-32 and their endianness variants), Unicode is standardized. So if you are going to write a file in UTF-8, then the byte sequence for, say, a smiley, will be the same on any computer on Earth that claims support for UTF-8. There is no risk of "codepage hell". Of course there are lots of non-conforming or partially conforming applications/systems, but that's another point.
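To make the point concrete, here is a small sketch (in Python rather than Lua, only because its standard codecs make the comparison short). It assumes U+263A WHITE SMILING FACE as the "smiley"; that choice is mine, not part of the original discussion. The code point is fixed, and so is the byte sequence within any one encoding; only the choice of encoding changes the bytes.

```python
# U+263A WHITE SMILING FACE, picked here as an example "smiley".
smiley = "\u263a"

# One code point, one fixed byte sequence per encoding form.
utf8 = smiley.encode("utf-8")
utf16_le = smiley.encode("utf-16-le")
utf16_be = smiley.encode("utf-16-be")
utf32_le = smiley.encode("utf-32-le")

for name, data in [("UTF-8", utf8), ("UTF-16LE", utf16_le),
                   ("UTF-16BE", utf16_be), ("UTF-32LE", utf32_le)]:
    # Print each encoding as space-separated hex bytes.
    print(name, " ".join(f"{b:02x}" for b in data))
```

The UTF-8 line is the one that matters for the argument: any conforming encoder should produce the same three bytes for this code point, whatever the platform.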

> How about supporting escape sequences greater than 255 when
> sizeof(char) > 1?

I don't understand exactly what you mean. Do you mean writing, for example (assuming a new \GXXXX... multibyte esc sequence), \G10fa1b instead of \x10\xfa\x1b (here I assume translation to Lua 5.2 new esc sequences)?

The power of specific Unicode esc sequences is that Lua will make the table lookup for you, so it will translate a code point to the specific byte sequence for, say, UTF-8 encoding.
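As a rough illustration of the translation such an escape sequence would have to perform, here is a minimal sketch of code point to UTF-8 conversion. It is written in Python for brevity, the function name utf8_encode is hypothetical, and it uses bit arithmetic rather than a literal lookup table. It is not taken from any Lua implementation.

```python
def utf8_encode(code_point: int) -> bytes:
    """Return the UTF-8 byte sequence for one Unicode code point (sketch)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])
```

With something like this in the lexer, the programmer would write only the code point (for example 0x263A) and the byte sequence would be produced for them, instead of spelling out each byte with \x escapes by hand.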


-- Lorenzo

