- Subject: Re: Plea for the support of unicode escape sequences
- From: Lorenzo Donati <lorenzodonatibz@...>
- Date: Tue, 28 Jun 2011 22:36:10 +0200
On 28/06/2011 21.08, Edgar Toernig wrote:
As it's the last chance for probably 5 or 6 years to ask for it:
Could the next version have support for Unicode escape sequences?
(like "A smiley: \u263a, an en-dash: \u2013, an ellipsis: \u2026")
Unicode is in wide use now but encoding characters using the \x hex
escapes is annoying. Even now most extension libraries are at least
UTF-8 transparent but there's no sane way to enter non-trivial
Unicode characters. For example, the above string encoded by hand would
become "A smiley: \xe2\x98\xba, an en-dash: \xe2\x80\x93, an
ellipsis: \xe2\x80\xa6", and as Unicode tables usually don't contain
the UTF-8 encoded form you have to do the conversion manually.
For stock Lua, these Unicode escape sequences should generate UTF-8.
Yes. I'd find those useful sometimes, but wouldn't a simple library in
pure Lua be equally suitable (apart from performance, maybe)?
I'm not really an expert on UTF-8, but if you say the needed tables are
small, wouldn't a simple library suffice, with, for example, a function like:
utf8lib.encode [[\u263a\u2026]] --> [[\xe2\x98\xba\xe2\x80\xa6]]
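A function of that kind could look roughly like the sketch below. It is only an illustration under stated assumptions: utf8lib is the hypothetical module name used above, not an existing library, and encode_codepoint and convert are helper names made up for this example. It accepts the fixed-width \u+4hexdigits and \U+8hexdigits forms, uses plain arithmetic so that it should run on Lua 5.1 and later, and does not treat surrogate code points (U+D800 to U+DFFF) specially.

```lua
-- Hypothetical module; the name comes from the message above.
local utf8lib = {}

-- Encode one Unicode code point (an integer) as a UTF-8 byte string.
-- Only arithmetic is used, so no bit library or Lua 5.3 operators are needed.
local function encode_codepoint(cp)
  local char, floor = string.char, math.floor
  if cp < 0x80 then
    return char(cp)
  elseif cp < 0x800 then
    return char(0xC0 + floor(cp / 0x40),
                0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    return char(0xE0 + floor(cp / 0x1000),
                0x80 + floor(cp / 0x40) % 0x40,
                0x80 + cp % 0x40)
  else
    return char(0xF0 + floor(cp / 0x40000),
                0x80 + floor(cp / 0x1000) % 0x40,
                0x80 + floor(cp / 0x40) % 0x40,
                0x80 + cp % 0x40)
  end
end

-- gsub callback: returning nil leaves the matched escape untouched,
-- which is what happens for values beyond U+10FFFF.
local function convert(hex)
  local cp = tonumber(hex, 16)
  if cp > 0x10FFFF then
    return nil
  end
  return encode_codepoint(cp)
end

-- Replace literal \UXXXXXXXX and \uXXXX sequences with UTF-8 bytes.
function utf8lib.encode(s)
  s = s:gsub("\\U(%x%x%x%x%x%x%x%x)", convert)
  s = s:gsub("\\u(%x%x%x%x)", convert)
  return s
end
```

The argument has to contain the backslash literally, hence the long-bracket string in the call above; in a quoted string the escape would be written as "\\u263a". The conversion happens at run time on every call, which is the performance cost mentioned earlier.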
(Modified versions using wchars may use an appropriate encoding, UTF-16
or UTF-32.) I don't care whether the common \u+4hexdigits and
\U+8hexdigits or a variable \u+1to6or8hexdigits sequence is implemented
(but don't make it decimal; Unicode tables usually use hex numbering).
I know that Lua's authors try to avoid bloat, but these additional
176 bytes (that's what an implementation of the \u4x/\U8x variant on
x86-32 costs) are IMHO very well spent.