lua-l archive



On 28/06/2011 21.08, Edgar Toernig wrote:
> As this is probably the last chance to ask for it for 5 or 6 years:
>
> Could the next version support Unicode escape sequences?
> (like "A smiley: \u263a, an en-dash: \u2013, an ellipsis: \u2026")
>
> Unicode is widely used now, but encoding characters with \x hex
> escapes is annoying. Most extension libraries are already at least
> UTF-8 transparent, but there's no sane way to enter non-trivial
> Unicode characters. For example, the string above encoded by hand
> becomes "A smiley: \xe2\x98\xba, an en-dash: \xe2\x80\x93, an
> ellipsis: \xe2\x80\xa6", and since Unicode tables usually don't list
> the UTF-8 encoded form, you have to do the conversion manually.
>
> For stock Lua, these Unicode escape sequences should generate UTF-8.

Yes, I'd find those useful sometimes, but wouldn't a simple library in pure Lua be equally suitable (except, perhaps, for performance)?

I'm not really an expert on UTF-8, but if, as you say, the needed tables are small, wouldn't a simple library suffice, with, for example, a function like:

utf8lib.encode [[\u263a\u2026]]  --> "\xe2\x98\xba\xe2\x80\xa6"


> A modified version using wchars may use an appropriate encoding (UTF-16
> or UTF-32). I don't care whether the common \u + 4 hex digits and
> \U + 8 hex digits pair or a variable \u + 1 to 6 (or 8) hex digits
> sequence is implemented (but don't make it decimal; Unicode tables
> usually use hex numbering).
>
> I know that Lua's authors try to avoid bloat, but these additional
> 176 bytes (that's what an implementation of the \u4x/\U8x variant
> costs on x86-32) are IMHO very well spent.

> Ciao, ET.

Cheers,
-- Lorenzo