On 30/08/15 09:53 AM, Dirk Laurie wrote:
> 2015-08-30 14:30 GMT+02:00 Soni L. <fakedme@gmail.com>:
>> LuaJIT recently added Lua 5.3's "\u{}" escapes. It's also more
>> strict about Unicode errors than Lua 5.3[1].
>>
>> For example, "\u{d800}" is valid in Lua 5.3, but not in LuaJIT.
>>
>> Should Lua be more strict about Unicode errors?
> Why should it be invalid? The `d` indicates that there should be a
> codepoint of two bytes here, and two bytes are given. Surely it
> depends on the application, not the language, what to make of it.
> The utf8 section of the Lua manual says:
>
>     This library provides basic support for UTF-8 encoding. It
>     provides all its functions inside the table utf8. This library
>     does not provide any support for Unicode other than the handling
>     of the encoding. Any operation that needs the meaning of a
>     character, such as character classification, is outside its
>     scope.
>
> It is not unreasonable for this rule to apply to \u too.
>
> Remember that LuaJIT is not even 5.2 compliant, let alone 5.3.
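
To make the disagreement concrete, a minimal sketch (plain Lua 5.3;
the byte values follow from applying the UTF-8 coding rules to the
reserved code point U+D800, and utf8.char is assumed to behave like
the \u{} escape, which matches 5.3 as shipped):

    -- Lua 5.3 accepts the surrogate code point U+D800 in a \u{}
    -- escape and encodes it as the three bytes ED A0 80, the UTF-8
    -- bit pattern for 0xD800 that strict UTF-8 forbids.
    local s = "\u{d800}"
    print(#s)                                            --> 3
    print(string.format("%02X %02X %02X", s:byte(1, 3))) --> ED A0 80
    print(utf8.char(0xD800) == s)                        --> true
    -- LuaJIT, by contrast, rejects the literal "\u{d800}" at parse
    -- time.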

But there's this surrogate thing in UTF-16...
http://unicode.org/faq/utf_bom.html#utf8-4
https://en.wikipedia.org/wiki/UTF-8#Invalid_code_points
etc.
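
In short, U+D800..U+DFFF are reserved because UTF-16 uses them in
pairs to reach code points above U+FFFF, so a lone U+D800 decodes to
no character in any encoding form. A sketch of the pair arithmetic
(combine_surrogates is just an illustrative name, not any library's
API):

    -- How UTF-16 uses the reserved range: a high surrogate
    -- (D800-DBFF) and a low surrogate (DC00-DFFF) together encode
    -- one code point above U+FFFF; either half alone is meaningless.
    local function combine_surrogates(hi, lo)
      assert(hi >= 0xD800 and hi <= 0xDBFF, "expected a high surrogate")
      assert(lo >= 0xDC00 and lo <= 0xDFFF, "expected a low surrogate")
      return 0x10000 + (hi - 0xD800) * 0x400 + (lo - 0xDC00)
    end
    print(string.format("U+%X", combine_surrogates(0xD83D, 0xDE00)))
    --> U+1F600 (the "grinning face" emoji)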

--
Disclaimer: these emails are public and can be accessed from <TODO: get a non-DHCP IP and put it here>. If you do not agree with this, DO NOT REPLY.