- Subject: Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8
- From: Hugo Musso Gualandi <hgualandi@...>
- Date: Sat, 07 Jul 2018 13:41:12 -0300
> By miracle, if you do not use the "wrong" unicode characters, LUA
> accept it, because UNICODE was made to be backward compatible with
> ASCII till some point
To be pedantic, the backward compatibility with ASCII comes from the
UTF-8 encoding, not from Unicode itself. And that was on purpose, not
by miracle :)
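To make that concrete, here is a small sketch in Python (not Lua, just for
illustration). The helper names utf8_bytes and is_plain_ascii are made up
for this example. It shows that pure-ASCII text has the same bytes under
both encodings, while character 160 (U+00A0) turns into two bytes that are
both outside the ASCII range:

```python
def utf8_bytes(text):
    """Return the UTF-8 encoding of text as a list of byte values."""
    return list(text.encode("utf-8"))


def is_plain_ascii(data):
    """True if every byte is below 0x80, i.e. the data is also valid ASCII."""
    return all(b < 0x80 for b in data)


# Pure ASCII source: UTF-8 and ASCII produce identical bytes.
ascii_src = "local x = 1"
assert ascii_src.encode("utf-8") == ascii_src.encode("ascii")

# U+00A0 NO-BREAK SPACE ("character 160") becomes two bytes, 0xC2 0xA0.
# Both are >= 0x80, so neither collides with an ASCII byte.
nbsp_src = "local\u00a0x = 1"
print([hex(b) for b in utf8_bytes("\u00a0")])  # ['0xc2', '0xa0']
print(is_plain_ascii(utf8_bytes(ascii_src)))   # True
print(is_plain_ascii(utf8_bytes(nbsp_src)))    # False
```

So a byte-oriented ASCII lexer reads the first string unchanged, and only
sees unfamiliar high bytes when a non-ASCII character shows up.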
> Note: Using the public unicode character database it's easy to handle
> all white space characters of unicode.
A full Unicode character database takes multiple megabytes[1]. That is
dozens of times larger than the whole Lua interpreter is right now.
You would need to trim down the database, which would mean either a
restrictive "whitelist" of allowed characters (for example, additional
whitespace characters are allowed but Chinese characters are not) or an
overly permissive system (for example, all characters are allowed in
identifiers, including non-alphabetical ones). I'm not sure either of
these is better than the ASCII status quo.
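A rough sketch of the two trimmed-down options, again in Python and purely
illustrative. The function names and the "space"/"ident"/"other"/"error"
classes are invented for this example and are not how the Lua lexer is
structured. The whitelist is the set of 25 code points that carry the
Unicode White_Space property:

```python
# Option A: restrictive whitelist. Only the 25 White_Space code points are
# accepted as whitespace; any other non-ASCII character is an error.
WHITE_SPACE = frozenset(
    list(range(0x0009, 0x000E))            # TAB, LF, VT, FF, CR
    + [0x0020, 0x0085, 0x00A0, 0x1680]
    + list(range(0x2000, 0x200B))          # EN QUAD .. HAIR SPACE
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def classify_whitelist(ch):
    cp = ord(ch)
    if cp in WHITE_SPACE:
        return "space"
    if cp < 0x80:
        return "ident" if (ch.isalnum() or ch == "_") else "other"
    return "error"  # e.g. Chinese characters are rejected


# Option B: overly permissive. No table at all; every non-ASCII character
# is treated as an identifier character, alphabetical or not.
def classify_permissive(ch):
    if ord(ch) >= 0x80:
        return "ident"
    if ch in " \t\n\v\f\r":
        return "space"
    return "ident" if (ch.isalnum() or ch == "_") else "other"


for ch in ["\u00a0", "\u4e2d", "\u2192"]:  # NBSP, a CJK ideograph, an arrow
    print(hex(ord(ch)), classify_whitelist(ch), classify_permissive(ch))
```

With option A the non-breaking space works but the ideograph and the arrow
are errors; with option B all three, including the non-breaking space,
silently become part of an identifier.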
[1] http://apps.icu-project.org/datacustom/