- Subject: Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8
- From: Alysson Cunha <alyssonrpg@...>
- Date: Sat, 7 Jul 2018 11:54:46 -0300
I am raising the question: should the future Lua 5.5 have Unicode support? A great many encoding issues have been solved by Unicode, and there is a clear international trend toward UTF-8.
In my opinion, Unicode is the future (actually, it has already been the present for years), while ASCII dates from the 1960s. Today it is an old and very limited character encoding.
I would love to see Lua keep up to date.
On Sat, 2018-07-07 at 09:44 -0300, Alysson Cunha wrote:
> Issue #1) ---- Character 160
> Lua 5.3 does not recognize character 160 / 0xA0
> (https://en.wikipedia.org/wiki/Non-breaking_space) as whitespace
> inside code.
The slippery slope of Unicode-supporting language syntax is that once
you allow some non-ASCII characters there is a temptation to allow all
of them (for instance there are many other whitespace characters you
did not mention). This can be confusing (there are different characters
that look similar to each other) and also problematic to implement (the
large character-class tables would bloat the interpreter).
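To give a sense of how many whitespace characters exist beyond the two discussed here, below is a small sketch. It is in Python rather than Lua only because Python's standard library ships the Unicode tables; the function name `bmp_whitespace` is made up for illustration. It lists the code points in the Basic Multilingual Plane that Python's own `str.isspace` treats as whitespace, which is just one of several possible definitions:

```python
import unicodedata

def bmp_whitespace():
    """Return (code point, Unicode name) pairs for BMP characters
    that Python's str.isspace() considers whitespace."""
    found = []
    for cp in range(0x10000):
        ch = chr(cp)
        if ch.isspace():
            # Control characters have no Unicode name, so supply a default.
            found.append((cp, unicodedata.name(ch, "<unnamed control>")))
    return found

for cp, name in bmp_whitespace():
    print(f"U+{cp:04X}  {name}")
```

The exact list depends on the Unicode version bundled with the Python build, so a lexer that wanted to accept "all whitespace" would have to track those tables too.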
> When pasting text from some browsers/text editors, the following
> text ends up in my code:
> "function(stream, contentType)"
> The space that separates "stream," and "contentType" is a Character
> 160, not Character 32.
This sounds like an issue with the browser / text editor.
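If the editor cannot be fixed, one workaround is to check pasted source for character 160 before handing it to Lua. The following is a minimal sketch in Python, assuming the source has already been decoded to text; `find_nbsp` is a hypothetical helper, not part of any Lua tooling:

```python
NBSP = "\u00a0"  # character 160, the non-breaking space

def find_nbsp(source):
    """Return 1-based (line, column) positions of every NBSP in source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == NBSP:
                hits.append((lineno, col))
    return hits

# The pasted snippet from the report, with an NBSP after the comma.
pasted = "function(stream,\u00a0contentType)\nend"
print(find_nbsp(pasted))
```

Reporting positions rather than blindly calling `source.replace(NBSP, " ")` is deliberate: a blanket replacement would also change any NBSP that sits inside a string literal on purpose.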
> Issue #2) ----- UTF-8
> The same character #160 when encoded as UTF-8 becomes the 2 bytes
> 0xC2 0xA0.
> The 0xC2 byte in the ISO 8859-1 (Latin-1) encoding is the
> character "Â". What?
That is just how UTF-8 and Latin-1 work. UTF-8 text can look all messed
up and full of "Ã©" and so on if your misconfigured software mistakenly
tries to interpret it as Latin-1. (Like the inf.puc-rio.br web servers
do all the time. Aaargh!)
Make sure that you configure things to properly display text as UTF-8.
For example, if you are making a webpage, make sure you add a
<meta charset="utf-8"> near the top of the HTML.
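The byte-level effect can be reproduced in a few lines. This is only an illustrative sketch in Python (the variable names are arbitrary); it encodes character 160 as UTF-8 and then deliberately decodes the result with the wrong codec:

```python
nbsp = "\u00a0"                      # character 160
utf8_bytes = nbsp.encode("utf-8")    # two bytes: 0xC2 0xA0
print(utf8_bytes.hex(" "))

# Decoding those UTF-8 bytes as Latin-1 maps each byte to one character,
# so 0xC2 shows up as "Â", followed by a Latin-1 non-breaking space.
misread = utf8_bytes.decode("latin-1")
print([f"U+{ord(c):04X}" for c in misread])

# The same mistake turns "é" (UTF-8: 0xC3 0xA9) into "Ã©".
print("é".encode("utf-8").decode("latin-1"))

# Decoding with the right codec round-trips cleanly.
assert utf8_bytes.decode("utf-8") == nbsp
```

Nothing is lost in the bytes themselves; only the interpretation differs, which is why declaring the charset is enough to fix the display.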
> In my app, I strongly advise users to encode their .lua files as
> UTF-8 because all of my system functions expect UTF-8 encoded
> strings as parameters
Current versions of Lua are perfectly content with non-ASCII
UTF-8-encoded characters, as long as they only appear inside strings or
comments. Non-ASCII characters in other parts of the program result in
syntax errors, as you found out.
> UTF-8 is a growing trend for internationalization, and lua_load
> should have a parameter that forces the engine to handle the Lua script
> content as UTF-8 encoded. Another solution is to create lua_loadutf8
This would effectively fork Lua into two versions of the language -- an
ASCII-only one and a full Unicode aware one. I'm not sure the
compatibility headache from that would be worth the hassle. IMO, if Lua
is ever to allow Unicode syntax it should be part of the default
language and not require a separate "load" function.
> or... lua_loadutf16 (since UTF-16 is used as standard in many
> programming languages)
UTF-16 is an awful standard, and is inferior to UTF-8 in pretty much
every way. Unfortunately we will need to live with it for a long time