- Subject: Re: Future plans for Lua and Unicode
- From: "Simon Orde" <sorde@...>
- Date: Fri, 6 Jul 2012 11:48:58 +0100
Thanks Roberto - that's very reassuring. When I first thought about
implementing UTF-8 in Lua, it seemed quite daunting. But it seems much less
daunting now. I said in an earlier post that allowing scripts to be written
in UTF-8 themselves seemed to me the perfect solution. OK, that's probably
putting it a little strongly: you don't have to do it that way. But I
still think it's a big plus for Lua that you can.
Bearing in mind that Lua is an embedded scripting language, and many of
these scripts will consequently be written by ordinary users, the fact that
they can simply enter accented characters into string literals and that this
will work exactly as they would expect has got to be a good thing.
There is still the issue of UTF-8 support in libraries. It would be great if
all Lua libraries always supported UTF-8 as a matter of course. I think it
would greatly strengthen the language. Then, instead of "yes and no", the
answer to the question "Does Lua support Unicode?" (see
http://lua-users.org/wiki/LuaUnicode) would be "Yes, it provides excellent
support for Unicode, using UTF-8".
Simon
----- Original Message -----
From: "Roberto Ierusalimschy" <roberto@inf.puc-rio.br>
To: "Lua mailing list" <lua-l@lists.lua.org>
Sent: Thursday, July 05, 2012 8:27 PM
Subject: Re: Future plans for Lua and Unicode
Lua expects code to be in ASCII except inside string literals.
In long strings (delimited by [[…]]) almost anything goes … with one
exception: end-of-line. Another system's EOL is converted to your
system's EOL.
That is precisely the behaviour you want when the long string contains
some UTF-8 text; I can't speak for UTF-16. But it does mean that the
only way of reliably entering *arbitrary* strings as literals is to
enter them as short strings with non-ASCII characters escaped.
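To illustrate the two points above, here is a minimal sketch, assuming a
stock Lua 5.1 or later interpreter. The names blob, loader, chunk and text
are made up for this example. It writes arbitrary bytes as a short string
with decimal escapes, then compiles a chunk whose long string contains a real
CR LF pair. As far as I know the reference lexer stores that pair as a single
"\n" in memory; that detail is my understanding rather than something stated
in this thread.

```lua
-- Arbitrary bytes written as a short string with decimal escapes.
-- (\ddd works in Lua 5.1 and later; the \xXX form needs Lua 5.2 or later.)
local blob = "\0\13\10\255"            -- NUL, CR, LF, 0xFF
assert(#blob == 4)
assert(blob:byte(2) == 13 and blob:byte(3) == 10)  -- CR and LF both survive

-- Build a chunk whose long string contains a real CR LF pair, then compile it.
-- loadstring exists in Lua 5.1; load accepts strings from Lua 5.2 onwards.
local loader = loadstring or load
local chunk = assert(loader("return [[a\r\nb]]"))
local text = chunk()
assert(text == "a\nb")                 -- the CR LF pair became one newline
assert(#text == 3)
```

So a CR byte that matters to the data has to be written as an escape in a
short string; inside a long string it would not survive as typed.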
Just to clarify: this is true for *arbitrary* (binary) strings. UTF-8
strings, not being arbitrary, can be used inside any kind of literal in
Lua ('...', "...", [[...]]) and also in comments without any problems.
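A small sketch of that clarification, assuming the script file itself is
saved as UTF-8 and that the "é" is stored as the single precomposed code
point U+00E9. The variable names are only for illustration. The same text
works in all three kinds of literal, and Lua simply sees the underlying
bytes.

```lua
-- The same UTF-8 text in each kind of literal (and in this comment: café).
local a = 'café'
local b = "café"
local c = [[café]]
assert(a == b and b == c)

-- Lua strings are byte sequences: "é" (U+00E9) is the two bytes 0xC3 0xA9,
-- so the length operator reports 5 bytes for these 4 characters.
assert(#a == 5)
assert(a == "caf\195\169")             -- same string with the bytes escaped

-- Counting characters rather than bytes: count every byte that is not a
-- UTF-8 continuation byte (0x80 to 0xBF).
local _, count = a:gsub("[^\128-\191]", "")
assert(count == 4)
```

The byte-oriented length is one reason the standard string library gets the
"yes and no" answer: the literals round-trip fine, but functions such as #
and string.upper work on bytes, not on characters.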
-- Roberto