


Thanks Roberto, that's very reassuring. When I first thought about implementing UTF-8 in Lua, it seemed quite daunting; it seems much less so now. I said in an earlier post that allowing scripts themselves to be written in UTF-8 seemed to me the perfect solution. OK, that's probably putting it a little strongly: you don't have to do it that way. But I still think it's a big plus for Lua that you can. Bearing in mind that Lua is an embedded scripting language, and that many scripts will consequently be written by ordinary users, the fact that they can simply type accented characters into string literals and have this work exactly as they would expect has got to be a good thing.

There is still the issue of UTF-8 support in libraries. It would be great if all Lua libraries supported UTF-8 as a matter of course. I think it would greatly strengthen the language. Then instead of saying "yes and no" to the question "does Lua support Unicode?" (see http://lua-users.org/wiki/LuaUnicode) the answer would be "Yes, it provides excellent support for Unicode, using UTF-8".

Simon

----- Original Message -----
From: "Roberto Ierusalimschy" <roberto@inf.puc-rio.br>
To: "Lua mailing list" <lua-l@lists.lua.org>
Sent: Thursday, July 05, 2012 8:27 PM
Subject: Re: Future plans for Lua and Unicode


Lua expects code to be in ASCII except inside string literals.

In long strings (delimited by [[...]]) almost anything goes ... with one
exception: end-of-line.  Another system's EOL is converted to your
system's EOL.

That is precisely the behaviour you want when the long string contains
some UTF-8 text; I can't speak for UTF-16. But it does mean that the
only way of reliably entering *arbitrary* strings as literals is to
enter them as short strings with non-ASCII characters escaped.
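
To illustrate the two points above, here is a minimal sketch, assuming a stock Lua 5.1 or 5.2 interpreter. The name load_chunk is just a local alias introduced here so the same code runs on both versions. It compiles a chunk whose long string contains a CR LF pair, then compares the result with a short string in which the same bytes are written as decimal escapes.

```lua
-- Lua 5.1 provides loadstring; Lua 5.2+ accepts a string in load.
local load_chunk = loadstring or load

-- The source text handed to the compiler contains a real CR LF pair
-- inside a long string.
local source = "return [[a\r\nb]]"
local from_long = assert(load_chunk(source))()

-- The lexer normalises the end-of-line sequence to a single "\n",
-- so the CR byte is lost.
print(#from_long)                 -- 3 bytes: "a", "\n", "b"

-- A short string with decimal escapes keeps every byte as written.
local from_short = "a\13\10b"
print(#from_short)                -- 4 bytes: "a", CR, LF, "b"
print(from_long == from_short)    -- false
```

For text this normalisation is harmless, but for binary data it silently changes the content, which is why the escaped short string is the safe form.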

Just to clarify: this is true for *arbitrary* (binary) strings. UTF-8
strings, not being arbitrary, can be used inside any kind of literal in
Lua ('...', "...", [[...]]) and also in comments without any problems.

-- Roberto
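
As a small illustration of Roberto's clarification, the sketch below assumes the script file itself is saved as UTF-8 without a byte-order mark and that "é" is the precomposed character U+00E9. The variable names are only for the example. It puts the same UTF-8 text in each kind of literal and in a comment, then checks the bytes Lua actually stores.

```lua
-- Assumes this file is saved as UTF-8 without a byte-order mark.
-- Comments may contain UTF-8 too: café, naïve, 日本語.
local single = 'café'
local double = "café"
local long   = [[café]]

-- All three literal forms yield the same byte string.
print(single == double and double == long)   -- true

-- Lua strings are byte sequences, so # counts bytes, not characters:
-- "é" (U+00E9) is encoded as the two bytes 0xC3 0xA9 (195 and 169).
print(#double)             -- 5
print(double:byte(4, 5))   -- 195 and 169
```

The literals work unchanged, but length and indexing are byte-based, which is where UTF-8-aware library support would come in.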