lua-l archive


(I've replied to a bunch of messages here rather than sending out six separate replies...)

On 29 Dec '05, at 9:17 AM, Chris Marrin wrote:

It allows you to add "incidental" characters without the need for a fully functional editor for that language. For instance, when I worked for Sony we had the need to add a few characters of Kanji on occasion. It's not easy to get a Kanji editor set up for a western keyboard, so adding direct Unicode was more convenient. There are also some oddball symbols in the upper registers for math and chemistry and such that are easier to add using escapes.
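
For illustration only, here is a minimal sketch of what "adding a few characters by escape" can look like in stock Lua today, assuming the strings are meant to be UTF-8. Lua has no Unicode escape in the versions current as of this thread, so each character is spelled out as decimal byte escapes; the variable names are made up for the example. (Lua 5.3 and later also accept the form "\u{2026}".)

```lua
-- UTF-8 byte sequences written as decimal escapes, so the source file
-- itself stays pure ASCII. Assumption: the program treats strings as UTF-8.
local ellipsis = "\226\128\166"   -- U+2026 HORIZONTAL ELLIPSIS
local emdash   = "\226\128\148"   -- U+2014 EM DASH
local euro     = "\226\130\172"   -- U+20AC EURO SIGN
local kanji    = "\230\188\162"   -- U+6F22, the "kan" of "kanji"

-- Escaped characters concatenate like any other string data.
local message = "Loading" .. ellipsis .. " 5 " .. euro
print(message)

-- The length operator counts bytes, not characters.
print(#ellipsis)  -- 3
```

The obvious drawback is that the byte values have to be worked out by hand for every character, which is the inconvenience a real Unicode escape would remove.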

Also, in some projects there are guidelines that discourage the use of non-ASCII characters in source files (due to problems with editors, source control systems, or other tools). In these situations it's convenient to be able to use inline escapes to specify non-ASCII characters that commonly occur in human-readable text. Examples would include ellipses, curly quotes, em dashes, bullets, currency symbols, as well as accented letters of course.

[...] wrote:

IMO, with globalization, languages that don't support Unicode won't make the cut in the long run.

I find it ironic that the three non-Unicode-savvy languages I use (PHP, Ruby, Lua) all come from countries whose native languages use non-ASCII characters :)

I'm keeping an eye on this thread because, if I end up using Lua for any actual work projects, I18N is mandatory and I can't afford to run into any walls when it comes time to make things work in Japanese or Thai or Arabic. (Not like last time... See below for the problems I've had with JavaScript regexps.)

[...] wrote:

I've done an awful lot of work with international character sets, and I personally consider any encoding of Unicode other than UTF-8 to be obsolete. Looking at most other modern (i.e. not held back by backwards compatibility, e.g. Windows) users of Unicode, their authors appear to feel similarly.

Yes, at least as an external representation. Internally, it can be convenient to use a wide representation for speed of indexing into strings, but a good string library should hide that from the client as an implementation detail. (Disclaimer: I'm mostly knowledgeable only about Java's and Mac OS X's string libraries.)

[...] wrote:

Most apps can just treat strings as opaque byte streams.

I agree, mostly; the one area where I've run into problems is regexp/pattern libraries. Any pattern that relies on "alphanumeric characters" or "word boundaries" assumes a fair bit of Unicode knowledge behind the scenes. I ran into this problem when implementing a live search field in Safari RSS: while JS strings are Unicode-savvy, many regexp implementations aren't, so escapes like "\w" or "\b" only work on ASCII alphanumerics.

[...] wrote:

These kinds of problems should be solved at a different level, not hacked into Lua. The beautiful thing about Lua is that it's really clean ANSI C code

...except for the parts that aren't, like library loading, and the extension libraries for sockets, databases, etc. I agree that having a portable core runtime is important, but there should be some kind of standard extension for Unicode strings, hopefully one that cleanly extends the built-in string objects using something like ICU.
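
To make the byte-versus-character issue concrete on the Lua side, here is a rough sketch, assuming UTF-8 strings and the default "C" locale of the standalone interpreter. The helper utf8_len is hypothetical, written just for this example; it is not part of Lua's string library, and it is nowhere near what an ICU-backed extension would offer.

```lua
-- "cafe" with an accented e, written as UTF-8 bytes (U+00E9 = 195 169).
local s = "caf\195\169"

-- The length operator counts bytes.
print(#s)  -- 5

-- %w is decided byte by byte via the C library's isalnum. In the "C"
-- locale neither byte of the accented letter qualifies, so the match
-- stops after the ASCII letters.
print(s:match("^%w+"))  -- caf

-- Hypothetical helper: count code points by counting every byte that is
-- NOT a UTF-8 continuation byte (10xxxxxx, i.e. 128..191).
-- Assumes the input is valid UTF-8; no validation is attempted.
local function utf8_len(str)
  local n = 0
  for i = 1, #str do
    local b = str:byte(i)
    if b < 128 or b > 191 then
      n = n + 1
    end
  end
  return n
end

print(utf8_len(s))  -- 4
```

Counting characters is the easy part. Case mapping, collation, and "what counts as a word character" are where a table-driven library becomes necessary, which is why I'd rather see this handled by a standard extension than patched in piecemeal.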