- Subject: Re: Will Lua kernel use Unicode in the future?
- From: Jens Alfke <jens@...>
- Date: Fri, 30 Dec 2005 10:32:05 -0800
(I've replied to a bunch of messages here rather than sending out six separate replies.)
On 29 Dec '05, at 9:17 AM, Chris Marrin wrote:
> It allows you to add "incidental" characters without the need for a
> fully functional editor for that language. For instance, when I
> worked for Sony we had the need to add a few characters of Kanji on
> occasion. It's not easy to get a Kanji editor set up for a western
> keyboard, so adding direct Unicode was more convenient. There are
> also some oddball symbols in the upper registers for math and
> chemistry and such that are easier to add using escapes.
Also, in some projects there are guidelines that discourage the use
of non-ascii characters in source files (due to problems with
editors, source control systems, or other tools). In these situations
it's convenient to be able to use inline escapes to specify non-ascii
characters that commonly occur in human-readable text ... examples
would include ellipses, curly-quotes, em dashes, bullets, currency
symbols, as well as accented letters of course.
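To make the idea concrete, here's what such inline escapes look like in a language that already has them. This sketch is JavaScript, not Lua (Lua has no such escape syntax today); the point is only that the source file stays pure ascii while the runtime strings contain the real characters:

```javascript
// Inline \uXXXX escapes: the source file is pure ascii, but the
// resulting strings contain the actual non-ascii characters.
const ellipsis = "\u2026";    // …
const emDash   = "\u2014";    // —
const euro     = "\u20AC";    // €
const cafe     = "caf\u00E9"; // café — the escape yields one character

console.log(cafe, ellipsis, emDash, euro);
console.log(cafe.length); // 4 — "é" is a single character, not four bytes of escape
```

Any editor, diff tool, or source-control system that chokes on non-ascii bytes handles this file without complaint.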
> IMO, with globalization, languages that don't support Unicode won't
> make the cut in the long run.
I find it ironic that the three non-Unicode-savvy languages I use
(PHP, Ruby, Lua) all come from countries whose native languages use
non-ascii characters :)
I'm keeping an eye on this thread because, if I end up using Lua for
any actual work projects, I18N is mandatory and I can't afford to run
into any walls when it comes time to make things work in Japanese or
Thai or Arabic. (Not like last time... see below for the problems I've run into.)
> I've done an awful lot of work with international character sets,
> and I personally consider any encoding of Unicode other than UTF-8
> to be obsolete. Looking at most other modern (i.e. not held back by
> backwards compatibility, e.g. Windows) users of Unicode, their
> authors appear to feel similarly.
Yes, at least as an external representation. Internally, it can be
convenient to use a wide representation for speed of indexing into
strings, but a good string library should hide that from the client
as an implementation detail. (Disclaimer: I'm mostly knowledgeable
only about Java's and Mac OS X's string libraries.)
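That split — an internal representation hidden behind the string API, UTF-8 only at the boundaries — can be sketched like this (using Node's Buffer purely for illustration; the same shape holds for any string library):

```javascript
// The client of the string library sees characters; UTF-8 bytes
// appear only as the external (wire/file) representation.
const s = "café";                          // 4 characters to the client
const bytes = Buffer.from(s, "utf8");      // external form: 5 bytes, é → 0xC3 0xA9

console.log(s.length, bytes.length);       // 4 5
console.log(bytes.toString("utf8") === s); // true — round-trips losslessly
```

The client code never needs to know whether the library stores wide characters internally; only the encode/decode boundary deals in bytes.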
> Most apps can just treat strings as opaque byte streams.
I agree, mostly; the one area I've run into problems has been with
regexp/pattern libraries. Any pattern that relies on "alphanumeric
characters" or "word boundaries" assumes a fair bit of Unicode
knowledge behind the scenes. I ran into this problem when
implementing a live search field in Safari RSS: while JS strings
are Unicode-savvy, many regexp engines aren't, so character
classes like "\w" or "\b" only match ascii alphanumerics.
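The pitfall is easy to demonstrate in JavaScript, where \w is ascii-only by specification. The Unicode property escapes shown as the fix (\p{L} with the /u flag) arrived much later, in ES2018, so they were not an option at the time of this discussion:

```javascript
// \w is defined as [A-Za-z0-9_] — it never matches accented letters.
console.log(/\w/.test("é"));     // false — ascii-only word class
console.log(/\p{L}/u.test("é")); // true  — Unicode-aware letter class (ES2018)

// Tokenizing text with accented words:
console.log("naïve café".match(/[\p{L}]+/gu)); // ["naïve", "café"]
console.log("naïve café".match(/\w+/g));       // ["na", "ve", "caf"] — words split apart
```

A live-search feature built on \w-style classes silently breaks on exactly the inputs an I18N-aware app cares about.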
> These kinds of problems should be solved at a different level, not
> hacked into Lua. The beautiful thing about Lua is that it's really
> clean ANSI C code...
...except for the parts that aren't, like library loading, and the
extension libraries for sockets, databases, etc. I agree that having
a portable core runtime is important, but there should be some kind
of standard extension for Unicode strings, hopefully one that cleanly
extends the built-in string objects using something like ICU.