On Wed, Feb 8, 2012 at 2:04 AM, Vaughan McAlley <vaughan@mcalley.net.au> wrote:
> On 8 February 2012 08:40, Jay Carlson <nop@nop.com> wrote:
>>> The main question I suppose is:  is the resulting user code, using
>>> mostly ordinary string functions plus a little minimal utf8 tweaking,
>>> going to be significantly uglier/harder-to-maintain/confusing, to the
>>> point where using a heavier-weight abstraction might be worthwhile?
>>>
>>> My suspicion is that for most apps, the answer is no...
>>
>> Well, that certainly makes Roberto happy.
>
> The Lua team would have handled a lot of Portuguese strings by now. If
> it gave them issues LHF surely would have written a library by now...

The unfortunate Tower of Babel incident left us with a lot of
different writing systems, many of which do not fit into 8-bit bytes,
and certainly not at once. Much of the EU may have (had?) a common
currency, but it does not have a common single-byte character set.

Many of the non-gray parts of
http://en.wikipedia.org/wiki/List_of_writing_systems are
well-populated, and many are computerizing and joining the Internet,
although sometimes haltingly.
http://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal) may be
interesting; the PPP tables finally show Brazil within inches of
passing the UK! But computers are already pervasive in Japan and the
Republic of Korea too, and the People's Republic of China is on its
way--and the CJK written languages are difficult to handle in
unextended Lua. Some people would argue many aspects of CJK are fairly
easy by now compared to some other scripts....

I'm going to put your skepticism a different way: is processing text
in languages covering n% of, say, the Internet population a goal for
Lua? When Lua was young, it was easier to choose an n such that text
and sequences of bytes shared structure. It's getting harder.

There is a separate question of "OK, so then what?" and perhaps there
is not a good answer. Unicode is not the ideal basis for any
particular script, for starters.

Jay