- Subject: Re: Could Lua itself become UTF8-aware?
- From: Daurnimator <quae@...>
- Date: Mon, 1 May 2017 15:28:49 +1000
On 1 May 2017 at 15:05, Dirk Laurie <email@example.com> wrote:
> 2017-04-30 23:11 GMT+02:00 Sean Conner <firstname.lastname@example.org>:
>> There was a long discussion about that a few years ago:
>> It appears the consensus then was "maybe not a good idea."
> I did not read that before posting, and Sean is careful not to imply that
> old issues are dead and buried,
> But UTF-8 has come closer to universal acceptance in the last three
> years. What was "maybe not a good idea" back then, might have
> become "maybe not a bad idea" now.
> -- Dirk
I don't think UTF-8's acceptance has changed at all in the last
~5-10 years: it has always been widely accepted outside of
interoperation with Windows native APIs. Around 2005-2006 was when I
started hearing C# and Java devs wish they had UTF-8 instead.
However, to reply to the issue at hand: are Unicode classes wanted?
i.e. should a Unicode space such as U+2001 count as whitespace for
Furthermore, what should be considered valid characters for identifiers?
I guess we still want the rule "alpha followed by any number of alphanumeric"?
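To make the two questions above concrete, here is a minimal sketch in Python (not Lua, purely for illustration) of what a Unicode-aware "alpha followed by alphanumerics" rule might look like. The function name is_simple_identifier is hypothetical, and treating "_" as a letter is an assumption borrowed from Lua's current ASCII rule; nothing here is a proposal for the actual lexer.

```python
import unicodedata

def is_simple_identifier(name: str) -> bool:
    """Hypothetical rule: a letter (or '_') followed by letters, digits or '_'.

    "Letter" and "digit" here mean whatever Python's bundled Unicode
    tables say, which is exactly the kind of choice under discussion.
    """
    if not name:
        return False
    first, rest = name[0], name[1:]
    if not (first.isalpha() or first == "_"):
        return False
    return all(ch.isalnum() or ch == "_" for ch in rest)

# U+2001 (EM QUAD) is in general category Zs, so a Unicode-aware
# whitespace test accepts it even though it is not ASCII whitespace.
print(unicodedata.category("\u2001"))
print("\u2001".isspace())

# A precomposed letter passes the rule; the same letter spelled with a
# combining mark (category Mn) does not, because a mark is not alphanumeric.
print(is_simple_identifier("\u00c5ngstr\u00f6m"))
print(is_simple_identifier("A\u030angstrom"))
```

Even this naive rule already has to take a position on combining marks, non-ASCII digits and which spaces separate tokens.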
Which version of the Unicode standard do we want to pick? (You did
realise Unicode gets updated... right?)
We'd need a strategy to deal with updates (which rarely go well: see
how people are still dealing with fallout from IDNA2003 => IDNA2008).
Which brings us to the next problem: normalisation of identifiers. It
would seem perplexing to many that the identifier U+00C5 and the
identifier U+0041 U+030A, which render identically, would refer to
different variables.
Even if you don't think normalisation should occur (like myself), you'll
then at least have an easy mechanism for writing obfuscated code.
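
The normalisation problem can be sketched in a few lines of Python (again only as an illustration, not Lua). The helper same_identifier is hypothetical, and choosing NFC rather than another normalisation form is an assumption made for the example.

```python
import unicodedata

precomposed = "\u00c5"       # LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = "\u0041\u030a"  # 'A' followed by COMBINING RING ABOVE

# Compared code point by code point, the two spellings differ, so a
# lexer that does not normalise would see two distinct names.
print(precomposed == decomposed)
print(len(precomposed), len(decomposed))

def same_identifier(a: str, b: str) -> bool:
    """Hypothetical comparison that treats NFC-equal names as one identifier."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# After NFC normalisation both spellings collapse to U+00C5.
print(same_identifier(precomposed, decomposed))
```

Without the normalisation step the two names stay distinct while looking the same on screen, which is the obfuscation route mentioned above.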