On 01-May-17 07:28, Daurnimator wrote:
> However, to reply to the issue at hand: are unicode classes wanted?
> i.e. should a unicode space such as U+2001 count as whitespace for
> token separation?
> Furthermore, what should be considered valid characters for identifiers?
> I guess we still want the rule "alpha followed by any number of alphanumeric"?
> Which Unicode standard do we want to pick? (You did realise unicode
> gets updated.... right?)
> We'd need a strategy to deal with updates (which rarely go well: see
> how people are still dealing with fallout from IDNA2003 => IDNA2008)
>
> Which brings us to the next problem: normalisation of identifiers. It
> would seem perplexing to many that the identifiers U+00C5 and U+0041
> U+030A would refer to different variables.
> Even if you don't think normalisation should occur (like myself), then
> you'll at least have an easy mechanism for obfuscated code
> contests....

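The normalisation point in the quoted text can be made concrete with a small sketch. It uses Python's standard unicodedata module rather than Lua, purely for illustration. The symbol tables are hypothetical stand-ins for whatever a lexer would use to look up identifiers, and NFC is only one of several normalisation forms that could be chosen.

```python
import unicodedata

# Two spellings of the same visible character:
precomposed = "\u00C5"        # LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = "\u0041\u030A"   # LATIN CAPITAL LETTER A + COMBINING RING ABOVE

# Compared code point by code point the strings differ, so a lexer that
# does no normalisation would see two distinct identifiers.
distinct_without_normalisation = precomposed != decomposed

# After NFC normalisation both spellings collapse to the same code point.
nfc_precomposed = unicodedata.normalize("NFC", precomposed)
nfc_decomposed = unicodedata.normalize("NFC", decomposed)
same_after_nfc = nfc_precomposed == nfc_decomposed

# Hypothetical symbol table keyed by the raw identifier text: two entries.
raw_table = {precomposed: 1, decomposed: 2}

# The same table keyed by NFC-normalised text: the entries merge into one,
# with the later value overwriting the earlier.
nfc_table = {}
for name, value in raw_table.items():
    nfc_table[unicodedata.normalize("NFC", name)] = value
```

Without normalisation the two spellings name two variables that look identical on screen. With it they name one, but the language then depends on a specific version of the Unicode data.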
FWIW, in the '80s there were Italian versions of BASIC, but luckily they
died out. I say "luckily" because language localization means community
fragmentation: you cannot search for ideas and solutions, and you cannot
cooperate with people around the world. Not to mention library usability.
So I do not think localized identifiers are a great idea. I realize that
the issue may be felt more acutely by Asian users, but there is always a
balance between understanding the symbols and using universal ones (even
if one does not understand their literal meaning: think of
identifiers as pictograms).
You do not have to think in English to write "if...then...else", even if
knowing English may help very slightly in the first learning steps. In fact,
I knew almost no English when I started programming, and that did
not hamper me in the least. But the benefits of using world-standardized
identifiers were immense.
--
  Enrico