On 01-May-17 07:28, Daurnimator wrote:
However, to reply to the issue at hand: are Unicode classes wanted? i.e. should a Unicode space such as U+2001 count as whitespace for token separation? Furthermore, what should be considered valid characters for identifiers? I guess we still want the rule "alpha followed by any number of alphanumeric"? Which Unicode standard do we want to pick? (You did realise Unicode gets updated... right?) We'd need a strategy to deal with updates (which rarely go well: see how people are still dealing with fallout from IDNA2003 => IDNA2008). Which brings us to the next problem: normalisation of identifiers. It would seem perplexing to many that the identifiers U+00C5 and U+0041 U+030A would refer to different variables. Even if you don't think normalisation should occur (like myself), then you'll at least have an easy mechanism for obfuscated code contests...
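To make the normalisation point in the quoted text concrete, here is a minimal sketch in Python (chosen only because its standard library ships the Unicode database; nothing here is a proposal for Lua). The variable names are made up for illustration. It compares the two spellings mentioned above, before and after NFC normalisation, and checks how U+2001 is classified.

```python
import unicodedata

# Two spellings of what a reader sees as the same letter.
precomposed = "\u00C5"       # LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = "\u0041\u030A"  # 'A' followed by COMBINING RING ABOVE

# Compared code point by code point, the two strings differ.
same_raw = precomposed == decomposed

# After NFC normalisation both collapse to the single code point U+00C5.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
same_nfc = nfc_a == nfc_b

# U+2001 (EM QUAD) is in the "space separator" general category.
em_quad_category = unicodedata.category("\u2001")
em_quad_is_space = "\u2001".isspace()

print(same_raw, same_nfc, em_quad_category, em_quad_is_space)
```

Without normalisation, a lexer would treat the two spellings as distinct identifiers. Python itself sidesteps this by normalising identifiers (with NFKC) while parsing, which is one possible policy among several.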
FWIW, in the '80s there were Italian versions of BASIC, but they luckily died out. I say "luckily" because language localization means community fragmentation: you cannot search for ideas and solutions, and you cannot cooperate with people around the world. Not to speak of library usability.
So I do not think localized identifiers are a great idea. I realize that the issue may be felt more acutely by Asian users, but it is always a balance between comprehending the symbols and using universal ones (even if one does not understand their literal meaning: think of identifiers as pictograms).
You do not have to think in English to write "if...then...else", even if it could be very slightly helpful in the first learning steps. In fact, I knew almost nothing of English when I started programming and that did not hamper me in the least. But the benefits of using world-standardized identifiers were immense.
-- Enrico