lua-users home
lua-l archive



On 29/10/2019 07:09, Chris Smith wrote:

On 29 Oct 2019, at 05:45, bil til <flyer31@googlemail.com> wrote: I
think the main target really is the Chinese. Speakers of "standard
Chinese" rely on their writing system very much: the spoken languages
across China are extremely different, but the written language unites
them, and has done so for thousands of years. Furthermore, from my
travels in China I have the impression that basic Chinese writing
"for daily use" is extremely efficient, typically taking much less
space than Western phonetic writing.

It is very spatially efficient for relatively long words because it
is graphically very complex, but quite inefficient for short words
due to the ambiguity of each character — a character normally
represents a concept, and you typically need at least two characters
to form an unambiguous word. In terms of human input it is less
efficient than romanised languages, since the only practical way of
entering characters is to type the romanised form and manually select
the correct characters from a list.

Its graphical complexity alone makes it inappropriate for this use,
in my view. I’ve spent hours debugging COBOL that wasn’t working
because a full stop was in the wrong place; I can’t imagine the pain
of debugging Lua code that isn’t working because the character for a
variable in one place has the wrong radical, making it an entirely
different variable!

THIS! (+100)

I've already expressed my view in the past, when someone brought up this same point of allowing Unicode support in identifiers.

It would be a maintenance/debugging nightmare. No offence intended to people from countries using non-Latin languages (I should really say non-ASCII, since very few Latin-script languages can be written orthographically correctly with just ASCII, Italian included). Sorry, it is not cultural discrimination: Unicode was never designed for programming, but for typography.

Programming and best programming practices are about unambiguous communication. Most source code is read much more often than it is (re)written or modified.


The example I always give is this: I learnt very young to avoid ambiguous ASCII characters in variable names (e.g. "l" mistaken for "1", "O" mistaken for "0", etc.), except in words where it is very clear what the glyph is meant to be.

Now imagine having all of Unicode at hand, where there are probably a dozen code points whose glyphs, in some fonts, look just like "0" (zero)! If that doesn't give any programmer nightmares, I don't know what would.
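To make the hazard concrete — a sketch in Python rather than Lua, since standard Lua (by design) rejects non-ASCII identifiers, while Python accepts them and so can demonstrate the problem directly. The variable names below are purely illustrative:

```python
# Confusable code points: these three usually render identically or
# near-identically, yet are entirely different characters.
latin_o = "O"      # U+004F LATIN CAPITAL LETTER O
cyrillic_o = "О"   # U+041E CYRILLIC CAPITAL LETTER O
zero = "0"         # U+0030 DIGIT ZERO

print(latin_o == cyrillic_o)          # False: different code points
print(ord(latin_o), ord(cyrillic_o))  # 79 1054

# With Unicode identifiers allowed, two names that look the same
# can be entirely different variables:
count = 1   # all Latin letters
соunt = 2   # first two letters are Cyrillic U+0441, U+043E
print(count, соunt)  # 1 2 -- the debugging nightmare described above
```

This is exactly why Unicode itself maintains lists of "confusable" characters for security purposes; a compiler cannot tell which of the two variables the programmer meant.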

Although ASCII may "reek" of Anglo-centrism, it is indeed something like a least common denominator for expressing words in any language. Even Asian languages, which probably have the most complex writing systems, can be transliterated into ASCII in a decent way (as a Chinese programmer, whose name I can't recall, wrote in an answer to one of my posts on this list some time ago).

Yes, this wouldn't do justice to a literary composition in that language, but in a program the focus is not being politically correct; it is being effective.

Moreover, English has become the de facto standard language for technology. I even teach my students to write their code comments in English as a best practice. Even if an entire application is written by, say, Italian developers and is not meant to be read by anyone else, you never really know: the need might arise in the future to sell the source, or to hire a consultant or an expert developer who doesn't know Italian. Comments in Italian would backfire in that case, let alone Italian identifiers.

That said, I admit that being able to put comments in any language could be useful at times, especially if the information in the comments is linguistically relevant to the code at hand.

So I'm firmly opposed to allowing Unicode in the identifiers of a program.
On the other hand, I'd welcome full Unicode support in strings and comments. That would be great for i18n and related things. It could bloat the engine, though, so an optional standard library would probably be the best of both worlds (i.e. a utf8 library that supports every meaningful Unicode operation), but I'm digressing.
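The basic job such a library has to do — again sketched in Python for concreteness, since the full library discussed here is hypothetical — is distinguishing bytes from code points once strings may hold multi-byte UTF-8 sequences:

```python
# Byte length vs. code-point length: the core distinction any
# utf8-aware string library must expose.
s = "héllo"   # 'é' (U+00E9) encodes as two bytes in UTF-8

print(len(s))                   # 5 code points
print(len(s.encode("utf-8")))   # 6 bytes

# Iterating code points rather than raw bytes:
for ch in s:
    print(f"U+{ord(ch):04X}", ch)
```

For the record, Lua 5.3 already ships a minimal utf8 library (utf8.len, utf8.codes, utf8.char, utf8.codepoint, utf8.offset) covering exactly this byte/code-point distinction, though not higher-level operations such as normalization or case mapping.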

So TL;DR:

Please! Lua Team, do not ever allow anything more than ASCII in identifiers!


Cheers!

-- Lorenzo