Internationalisation in programming languages [Was Re: lex patch]

lua-l archive

Subject: Internationalisation in programming languages [Was Re: lex patch]
From: RLake@...
Date: Fri, 5 Apr 2002 16:21:46 -0500

Edgar Toernig wrote:

> Not only over nations.  Using locales is a user choice and some people
> (like me) always use the C locale and will get a lot of problems with
> code that uses "strange" identifiers.  Even trying to figure out what
> locale the author used may become difficult.  Converting these identifiers
> may be impossible.  Switching to "his" locale isn't easy either.  So
> IMHO it's best to define a well defined subset of characters to be
> used for identifiers ([a-zA-Z_0-9]) and the problem will never occur.[1]

> [1] Btw, while identifiers respect your locale settings, numbers do not
> (0.1 vs 0,1).

Don't get me started on internationalisation (or internationalization if
you prefer).

Having programs which change behaviour depending on the computer's locale
setting is the bane of my life. I could list any number of support issues
created by the fact that MS Excel obeys the regional settings for the
decimal indicator (0.1 vs 0,1), whereas a number of utilities which purport
to create comma-separated files don't; they are hard-coded English or
Spanish style, depending on the prejudice of the author. And it's not
enough to say, well, they should also respect the locale setting: they may
be run on a different machine (say, a remote web server) which doesn't have
access to that information. (Furthermore, different Latin countries do
decimal indicators differently, and so do different Latinos.) And that's
just a start. There are also date formats, the fact that Windows, at least,
differentiates between decimal indicators in plain numbers and decimal
indicators in currency, and a hundred other traps waiting to happen.
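
One way to sidestep the problem is to make the decimal indicator part of
the data exchange rather than something inferred from whichever machine
runs the code. What follows is a minimal Python sketch of that idea, not
anything Excel or the utilities mentioned above actually do; parse_decimal
and its decimal_sep parameter are hypothetical names.

```python
from decimal import Decimal, InvalidOperation


def parse_decimal(text, decimal_sep):
    """Parse text using an explicitly supplied decimal separator.

    decimal_sep must be "." or ","; nothing is read from the machine's locale.
    """
    if decimal_sep not in (".", ","):
        raise ValueError("decimal_sep must be '.' or ','")
    other = "," if decimal_sep == "." else "."
    if other in text:
        # Refuse to guess whether the other mark is a thousands separator.
        raise ValueError(f"unexpected {other!r} in {text!r}")
    try:
        return Decimal(text.strip().replace(decimal_sep, "."))
    except InvalidOperation:
        raise ValueError(f"not a number: {text!r}") from None
```

With this, parse_decimal("0,1", ",") and parse_decimal("0.1", ".") yield the
same value, while parse_decimal("0,1", ".") fails loudly instead of silently
misreading the figure.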

Speaking of currency, it never seems to have occurred to anyone that you
cannot take a monetary amount in, say, pounds sterling and convert it to,
say, Peruvian soles simply by changing the funny L that the Brits use into
S/. So people send us spreadsheets with monetary figures in them, and Excel
cheerfully devalues them about five-to-one, that being the current exchange
rate.
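
The distinction between relabelling and converting can be made explicit in
code. This is only a sketch under assumptions: the Money type and convert
function are hypothetical, the currency is stored as an ISO 4217 code with
the amount, and the rate of 5 is just the "about five-to-one" figure from
the paragraph above, not a real quote.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str  # ISO 4217 code such as "GBP" or "PEN", stored with the value


def convert(money, to_currency, rate):
    """Convert using an explicit rate: units of to_currency per unit of money.currency."""
    return Money(money.amount * rate, to_currency)


price = Money(Decimal("100"), "GBP")

# Illustrative rate only, taken from the "about five-to-one" remark.
in_soles = convert(price, "PEN", Decimal("5"))
```

Because the currency travels with the amount, displaying price on a machine
set up for Peru cannot quietly turn 100 pounds into 100 soles; getting a
figure in soles requires a call to convert with a stated rate.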

Anyway, getting back to the question at hand, identifiers. I can't think of
a good reason why "high-ascii" characters should not be valid identifiers.
If you don't want to use ñ or Ä in your variable names, don't use them --
but they have no other syntactic significance, so there is no harm in
including them. In fact, I'd go for something even more radical: if it's
not a reserved symbol and it prints, it's a valid identifier character. (To
allow syntactic space for future operators, I'd reserve some low-ascii
characters that are not currently operators.)
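
To make the proposed rule concrete, here is a rough Python sketch. It is
not Lua's lexer; the names are hypothetical, and it assumes that every
ASCII punctuation mark except the underscore is reserved (current operators
plus room for future ones) and that identifiers still may not start with a
digit.

```python
import string

# Assumption: all ASCII punctuation except "_" is reserved, whether or not
# the language currently uses it as an operator.
RESERVED = set(string.punctuation) - {"_"}


def is_identifier_char(ch):
    """True if ch prints, is not whitespace, and is not a reserved symbol."""
    return ch.isprintable() and not ch.isspace() and ch not in RESERVED


def is_identifier(word):
    """True if word is non-empty, does not start with an ASCII digit,
    and consists only of identifier characters."""
    if not word or word[0] in string.digits:
        return False
    return all(is_identifier_char(ch) for ch in word)
```

Under these assumptions "año" and "«foo»" are accepted, while "a+b", "2x"
and "a b" are rejected.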

One day, and probably to our sorrow, we'll all be using Unicode to write
programs in. Unicode at least has a clear (if not straightforward)
discussion of what constitutes an identifier character. It leaves so many
ambiguities open, though, that it terrifies me that it will someday be
adopted by well-meaning internationalists. (In addition to adding yet
another line-end character to the soup.)
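
One ambiguity of this kind (chosen here as an illustration; the paragraph
above does not say which ones it has in mind) is that two identifiers can
look identical on screen and still be different sequences of code points.
The Python sketch below uses str.isidentifier and unicodedata.normalize
from the standard library; same_identifier is a hypothetical helper, and
comparing under NFC is an assumption about what a language might choose.

```python
import unicodedata

composed = "caf\u00e9"     # "café" with a precomposed e-acute (U+00E9)
decomposed = "cafe\u0301"  # "café" as plain e plus combining acute (U+0301)

# Both spellings are acceptable identifiers by Unicode-style rules...
both_valid = composed.isidentifier() and decomposed.isidentifier()

# ...yet they are different strings unless the language normalizes them.
raw_equal = composed == decomposed


def same_identifier(a, b):
    """Compare two identifiers after NFC normalization (an assumed policy)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Here both_valid is True and raw_equal is False, while
same_identifier(composed, decomposed) is True. A language that skips the
normalization step would treat the two spellings as distinct variables.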

Leaving all that aside, there is no earthly reason why I shouldn't use
whatever character makes me feel good inside of an identifier, providing
that it's not being used by the language for some other purpose. Given that
I use iso-8859-1, why should I not use «foo» as an identifier? (Perhaps I
like global variables to stand out.) Does this hurt anything? Is there any
reason why a language which otherwise doesn't even admit to the existence
of « and » as characters should nevertheless deny my ability to express
myself with them? It's true that if someone who uses ISO-8859-2 tries to
read the code, these characters will have transformed themselves into T and
t (with carons, I can't type that in iso-8859-1) but that should not
prevent the program from compiling and executing.
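
The effect is easy to reproduce. The short Python sketch below (variable
names are illustrative) encodes a line of source as iso-8859-1 bytes and
then reads the same bytes back under each character set.

```python
# The bytes a compiler would see: 0xAB f o o 0xBB, followed by " = 1".
source_bytes = "\u00abfoo\u00bb = 1".encode("iso-8859-1")

# What an iso-8859-1 user sees: guillemets around foo.
seen_as_latin1 = source_bytes.decode("iso-8859-1")

# What an iso-8859-2 user sees: T and t with carons in place of the guillemets.
seen_as_latin2 = source_bytes.decode("iso-8859-2")

# The bytes are identical either way; only their display differs.
unchanged = seen_as_latin2.encode("iso-8859-2") == source_bytes
```

Since the byte sequence never changes, a lexer that treats bytes 0xAB and
0xBB as ordinary identifier characters accepts the file regardless of which
of the two character sets the reader's terminal happens to use.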

Follow-Ups:
- Re: Internationalisation in programming languages [Was Re: lex patch], Edgar Toernig