- Subject: identifier char class
- From: spir <denis.spir@...>
- Date: Fri, 27 Nov 2009 21:36:20 +0100
I have trouble understanding the following, from the Lua reference manual, in the section specifying the identifier format:
<< The definition of letter depends on the current locale: any character considered alphabetic by the current locale can be used in an identifier. >>
1) Where does Lua take this info from?
I guess a system routine reports the current set and order of "word characters" (rather than "alphabetic characters" or "letters"). Or is it taken from Unicode, either its metadata or a character database?
2) What does "locale" actually mean here?
My system's "encoded character set" is full Unicode. All apps use UTF-8 as the character encoding. The language is en/GB (to get accurate docs). The format is fr/FR (for the output of dates and such; e.g. a date displays in English, but with the month in the proper place ;-).
With these data, I have no clue which characters are supposed to be allowed on my computer. Is the criterion the character set, the encoding, the language, or something else?
I would guess that in my case any character defined as a "word character" by Unicode is right, but I'm not sure of this at all.
3) What is a "word character"?
In many scripts (http://en.wikipedia.org/wiki/Writing_system), there is nothing like an "alphabetical character", let alone a "letter". There are base syllabic signs, abjads, logographs, etc., plus many other signs comparable to diacritics, binding or separating characters, and semantic or grammatical annotations. Conversely, if such signs are allowed, then in English for instance "-" or "'" (or rather the real apostrophe U+2019) should be allowed, too. I'm confused about this whole issue.
While I really find it great that Lua allows coding in one's own language, I don't understand why this feature should depend on the computer's locale. I think it should instead apply uniformly to Lua, whatever an individual programmer's native language.
The obvious flaw is that a program becomes invalid when passed to another programmer, or even just when transferred to another computer with different settings (I guess in some cases a language variant may be enough; I don't know about pt-PT <--> pt-BR).
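A rough sketch of that failure mode, in Python rather than Lua. It assumes my guess is right that the check goes through some locale-aware system routine; Python's re.LOCALE flag on byte patterns is only a stand-in for whatever Lua actually consults, and is_word_byte is a name made up for this illustration.

```python
import locale
import re

# Locale-aware "word byte" test. re.LOCALE is only valid on bytes patterns.
# Stand-in for Lua's identifier check (an assumption, not Lua's actual code).
WORD_BYTE = re.compile(rb"\w", re.LOCALE)

def is_word_byte(byte_value, locale_name):
    """Return True if the byte is a word character under locale_name."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return WORD_BYTE.match(bytes([byte_value])) is not None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore

E_ACUTE = 0xE9  # the byte for "é" in Latin-1

print(is_word_byte(ord("a"), "C"))  # plain ASCII letter
print(is_word_byte(E_ACUTE, "C"))   # high byte under the "C" locale
```

Under the "C" locale the high byte is rejected. Under a Latin-1 locale, if one happens to be installed on the machine, the same byte may be accepted, so a source file using it would be valid on one computer and invalid on another.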
Another issue is that it causes confusion, as in my case.
To provide for consistency, Lua should:
* Either take input from a full character database (Unicode) to check whether a character is allowed, regardless of locale.
* Or define the allowed characters negatively (not a digit, not any other sign used by Lua: separator, delimiter, operator, ...).
To both keep things simple and allow for extension: [!\x00-\x40\x5b-\x60\x7b-\x7f]. That is, no ASCII character is allowed except US letters.
This makes the definition of identifier char class rather simple, consistent, and user-friendly.
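A minimal sketch of the class above, again in Python and only as an illustration. It reads the last range as \x7b-\x7f and writes the negation with ^ as Python's re module requires; is_identifier is a hypothetical helper name.

```python
import re

# Proposed class: reject every ASCII character that is not a US letter,
# accept everything else (including all non-ASCII characters).
IDENT = re.compile(r"[^\x00-\x40\x5b-\x60\x7b-\x7f]+")

def is_identifier(name):
    """True if every character of name belongs to the proposed class."""
    return IDENT.fullmatch(name) is not None

for candidate in ["count", "année", "名前", "a-b", "x1", "my_var"]:
    print(candidate, is_identifier(candidate))
```

Note that the class, exactly as written, also rejects digits (\x30-\x39) and the underscore (\x5f), so it describes at most the first character of an identifier; those would have to be added back for the following positions.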
Maybe cleverer solutions are possible; anyway, I really find the present one weird, impractical and confusing. Or maybe I simply don't understand how it _actually_ works.
la vita e estrany