Adrian Perez wrote:
> Just my two cents. I would really appreciate Unicode support in Lua. I
> vote for enforcing UTF-8 as encoding for source files.

As an addendum to this: I've been pushing Lua here at Tao as a general-purpose
scripting language. One of our better developers had a look, and came to me
with comments. Most of these were favourable, but he was deeply unimpressed by
Lua's locale-specific identifier behaviour, as described in this sentence from
the manual:

| The definition of letter depends on the current locale: any character
| considered alphabetic by the current locale can be used in an identifier.

This means that the same program can work on one machine and fail on another,
apparently identical, machine purely because the two are running under
different locales, which is a really obscure reason to have to track down.
We've been bitten by this before when using awk scripts and it's deeply vile.

I'm not looking for standardisation on UTF-8 in source, although I agree in
principle with the reasoning; I know that there are a number of Asian users
who need to use different encodings. What I'd rather see, though, is a clear
statement that *all* high-bit bytes are treated as valid in identifiers, and a
removal of the locale-specific behaviour for low-bit characters in favour of
fixed (and documented) tables.

This would allow UTF-8 or any other ASCII-with-extensions encoding to be used
safely in Lua scripts, regardless of the locale in effect. Making high-bit
bytes valid in identifiers means that they can be used safely in variable
names. (Useful in contexts other than internationalisation; think maths and my
use of ℵ₀ and ℵ₁ further above.)
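
To make the proposal concrete, here is a rough sketch in C of what such a
fixed rule might look like. This is not Lua's actual lexer code: the function
names (ident_start, ident_continue, is_identifier) are made up for
illustration, and it assumes an ASCII-compatible execution character set. The
point is only that nothing in it consults the locale, so isalpha() and
setlocale() never enter the picture.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration; not taken from the Lua sources. */

/* A byte may start an identifier if it is an ASCII letter, an underscore,
 * or any byte with the high bit set (0x80..0xFF). Assumes ASCII, where the
 * letters are contiguous. The locale is never consulted. */
bool ident_start(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           c == '_' ||
           c >= 0x80;
}

/* After the first byte, ASCII digits are allowed as well. */
bool ident_continue(unsigned char c)
{
    return ident_start(c) || (c >= '0' && c <= '9');
}

/* Returns true if the NUL-terminated byte string s is an identifier under
 * the fixed rules above. High-bit bytes are passed through untouched, so
 * no particular encoding is assumed or validated. */
bool is_identifier(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    if (!ident_start(*p))
        return false;               /* also rejects the empty string */
    for (p++; *p != '\0'; p++) {
        if (!ident_continue(*p))
            return false;
    }
    return true;
}
```

With something like this, the UTF-8 bytes for ℵ₀ ("\xE2\x84\xB5\xE2\x82\x80")
are accepted as an identifier, and so is a Latin-1 string such as
"\xE9t\xE9", whatever locale the process happens to be running in. The cost
is that the lexer cannot tell a letter from, say, a non-breaking space in any
given encoding; it simply treats every high-bit byte as opaque.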

> Python is a
> somewhat hackish: it tries to detect encoding by using a special comment
> on the first 5 lines of code like '# -*- encoding: utf-8 -*-'. It works
> but I think it's quite awkward...

Particularly when the encoding's not compatible with ASCII!

David Given ── McQ
"There are two major products that come out of Berkeley: LSD and Unix. We
don't believe this to be a coincidence." --- Jeremy S. Anderson
