David Given wrote:
> What I'd rather see, though, is a clear
> statement that *all* high-bit bytes are treated as valid in identifiers, and a
> removal of the locale-specific behaviour for low-bit characters in favour of
> fixed (and documented) tables.

I second this. Locale-dependent lexing is bad. The above rule is
both simple and effective. Please let's avoid the Java mess.

I.e. an identifier matches: /[A-Za-z_\x80-\xff][0-9A-Za-z_\x80-\xff]*/
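
As a rough illustration, the pattern above can be checked without any
locale-dependent ctype calls. This is only a sketch: the helper names
(ident_start, ident_part, is_identifier) are made up for this example and
do not exist in llex.c, and the range comparisons assume an
ASCII-compatible execution character set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: first byte of an identifier.
   Matches [A-Za-z_\x80-\xff]; assumes an ASCII-compatible charset. */
static int ident_start(unsigned char c) {
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
         c == '_' || c >= 0x80;
}

/* Hypothetical helper: any later byte of an identifier.
   Matches [0-9A-Za-z_\x80-\xff]. */
static int ident_part(unsigned char c) {
  return ident_start(c) || (c >= '0' && c <= '9');
}

/* Returns 1 if the whole NUL-terminated string s matches the pattern,
   0 otherwise (including for the empty string). */
static int is_identifier(const char *s) {
  assert(s != NULL);
  const unsigned char *p = (const unsigned char *)s;
  if (!ident_start(*p)) return 0;
  for (p++; *p != '\0'; p++)
    if (!ident_part(*p)) return 0;
  return 1;
}
```

Casting to unsigned char before comparing matters here: with a signed
plain char, high-bit bytes would be negative and the `c >= 0x80` test
would never fire.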

Replacing isdigit, isalnum, isalpha, isspace and iscntrl in llex.c
should suffice. Overhead: a 257-byte (*) read-only table holding the
bitmasks. You can speed it up further by folding the checks for '_'
and '.' into the table. It should be faster anyway, because the
NLS-aware libc ctype macros need a function call, which isn't always
optimized away by the compiler.

(*) 256 byte values plus one entry for EOF (-1), which must be handled, too.
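
A minimal sketch of what such a table could look like, under a few
assumptions: the CT_* bit names, ctype_tab, ctype_init, ct_test and the
lex_is* macros are invented for this example (they are not taken from
llex.c), EOF is assumed to be -1, and the character set is assumed to be
ASCII-compatible. For brevity the table is filled in once at startup; a
truly read-only version would be a static const initializer instead.

```c
#include <assert.h>
#include <stdio.h>   /* EOF */

/* Hypothetical class bits; one byte per table entry. */
enum { CT_ALPHA = 1, CT_DIGIT = 2, CT_SPACE = 4, CT_CNTRL = 8 };

/* 257 entries: slot 0 is EOF (-1), slots 1..256 are bytes 0..255. */
static unsigned char ctype_tab[257];

/* Fill the table with fixed, locale-independent classes. */
static void ctype_init(void) {
  int c;
  ctype_tab[0] = 0;  /* EOF belongs to no class */
  for (c = 0; c < 256; c++) {
    unsigned char m = 0;
    /* '_' and all high-bit bytes are folded into the alpha class. */
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        c == '_' || c >= 0x80)
      m |= CT_ALPHA;
    if (c >= '0' && c <= '9') m |= CT_DIGIT;
    if (c == ' ' || (c >= '\t' && c <= '\r')) m |= CT_SPACE;
    if (c < 0x20 || c == 0x7f) m |= CT_CNTRL;
    ctype_tab[c + 1] = m;
  }
}

/* c must be EOF or a value in 0..255 (i.e. an unsigned char as int). */
static int ct_test(int c, int mask) {
  assert(EOF == -1 && c >= -1 && c <= 255);
  return (ctype_tab[c + 1] & mask) != 0;
}

#define lex_isalpha(c) ct_test((c), CT_ALPHA)
#define lex_isdigit(c) ct_test((c), CT_DIGIT)
#define lex_isalnum(c) ct_test((c), CT_ALPHA | CT_DIGIT)
#define lex_isspace(c) ct_test((c), CT_SPACE)
#define lex_iscntrl(c) ct_test((c), CT_CNTRL)
```

Indexing with c + 1 is what makes the extra 257th entry useful: EOF
lands on slot 0 and needs no separate branch in the lexer.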