2017-04-29 15:21 GMT+02:00 Roberto Ierusalimschy <roberto@inf.puc-rio.br>:
>> At present all the entries from 0x80 to 0xFF in the constant array
>> luai_ctype in lctype.c are zero: no bit set.
>>
>> There are three unused bits. Couldn't two of them be used to mean
>> UTF8_FIRST and UTF8_CONT?
>>
>> This is only the first step, but if the idea is shot down here already,
>> the others need not be mentioned.
>
> This particular idea has very low cost, so I don't see why to shoot it
> down before knowing the rest of the story. What does it mean for Lua
> to be "UTF-8 aware"?
>
> -- Roberto

The next step would be a compiler option under which the lexer
accepts a UTF-8 first byte followed by the correct number
of UTF-8 continuation bytes as being alphabetic for the
purpose of being an identifier or part of one.
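
To make the proposal concrete, here is a minimal sketch in C of the check such a lexer could perform. It is not taken from lctype.c or llex.c: the flag values and the names utf8_class, utf8_ncont and utf8_seqlen are hypothetical, and the classification is done by range tests rather than by extra bits in luai_ctype. It only verifies "first byte plus the right number of continuation bytes"; it does not reject overlong forms or surrogate code points.

```c
#include <stddef.h>

/* Hypothetical flag bits, for illustration only; they are not the
   bit assignments used by luai_ctype in lctype.c. */
#define UTF8_FIRST 0x1
#define UTF8_CONT  0x2

/* Classify one byte: continuation (0x80..0xBF), first byte of a
   multi-byte sequence (0xC2..0xF4), or neither (returns 0). */
static int utf8_class(unsigned char c) {
  if (c >= 0x80 && c <= 0xBF) return UTF8_CONT;
  if (c >= 0xC2 && c <= 0xF4) return UTF8_FIRST;
  return 0;
}

/* Number of continuation bytes announced by first byte c. */
static int utf8_ncont(unsigned char c) {
  if (c >= 0xC2 && c <= 0xDF) return 1;
  if (c >= 0xE0 && c <= 0xEF) return 2;
  if (c >= 0xF0 && c <= 0xF4) return 3;
  return 0;
}

/* If s (with n bytes available) starts with a first byte followed by
   the correct number of continuation bytes, return the length of the
   whole sequence in bytes; otherwise return 0. */
static size_t utf8_seqlen(const unsigned char *s, size_t n) {
  int i, need;
  if (n == 0 || !(utf8_class(s[0]) & UTF8_FIRST)) return 0;
  need = utf8_ncont(s[0]);
  if ((size_t)need >= n) return 0;  /* sequence is truncated */
  for (i = 1; i <= need; i++)
    if (!(utf8_class(s[i]) & UTF8_CONT)) return 0;
  return (size_t)need + 1;
}
```

A lexer reading an identifier could call utf8_seqlen wherever it now tests for an alphabetic character and, on a nonzero result, consume that many bytes as part of the name.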