Just to clarify, the libraries I'm talking about are the ones you need in order to know what Unicode characters really are. UTF-8 only encodes code points, which are just numbers, so you cannot simply say that any sequence of code points not currently excluded is allowed in an identifier: you would include many languages' whitespace and punctuation. An example is the Mongolian vowel separator, which can cause problems and has bitten languages in the past[1]. You therefore need to follow the proposal[2] mentioned in Philippe's reply, which requires a library that can work with Unicode data. The raw data file[3] for the Unicode manipulation library Julia uses is over half the size of stock Lua 5.3.1 on my PC!
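To make the whitespace and punctuation point concrete, here is a minimal sketch in Python, used only because it ships Unicode data tables. The function naive_is_identifier_char is a hypothetical stand-in for an "accept every non-ASCII code point" rule and is not taken from any Lua patch. Python's own str.isidentifier stands in for a check backed by Unicode property data.

```python
import unicodedata

def naive_is_identifier_char(ch: str) -> bool:
    """Hypothetical rule: '_', ASCII letters/digits, or ANY code point >= 0x80."""
    return ch == "_" or (ch.isascii() and ch.isalnum()) or ord(ch) >= 0x80

# A few non-ASCII characters: three kinds of space, one punctuation mark,
# and one ordinary accented letter.
samples = {
    "NO-BREAK SPACE": "\u00a0",
    "EM SPACE": "\u2003",
    "IDEOGRAPHIC SPACE": "\u3000",
    "RIGHT SINGLE QUOTATION MARK": "\u2019",
    "LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
}

def report():
    """Return (name, general category, naive verdict, data-backed verdict) rows."""
    rows = []
    for name, ch in samples.items():
        category = unicodedata.category(ch)       # e.g. Zs = space separator
        naive = naive_is_identifier_char(ch)      # byte/number-only view
        data_backed = ("a" + ch).isidentifier()   # needs Unicode property tables
        rows.append((name, category, naive, data_backed))
    return rows

if __name__ == "__main__":
    for row in report():
        print(row)
```

The naive rule accepts all five characters, including the spaces. The data-backed check accepts only the accented letter, and it can only do so because it has the property tables available.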


On Mon, Nov 4, 2019 at 9:50 AM bil til wrote:
Hi Marcus,
but I would like to allow Unicode ONLY in variable names (and of course
in string contents, but that is already supported in Lua). As I
understand it, this would usually NOT touch the basic Lua text handling, nor
the libraries.

(I assume that if a library uses a variable name in string form, it is just
a zero-terminated string, and this stays the same if UTF-8 is allowed.)
(Sometimes you might use tolower or toupper on such variable
names, but tolower and toupper will then of course operate only on the
ASCII characters; these two functions leave the non-ASCII bytes untouched.
Non-ASCII code points encoded as UTF-8 consist only of bytes in the range
0x80...0xFF, and those bytes are NOT touched by toupper / tolower.)
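The byte-level claim can be checked with a small sketch. It is in Python, whose bytes.upper() is documented to change ASCII letters only. That is used here as a stand-in for C's toupper in the default "C" locale, which is an assumption: C's toupper is locale-dependent, and in a single-byte locale it may change bytes >= 0x80. The identifier name below is made up for illustration.

```python
# Hypothetical identifier containing non-ASCII letters.
name = "größe_é"
encoded = name.encode("utf-8")

# Every byte belonging to a non-ASCII code point lies in 0x80..0xFF.
non_ascii_bytes = [b for ch in name if ord(ch) >= 0x80 for b in ch.encode("utf-8")]
all_high = all(0x80 <= b <= 0xFF for b in non_ascii_bytes)

# ASCII-only uppercasing: works byte by byte, so multi-byte sequences survive intact.
byte_upper = encoded.upper().decode("utf-8")

# Unicode-aware uppercasing: needs case-mapping data and can even change the length.
unicode_upper = name.upper()

if __name__ == "__main__":
    print(all_high)
    print(byte_upper)
    print(unicode_upper)
```

The byte-wise result leaves ö, ß and é unchanged, while the Unicode-aware result maps them to Ö, SS and É. The bytes are indeed untouched, but the result is not what a reader of the language would call upper case.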

... or maybe I did not understand your post correctly ... in this case could
you show some short example?
