* Miles Bader:

> Florian Weimer <fw@deneb.enyo.de> writes:
>>> I must confess I am currently stuck. I think LPEG should support Unicode
>>> (through UTF-8), but I have no idea what "to support Unicode" means :)
>>
>> P(1) needs to turn into
>
> It seems there needs to be a clear distinction between "raw char" (given
> that lpeg is quite usable for binary data) and "unicode char".
>
> Making P(x) count utf8 chars would certainly be convenient for people
> reading utf8 files, but... it doesn't seem the cleanest thing in
> general....

Sure, this has to be optional.
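
To make the distinction concrete, here is a minimal sketch of how the
two notions can already live side by side in plain LPeg. The names
`byte`, `cont` and `utf8char` are mine, chosen for illustration; they
are not part of LPeg, and this is not a proposal for how a built-in
Unicode mode should look.

```lua
local lpeg = require("lpeg")
local P, R = lpeg.P, lpeg.R

-- One raw byte: what P(1) matches today, suitable for binary data.
local byte = P(1)

-- One continuation byte of a multi-byte UTF-8 sequence.
local cont = R("\128\191")

-- One well-formed UTF-8 encoded code point (hypothetical helper).
-- Overlong forms, surrogates and values above U+10FFFF are rejected.
local utf8char =
    R("\0\127")                                   -- U+0000..U+007F
  + R("\194\223") * cont                          -- U+0080..U+07FF
  + P("\224") * R("\160\191") * cont              -- U+0800..U+0FFF
  + R("\225\236") * cont * cont                   -- U+1000..U+CFFF
  + P("\237") * R("\128\159") * cont              -- U+D000..U+D7FF
  + R("\238\239") * cont * cont                   -- U+E000..U+FFFF
  + P("\240") * R("\144\191") * cont * cont       -- U+10000..U+3FFFF
  + R("\241\243") * cont * cont * cont            -- U+40000..U+FFFFF
  + P("\244") * R("\128\143") * cont * cont       -- U+100000..U+10FFFF

-- "\195\169" is U+00E9 encoded in UTF-8 (two bytes).
print(byte:match("\195\169"))      -- 2: consumed one byte
print(utf8char:match("\195\169"))  -- 3: consumed the whole character
```

Keeping both patterns as ordinary values means a grammar can pick
whichever it needs, so nothing changes for users matching binary data.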

By the way, I'm not sure it is reasonably possible to implement
something like grapheme cluster matching without special bytecode
support.  Right now, the compiled program would be fairly large, I
fear, and there would be rather long sequences of choices.
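
As a rough illustration of where the size comes from, here is a
deliberately crude sketch. It assumes a "cluster" is just one code
point followed by any number of marks from a single block, Combining
Diacritical Marks (U+0300..U+036F). That is far short of real grapheme
cluster rules, and the names `utf8char`, `combining` and `cluster` are
hypothetical helpers, not LPeg features.

```lua
local lpeg = require("lpeg")
local P, R = lpeg.P, lpeg.R

local cont = R("\128\191")

-- Loose UTF-8 code point: lead byte plus continuation bytes.
-- Simplified for brevity; it does not reject overlongs or surrogates.
local utf8char =
    R("\0\127")
  + R("\194\223") * cont
  + R("\224\239") * cont * cont
  + R("\240\244") * cont * cont * cont

-- U+0300..U+036F only: bytes CC 80..BF and CD 80..AF.
-- Each further block of combining characters would add more
-- alternatives to this ordered choice.
local combining =
    P("\204") * R("\128\191")
  + P("\205") * R("\128\175")

-- A base code point that is not itself a mark, then trailing marks.
local cluster = (utf8char - combining) * combining^0

-- "e" followed by U+0301 (combining acute accent), then "x".
print(cluster:match("e\204\129x"))  -- 4: "e" plus the accent
```

Even this toy version already needs two alternatives for one block, and
every additional range would lengthen the choice further.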