On 19 April 2014 14:00, Jay Carlson <nop@nop.com> wrote:
> As an aside, I like the demarcation point of "Lua does UTF-8, but it does
> not know Unicode." It is always good to be clear what you are *not* trying
> to do.

Agreed.

> If I had one wish for utf8.match, it would be for "." to either match
> complete utf8 characters or fail.
>
> ...but wait a minute, that’s exactly what the range [\0-\u10FFFF] means with
> Hisham's patch, right?

That's what "." already means in my patch (except that I didn't add
error checking, which should be there but shouldn't be too painful to
add).
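
To make the "match a complete character or fail" idea concrete, here is a minimal sketch of the error checking in C. It is not taken from the patch: the name utf8_step is hypothetical, and rejecting overlong forms and surrogates is an assumption following the usual UTF-8 well-formedness rules rather than anything the thread has settled on.

```c
#include <stddef.h>

/* Hypothetical helper (not from the patch): advance over exactly one
   well-formed UTF-8 sequence in [s, end), or return NULL on failure. */
static const char *utf8_step(const char *s, const char *end) {
    const unsigned char *p = (const unsigned char *)s;
    unsigned long cp, min;
    int extra, i;

    if (s >= end) return NULL;                /* nothing left to match */
    if (p[0] < 0x80) return s + 1;            /* plain ASCII byte */

    if (p[0] >= 0xC2 && p[0] <= 0xDF) {       /* 2-byte sequence */
        extra = 1; cp = p[0] & 0x1F; min = 0x80;
    } else if ((p[0] & 0xF0) == 0xE0) {       /* 3-byte sequence */
        extra = 2; cp = p[0] & 0x0F; min = 0x800;
    } else if (p[0] >= 0xF0 && p[0] <= 0xF4) {/* 4-byte sequence */
        extra = 3; cp = p[0] & 0x07; min = 0x10000;
    } else {
        return NULL;  /* stray continuation byte, 0xC0/0xC1, or above 0xF4 */
    }

    if (end - s < extra + 1) return NULL;     /* truncated sequence */
    for (i = 1; i <= extra; i++) {
        if ((p[i] & 0xC0) != 0x80) return NULL;  /* not a continuation byte */
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    if (cp < min) return NULL;                /* overlong encoding */
    if (cp > 0x10FFFF) return NULL;           /* outside [\0-\u10FFFF] */
    if (cp >= 0xD800 && cp <= 0xDFFF) return NULL;  /* surrogate (assumed invalid) */
    return s + extra + 1;
}
```

With something like this as the single-character step, "." either consumes one whole character or the match fails at that position.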

I haven't ventured into suggesting Lua interfaces on this subject yet,
but I think a utf8.match that does UTF-8 pattern handling by default
would be a great place to do this, Lua-API-wise: no compatibility
issues, no global state, and no additional boolean argument in string
functions. It would live where people would look for utf8 functions,
and the string module would remain byte-oriented. The patch
demonstrates that it could share most of its code with the
byte-oriented pattern-matching C code, adding the feature without
needing two whole engines in the core.
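
As a rough illustration of the code-sharing point, here is a toy matcher in C. It is only a sketch, not the patch and not Lua's actual matching code: toy_match and item_len are hypothetical names, the pattern language is reduced to literal bytes plus ".", and the UTF-8 branch trusts the lead byte without validating the continuation bytes.

```c
#include <stddef.h>

/* Hypothetical: length of the item starting at s. Always 1 in byte mode;
   in UTF-8 mode, the length implied by the lead byte (no validation of
   continuation bytes, for brevity). Returns 0 if the item would run past
   end. The caller must ensure s < end. */
static size_t item_len(const char *s, const char *end, int utf8) {
    unsigned char c = (unsigned char)*s;
    size_t n = 1;
    if (utf8) {
        if (c >= 0xF0) n = 4;
        else if (c >= 0xE0) n = 3;
        else if (c >= 0xC0) n = 2;
    }
    return ((size_t)(end - s) >= n) ? n : 0;
}

/* Hypothetical: anchored match of pattern p (literal bytes and '.')
   against [s, end). Returns a pointer just past the match, or NULL.
   The loop is identical for both modes; only item_len differs. */
static const char *toy_match(const char *s, const char *end,
                             const char *p, int utf8) {
    for (; *p != '\0'; p++) {
        if (s >= end) return NULL;       /* subject exhausted */
        if (*p == '.') {
            size_t n = item_len(s, end, utf8);
            if (n == 0) return NULL;     /* truncated character */
            s += n;                      /* consume one item */
        } else {
            if (*s != *p) return NULL;   /* literal byte mismatch */
            s++;
        }
    }
    return s;
}
```

On the four-byte subject "h\xC3\xA9!" with the pattern "h.!", the byte-oriented mode fails because "." consumes only the first byte of the two-byte character, while the UTF-8 mode matches the whole string. The only difference between the two runs is the flag passed to the shared loop.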

-- Hisham