- Subject: Re: UTF-8 patterns in Lua 5.3
- From: Hisham <h@...>
- Date: Sat, 19 Apr 2014 16:07:28 -0300
On 19 April 2014 14:00, Jay Carlson <nop@nop.com> wrote:
> As an aside, I like the demarcation point of "Lua does UTF-8, but it does
> not know Unicode." It is always good to be clear what you are *not* trying
> to do.
Agreed.
> If I had one wish for utf8.match, it would be for "." to either match
> complete utf8 characters or fail.
>
> ...but wait a minute, that’s exactly what the range [\0-\u10FFFF] means with
> Hisham's patch, right?
That's what "." already means in my patch. The one gap is error
checking, which I didn't add; it should be there, but it shouldn't be
too painful to add.
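As a rough illustration of the missing error checking, here is a minimal sketch of what a "match one complete UTF-8 character or fail" step might look like. It is not taken from the patch: the names utf8_char_len and dot_match are hypothetical, and rejecting overlong forms, surrogates and values above U+10FFFF is an assumption about how strict the check should be.

```c
#include <stddef.h>

/* Hypothetical helper, not code from the patch. Returns the byte length
   (1 to 4) of the well-formed UTF-8 sequence starting at s, or 0 if the
   bytes before end do not form one. */
static size_t utf8_char_len(const unsigned char *s, const unsigned char *end) {
  /* Smallest code point that legitimately needs n bytes, indexed by n. */
  static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
  size_t n, i;
  unsigned long cp;
  if (s >= end) return 0;                    /* nothing left to match */
  if (s[0] < 0x80) return 1;                 /* plain ASCII */
  else if ((s[0] & 0xE0) == 0xC0) { n = 2; cp = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; }
  else return 0;            /* stray continuation or invalid lead byte */
  if ((size_t)(end - s) < n) return 0;       /* truncated sequence */
  for (i = 1; i < n; i++) {
    if ((s[i] & 0xC0) != 0x80) return 0;     /* not a continuation byte */
    cp = (cp << 6) | (unsigned long)(s[i] & 0x3F);
  }
  if (cp < min_cp[n]) return 0;              /* overlong encoding */
  if (cp > 0x10FFFF) return 0;               /* beyond the UTF-8 range */
  if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogate code point */
  return n;
}

/* Hypothetical wrapper: bytes consumed by "." at the start of str,
   or 0 when "." fails. */
static size_t dot_match(const char *str, size_t len) {
  const unsigned char *s = (const unsigned char *)str;
  return utf8_char_len(s, s + len);
}
```

With this sketch, dot_match("\xC3\xA9", 2) consumes both bytes of "é", while a lone "\xC3" or a stray "\xA9" makes "." fail instead of matching half a character.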
I haven't ventured into suggesting Lua interfaces on this subject yet,
but I think a utf8.match that does UTF-8 pattern handling by default
would be a great place for it, Lua-API-wise: no compatibility issues;
no global state; no additional boolean argument in string functions;
it would live where people would look for utf8 functions; and the
string module remains byte-oriented. The patch demonstrates that it
could share most of its code with the byte-oriented pattern-matching C
code, adding the feature without needing two whole engines in the
core.
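To make the code-sharing point concrete, here is a toy sketch, not the patch itself, of one matching loop serving both modes by swapping only the "consume one character" step. The names step_fn, step_byte, step_utf8 and match_here are hypothetical, the pattern language is cut down to "." plus literal ASCII characters, and step_utf8 checks only the shape of the sequence (no overlong or range checks).

```c
#include <stddef.h>

/* A step function reports how many bytes one "." consumes at s,
   or 0 if it cannot match there. */
typedef size_t (*step_fn)(const unsigned char *s, const unsigned char *end);

/* Byte-oriented mode: "." is any single byte. */
static size_t step_byte(const unsigned char *s, const unsigned char *end) {
  return (s < end) ? 1 : 0;
}

/* UTF-8 mode: "." is one lead byte plus its continuation bytes.
   Simplified for the sketch: only the shape of the sequence is checked. */
static size_t step_utf8(const unsigned char *s, const unsigned char *end) {
  size_t n, i;
  if (s >= end) return 0;
  if (s[0] < 0x80) n = 1;
  else if ((s[0] & 0xE0) == 0xC0) n = 2;
  else if ((s[0] & 0xF0) == 0xE0) n = 3;
  else if ((s[0] & 0xF8) == 0xF0) n = 4;
  else return 0;                         /* invalid lead byte */
  if ((size_t)(end - s) < n) return 0;   /* truncated sequence */
  for (i = 1; i < n; i++)
    if ((s[i] & 0xC0) != 0x80) return 0; /* not a continuation byte */
  return n;
}

/* The shared loop: matches pat (only "." and literal ASCII characters)
   anchored at the start of str. Returns the number of bytes consumed,
   or -1 on failure. Only the step function differs between modes. */
static long match_here(const char *pat, const char *str, size_t len,
                       step_fn step) {
  const unsigned char *s = (const unsigned char *)str;
  const unsigned char *end = s + len;
  for (; *pat != '\0'; pat++) {
    if (*pat == '.') {
      size_t n = step(s, end);
      if (n == 0) return -1;
      s += n;
    } else {
      if (s >= end || *s != (unsigned char)*pat) return -1;
      s++;
    }
  }
  return (long)(s - (const unsigned char *)str);
}
```

In this sketch, match_here("c.t", "c\xC3\xA9t", 4, step_utf8) matches all four bytes, while the same call with step_byte fails because "." takes only the first byte of "é". A hypothetical utf8.match would select the UTF-8 step, and string.match would keep the byte step.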
-- Hisham