- Subject: Re: [ANN] Lua 5.3.0 (work2) now available
- From: Roberto Ierusalimschy <roberto@...>
- Date: Thu, 10 Apr 2014 10:38:51 -0300
> While character classes like "%g" would require the entire Unicode
> tables, what about patterns like this:
>
> utf8.match("\u{e4}", "[\u{e4}-\u{e6}\u{f3}-\u{f5}]")
>
> It wouldn't require Unicode tables but "just" UTF-8 support for the
> matching functions.
>
> Would that be possible without adding too much bloat? When I had to
> match codepoint ranges before, I had to use multiple patterns to match
> certain ranges of UTF-8 encodings.
I believe you are asking to add this kind of class into the
pattern-matching constructions in Lua. That would require some
non-trivial changes to the engine, as the whole engine would have to
be UTF-8 aware. For instance, a class repetition such as [aá]* could
not just count the number of bytes it matched, but would have to count
the number of characters. That is not compatible with the byte-oriented
behavior, so the engine would need two modes (or maybe two different
engines).
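
To make the byte-oriented behavior concrete, here is a small sketch, assuming the string and utf8 libraries as they appear in the final Lua 5.3 release (the work versions may differ). It shows that a set such as [aá] is really a set of bytes, so it can match half of a different character, and that a repetition over it advances byte by byte:

```lua
-- Assumption: Lua 5.3 or later ("\u{...}" escapes and the utf8 library).
local a_acute = "\u{e1}"   -- "á", encoded as the bytes 0xC3 0xA1
local a_tilde = "\u{e3}"   -- "ã", encoded as the bytes 0xC3 0xA3

-- The set holds the bytes 'a', 0xC3 and 0xA1, not the characters 'a' and 'á'.
local set = "[a" .. a_acute .. "]"

-- "ã" shares its lead byte with "á", so the set matches only its first byte.
local s, e = string.find(a_tilde, set)
print(s, e)                 -- 1 and 1: a one-byte match inside a two-byte character

-- The repetition consumes bytes: "aá" is 2 characters but 3 bytes.
local m = string.match("a" .. a_acute, "^" .. set .. "*")
print(#m, utf8.len(m))      -- 3 bytes, 2 characters
```
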
(That is a problem of the current simple implementation. For a more
powerful engine, such as LPeg, that already handles subexpressions,
it would be much easier.)
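
For the codepoint-range test in the quoted question, one workaround that needs no change to the pattern engine is to decode the character and compare codepoints directly. This is only a sketch: first_in_ranges is a hypothetical helper, not part of Lua, and it assumes utf8.codepoint behaves as in the final Lua 5.3 release:

```lua
-- Hypothetical helper (not a Lua library function): returns true if the
-- first character of s has a codepoint inside any of the given ranges.
local function first_in_ranges(s, ranges)
  if s == "" then return false end        -- utf8.codepoint errors on an empty string
  local cp = utf8.codepoint(s, 1)         -- decode the character starting at byte 1
  for _, r in ipairs(ranges) do
    if cp >= r[1] and cp <= r[2] then return true end
  end
  return false
end

-- Same ranges as [\u{e4}-\u{e6}\u{f3}-\u{f5}] in the quoted example.
local ranges = { {0xE4, 0xE6}, {0xF3, 0xF5} }
print(first_in_ranges("\u{e4}", ranges))  -- true
print(first_in_ranges("\u{e7}", ranges))  -- false
```

This only classifies a single character; it does not give repetition, anchors, or captures, which is where the engine changes described above would come in.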
-- Roberto