Re: [ANN] Lua 5.3.0 (work2) now available

Subject: Re: [ANN] Lua 5.3.0 (work2) now available
From: Roberto Ierusalimschy <roberto@...>
Date: Thu, 10 Apr 2014 10:38:51 -0300

> While character classes like "%g" would require the entire Unicode
> tables, what about patterns like this:
> 
>     utf8.match("\u{e4}", "[\u{e4}-\u{e6}\u{f3}-\u{f5}]")
> 
> It wouldn't require Unicode tables but "just" UTF-8 support for the
> matching functions.
> 
> Would that be possible without adding too much bloat? When I had to
> match codepoint ranges before, I had to use multiple patterns to match
> certain ranges of UTF-8 encodings.

I believe you are asking to add this kind of class to the
pattern-matching constructions in Lua. That would require some
non-trivial changes to the engine, as the whole engine would have to
be UTF-8 aware. For instance, a class repetition such as [aá]* could
not just count the number of bytes it matched, but would have to count
the number of characters. That is not compatible with the byte-oriented
behavior, so the engine would need two modes (or maybe two different
engines).
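
To make the byte-oriented behavior concrete, here is a minimal sketch,
assuming Lua 5.3 or later (for the "\u{...}" escapes and the utf8
library). The variable names are only illustrative. It shows that a
class such as [aá] is really a set of three bytes, so it can match
half of a different character, and that a repetition consumes bytes
rather than characters.

```lua
-- Assumption: Lua 5.3+; "\u{...}" produces the UTF-8 encoding of a code point.
local a_acute = "\u{e1}"   -- "á", encoded as the bytes C3 A1
local a_tilde = "\u{e3}"   -- "ã", encoded as the bytes C3 A3

-- To the matcher, [aá] is a set of three bytes: 'a', 0xC3 and 0xA1.
local class = "[a" .. a_acute .. "]"

-- "ã" is not in the intended class, yet its lead byte 0xC3 is matched
-- on its own, leaving a truncated (invalid) UTF-8 sequence.
local partial = string.match(a_tilde, class)
print(#partial, partial == "\xC3")

-- The repetition counts bytes: "aáa" is three characters but four bytes.
local run = string.match("a" .. a_acute .. "a", "^" .. class .. "*")
print(#run, utf8.len(run))
```

The second match happens to return the whole string, but only because
every byte of it is in the set; the engine never sees "á" as one unit.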

(That is a problem of the current simple implementation. For a more
powerful engine, such as LPeg, that already handles subexpressions,
it would be much easier.)
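
Short of changing the engine or switching to LPeg, the range test from
the question can be approximated in plain Lua by decoding code points
and comparing them numerically. The following is only a sketch,
assuming the utf8 library of Lua 5.3 or later; in_ranges and
find_in_ranges are hypothetical helper names, not part of any Lua
library, and this does not give the full pattern syntax.

```lua
-- Assumption: Lua 5.3+ (utf8.codes, utf8.char). Helper names are hypothetical.

-- True if code point cp lies inside any {low, high} pair in ranges.
local function in_ranges(cp, ranges)
  for _, r in ipairs(ranges) do
    if cp >= r[1] and cp <= r[2] then
      return true
    end
  end
  return false
end

-- Returns the first character of s whose code point falls in one of the
-- ranges, or nil if there is none. utf8.codes raises an error if s is
-- not valid UTF-8.
local function find_in_ranges(s, ranges)
  for _, cp in utf8.codes(s) do
    if in_ranges(cp, ranges) then
      return utf8.char(cp)
    end
  end
  return nil
end

-- The ranges from the question: U+00E4..U+00E6 and U+00F3..U+00F5.
local ranges = { {0xE4, 0xE6}, {0xF3, 0xF5} }
print(find_in_ranges("x\u{e5}y", ranges) == "\u{e5}")
print(find_in_ranges("abc", ranges))
```

This trades the compact pattern notation for an explicit loop, but it
needs neither Unicode tables nor changes to the matching functions.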

-- Roberto

References:
- Re: [ANN] Lua 5.3.0 (work2) now available, David Heiko Kolf
