lua-users home
lua-l archive

On Sun, Jan 24, 2016 at 9:17 AM, Jay Carlson <nop@nop.com> wrote:
> Thank you for following up on this. It builds cleanly on OS X 10.11 Xcode
> 7.2, although I haven't really tested it. (Add a test suite to the wish list.
> :-)

Yes, I plan to add a test suite before I go too far with this. It will
probably be set up for automated testing via Travis CI (and possibly AppVeyor
for Windows as well).

> IMO, the most important feature for a Unicode pattern matcher is for "." and
> inverted ranges to match either whole codepoints or nothing. I think that
> would mean: given valid utf8 patterns and haystacks, no invalid utf8 can be
> output. "First, do no harm..."

The main obstacles to UTF-8 pattern matching are getting ".", sets, and
quantifiers to operate on codepoints instead of bytes. Everything else works
"as-is". I have no intention of making classes like "%a" follow their Unicode
meaning; that would be far too complicated and bloated.
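
To illustrate the byte-versus-codepoint problem, here is a rough sketch in
Python (not the library's actual code). It assumes the haystack is valid UTF-8,
and the helper names utf8_char_len and match_any_codepoint are made up for
this example. A byte-oriented "." splits a multi-byte sequence, while a
codepoint-oriented one consumes the whole sequence or nothing.

```python
import re


def utf8_char_len(lead: int) -> int:
    """Length in bytes of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    # Continuation bytes and bytes that never appear in UTF-8 end up here.
    raise ValueError(f"invalid UTF-8 lead byte: {lead:#x}")


def match_any_codepoint(haystack: bytes, pos: int) -> int:
    """Hypothetical codepoint-aware '.'.

    Returns the index just past one whole codepoint starting at pos,
    or -1 if nothing is left to match. Assumes haystack is valid UTF-8
    and pos sits on a codepoint boundary.
    """
    if pos >= len(haystack):
        return -1
    end = pos + utf8_char_len(haystack[pos])
    if end > len(haystack):
        return -1  # truncated sequence: match nothing rather than a fragment
    return end


haystack = "é!".encode("utf-8")  # the bytes C3 A9 21

# A byte-oriented '.' takes only the first byte, which is not valid UTF-8
# on its own.
byte_match = re.match(rb".", haystack).group()

# A codepoint-oriented '.' takes both bytes of the encoded character.
cp_match = haystack[:match_any_codepoint(haystack, 0)]
```

Sets and quantifiers need the same treatment: each repetition or set-membership
test would have to step by a whole sequence like this instead of by one byte.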

> I think the slnunicode package has a Unicode-aware pattern matcher, if you
> want other ideas.

I'll look into that. Thanks for the tip.