- Subject: Re: LPeg support for utf-8
- From: Roberto Ierusalimschy <roberto@...>
- Date: Fri, 1 Apr 2011 15:17:02 -0300
> Hi Roberto,
>
> > A quick survey, for those who care:
> > - should LPeg support utf-8?
> > - If so, what would that mean?
>
> I love LPeg but don't see anything useful it could do with UTF-8 that it
> doesn't already do. LPeg already handles parsing UTF-8 fine (for those
> who don't know: UTF-8 is a superset of ASCII). Any built-in "magic"
> would only reduce the flexibility for users of LPeg, unless you're
> considering an add-on module like "re". That would be fine of course but
> I don't really see the need for it, given that modules like slnunicode
> are available.
Support for UTF-8 would not change the current "byte-oriented"
behavior. I am thinking more in terms of extra built-in patterns.
So, for instance, lpeg.utf8.point(n) would match n UTF-8 code points,
lpeg.utf8.set("...") would match any code point present in the given
string, and lpeg.utf8.range(v1,v2) would match any code point with a
value between v1 and v2.
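
For illustration only, here is a rough sketch of how the first of these
could be approximated with the existing byte-oriented LPeg. The names
utf8_char and utf8_point are hypothetical helpers, not part of LPeg, and
the code-point pattern is deliberately loose: it checks lead and
continuation bytes but does not reject overlong forms or surrogates.

```lua
local lpeg = require "lpeg"
local P, R = lpeg.P, lpeg.R

-- One UTF-8 continuation byte (10xxxxxx).
local cont = R("\128\191")

-- One UTF-8 encoded code point, selected by its lead byte.
-- Assumption: loose validation only (no overlong/surrogate checks).
local utf8_char = R("\0\127")
                + R("\194\223") * cont
                + R("\224\239") * cont * cont
                + R("\240\244") * cont * cont * cont

-- Hypothetical stand-in for the proposed lpeg.utf8.point(n):
-- matches exactly n code points, however many bytes they occupy.
local function utf8_point(n)
  local p = P(true)
  for _ = 1, n do
    p = p * utf8_char
  end
  return p
end

local s = "h\195\169llo"  -- "héllo": 5 code points, 6 bytes

print(lpeg.match(utf8_point(2), s))  --> 4 (position after "hé")
print(lpeg.match(P(2), s))           --> 3 (P(2) counts bytes)
```

The contrast between the last two lines is the point: lpeg.P(n) keeps
counting bytes, while a code-point-aware pattern advances past whole
encoded characters.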
-- Roberto