- Subject: Re: LPeg support for utf-8
- From: Roberto Ierusalimschy <roberto@...>
- Date: Fri, 1 Apr 2011 15:28:10 -0300
> Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:
>
> > A quick survey, for those who care:
> > - should LPeg support utf-8?
> > - If so, what would that mean?
>
> An alternative to lpeg.P(N) which matches N UTF-8 encoded code points
> instead of octets. Similarly, alternatives to lpeg.R and lpeg.S that deal
> with code points instead of octets. Maybe lpeg.uP and .uS and .uR ?
> Perhaps there should be a .uB as well. I would prefer this to a "unicode
> mode" which changes the behaviour of the existing functions.
This is more or less what I had in mind (specific names
notwithstanding). But the question still remains of whether each of
these constructions (uS, uR, etc.) is really useful and whether there
should be others. For instance, would it be worth supporting something
like properties (using wctype)? Or a capture that matches one code
point and captures its value?
-- Roberto