Re: LPeg support for utf-8

lua-l archive


Subject: Re: LPeg support for utf-8
From: Roberto Ierusalimschy <roberto@...>
Date: Fri, 1 Apr 2011 15:28:10 -0300

> Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:
> 
> > A quick survey, for those who care:
> > - should LPeg support utf-8?
> > - If so, what would that mean?
> 
> An alternative to lpeg.P(N) which matches N UTF-8 encoded code points
> instead of octets. Similarly, alternatives to lpeg.R and lpeg.S that deal
> with code points instead of octets. Maybe lpeg.uP and .uS and .uR ?
> Perhaps there should be a .uB as well. I would prefer this to a "unicode
> mode" which changes the behaviour of the existing functions.

This is more or less what I had in mind (specific names
notwithstanding). But the question remains whether each of these
constructions (uS, uR, etc.) is really useful and whether there should
be others. For instance, would it be worth supporting something like
properties (using wctype)? Or a capture that matches one code point
and captures its value?

-- Roberto
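
For readers who want to experiment with the idea, the constructions
discussed above can be approximated with the existing byte-oriented
LPeg primitives. The following is only a sketch under stated
assumptions: the names uP and uR are borrowed from the proposal quoted
above, while codepoint, decode and ucp are hypothetical helpers
invented for this illustration. None of them are part of LPeg, and the
pattern does not reject overlong encodings or surrogates.

```lua
local lpeg = require "lpeg"
local P, R, C, Cmt = lpeg.P, lpeg.R, lpeg.C, lpeg.Cmt

-- Continuation byte: 10xxxxxx
local cont = R("\128\191")

-- One UTF-8 encoded code point (1 to 4 bytes), built from byte ranges.
-- Simplification: overlong forms and surrogates are not rejected.
local codepoint = R("\0\127")
                + R("\194\223") * cont
                + R("\224\239") * cont * cont
                + R("\240\244") * cont * cont * cont

-- Hypothetical uP(n): matches exactly n code points instead of n octets.
local function uP(n)
  local p = P(true)
  for _ = 1, n do p = p * codepoint end
  return p
end

-- Number of payload values in the lead byte, indexed by sequence length.
local lead_mod = { [2] = 32, [3] = 16, [4] = 8 }

-- Hypothetical helper: turn the bytes of one code point into its value.
local function decode(s)
  local b1 = s:byte(1)
  if b1 < 128 then return b1 end
  local value = b1 % lead_mod[#s]
  for i = 2, #s do
    value = value * 64 + s:byte(i) % 64
  end
  return value
end

-- Matches one code point and captures its numeric value.
local ucp = C(codepoint) / decode

-- Hypothetical uR(lo, hi): one code point whose value lies in [lo, hi].
-- The match-time function returns a boolean, so no capture is produced.
local function uR(lo, hi)
  return Cmt(ucp, function(_, _, v) return v >= lo and v <= hi end)
end

print(ucp:match("\226\130\172"))           -- value of U+20AC
print(uP(2):match("\195\169\226\130\172x")) -- position after two code points
```

Building everything on top of a single codepoint pattern keeps the
existing functions untouched, which is in line with the preference
expressed above for separate constructions rather than a "unicode
mode". Properties in the wctype sense could be layered on the same way,
by testing the decoded value inside the match-time function.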

Follow-Ups:
- Re: LPeg support for utf-8, Chris Babcock
- Re: LPeg support for utf-8, Tony Finch
- Re: LPeg support for utf-8, Thomas Harning Jr.

References:
- LPeg support for utf-8, Roberto Ierusalimschy
- Re: LPeg support for utf-8, Tony Finch
