lua-users home
lua-l archive

On Friday 14 June 2013 00:29:29, Pierre-Yves Gérardy wrote:
> On Thu, Jun 13, 2013 at 11:36 PM, Jay Carlson <nop@nop.com> wrote:
> > On Jun 12, 2013, at 9:53 AM, Pierre-Yves Gérardy wrote:
> > Don't forget getchar(S, 2) -> error("not defined at position 2").  I
> > really like Julia's idea of strings as partial functions.
> I'd prefer getchar(S, 2) --> false, 3.
> 
> >> A similar function could return code points instead of strings.
> > 
> > Would you use that much?
> 
> Yes, before I broke Unicode support in LuLPeg, that's what I was
> using. It allows checking whether a character is in a given range, and
> it is barely slower than returning a sub-string (doing the conversion
> in Lua). In LuaJIT, computing the code point with standard arithmetic
> (mod, division and floor) is faster than getting the sub-string. It
> should be even faster using the bit library.
> 
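To make the arithmetic approach concrete, here is a minimal sketch in plain Lua. It is not LuLPeg's actual code: the name codepoint_at and the return convention (false plus the next position for a byte that does not start a sequence, following the "false, 3" suggestion above) are assumptions made for illustration. It does not reject overlong three- and four-byte forms or surrogates.

```lua
-- Hypothetical helper, not LuLPeg's actual code: decode the UTF-8 sequence
-- starting at byte position i using only arithmetic on byte values.
-- Returns the code point and the position of the next sequence, or
-- false and i + 1 when no well-formed sequence starts at i.
local function codepoint_at(s, i)
  local b = s:byte(i)
  if not b then return nil end          -- past the end of the string
  if b < 128 then return b, i + 1 end   -- ASCII: a single byte
  local len, cp
  if b >= 194 and b < 224 then len, cp = 2, b - 192
  elseif b >= 224 and b < 240 then len, cp = 3, b - 224
  elseif b >= 240 and b < 245 then len, cp = 4, b - 240
  else return false, i + 1 end          -- continuation or invalid lead byte
  for k = i + 1, i + len - 1 do
    local c = s:byte(k)
    -- every following byte must be a continuation byte (128..191)
    if not c or c < 128 or c > 191 then return false, i + 1 end
    cp = cp * 64 + (c - 128)            -- same as shifting left by 6 bits
  end
  return cp, i + len
end
```

With the code point in hand, a range check is a pair of numeric comparisons, with no sub-string allocated.
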
> > Miles Bader pointed out that a lot of string iteration code is phrased
> > in terms of gmatch, or should be. And in that case, there are no
> > string positions at all.
>
> Well, in my case it isn't, but an LPeg clone is probably not typical
> in terms of string processing.
> 
> > The major problem for UTF-8 then would be convincing the pattern matcher
> > to consume an entire UTF-8 sequence for ".".
>
> In the 2012 Workshop presentation, Roberto talks about deprecating the
> old patterns, so Unicode in gmatch will probably never see the light
> of day... I don't know if/how he plans to handle Unicode in LPeg.
> 

Please don't deprecate them: it would break existing scripts again, and that
would be a very big breakage. The current patterns are great, and I would
like to keep them because they are so simple.
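
As a rough illustration of how far the existing patterns already go, the sketch below walks a string one UTF-8 sequence at a time with gmatch, without touching string positions. The pattern and the helper name utf8_chars are assumptions for this example, not part of any library discussed here; the pattern assumes well-formed input and does not match NUL bytes.

```lua
-- Assumed pattern: one lead byte (ASCII, or 194..244 for multi-byte
-- sequences) followed by any continuation bytes (128..191).
-- Decimal escapes keep it valid in Lua 5.1 and LuaJIT as well.
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: collect each UTF-8 sequence of s into a table.
local function utf8_chars(s)
  local out = {}
  for ch in s:gmatch(UTF8_CHAR) do
    out[#out + 1] = ch
  end
  return out
end
```

This does not make "." consume a whole sequence, but it covers the common case of iterating over characters with the pattern matcher as it is today.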

> As posted in the other thread, I plan to tackle this in LuLPeg with
> P8(), R8() and S8(), which will live alongside their byte-matching
> cousins.
> 
> -- Pierre-Yves