Roberto Ierusalimschy wrote:
I have released a new version of LPeg (0.5). The main changes are
several optimizations, which should make LPeg much faster for several
common tasks.

(On the other hand, these optimizations make patterns
less regular, and therefore more difficult to test...)

Mike Pall wrote:
I would have picked the low-hanging fruit first:
- Remove the s<e test from IChar for non-NUL chars
  and add ICharZ which checks for s<e and NUL.
- Merge 2 or 4 successive IChars to IChar2 and IChar4.
- Let IAny check for more than one char.

Interesting. I haven't seen the code yet, but it seems to go in the direction I chose for my own library... which is, of course, the way shown by Roberto, which I tried to push further.

Let me explain: I have been working for some time now on the specification (no code yet!) of a Lua-independent Peg library in C. I chose to stick to Bryan Ford's more classical syntax, using a textual format, but changing some rules and adding some operators (just syntactic sugar; the language is still regular).

Advantage: a more familiar look, not restricted by Lua's set of overridable operators and their precedence rules. Drawbacks: less flexibility, and I can't rely on Lua code to parse the expressions (I will use the VM for that, of course) or to store captures. The purpose is to allow embedding in other languages (Lua might be a first target!) or in other programs (text editor, search/replace utility...).

Of course, I had a close look at Roberto's engine, which was very instructive, both for implementation ideas and for C optimization.

Dumping simple expressions showed me the purpose of the opcodes, and finally made me understand the semi-cryptic notations at the top of the code (the implementation of operators in terms of opcodes).

So I went ahead and created my own opcodes, reusing most of Roberto's ideas and creating new codes to make some common expressions optimal. I understand that's what Roberto did in the new version.

I currently have around 25 opcodes, and I was even hesitant to create more: my OP_CHARS can handle one or two chars, at the cost of comparing a flag to 0. It shouldn't be too costly, as single chars (or pairs of chars) aren't that common in the purely repetitive parts of Pegs (i.e. we rarely write 'a'+ in real expressions). Other repetitive operations, like (!'c' .)* to reach a char in a non-greedy way, have their own opcode, which iterates independently of the VM (i.e. it has its own internal loop; it doesn't loop on opcodes).
That's why I read Mike's comments with interest...
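
To make the idea concrete, here is a minimal sketch in C of what such opcodes might look like. It is only an illustration under assumptions: OP_UNTIL, the Instr layout and run() are hypothetical names, taken neither from LPeg nor from the specification, and using 0 as the "single char" flag means this sketch cannot match a NUL as second char.

```c
#include <stddef.h>

/* Hypothetical opcodes, named for illustration only. */
enum { OP_CHARS, OP_UNTIL, OP_END };

typedef struct {
    int  op;
    char c1;  /* OP_CHARS: first char; OP_UNTIL: the char to reach */
    char c2;  /* OP_CHARS: second char, or 0 as the "single char" flag */
} Instr;

/* Runs prog on the subject [s, e).
   Returns a pointer just past the match, or NULL on failure. */
static const char *run(const Instr *prog, const char *s, const char *e)
{
    const Instr *p;
    for (p = prog; ; p++) {
        switch (p->op) {
        case OP_CHARS:
            if (s >= e || *s != p->c1) return NULL;
            s++;
            if (p->c2 != 0) {  /* the flag comparison: one extra test */
                if (s >= e || *s != p->c2) return NULL;
                s++;
            }
            break;
        case OP_UNTIL:
            /* (!'c' .)* with its own internal loop: the VM dispatches
               once, not once per subject char. Never fails. */
            while (s < e && *s != p->c1) s++;
            break;
        case OP_END:
            return s;
        default:
            return NULL;  /* unknown opcode */
        }
    }
}
```

With a program such as { {OP_CHARS,'a','b'}, {OP_UNTIL,';',0}, {OP_CHARS,';',0}, {OP_END,0,0} }, the VM executes four instructions regardless of how many chars sit between "ab" and the ';'.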

On the other hand, I couldn't resist and made opcodes to loop on patterns, implementing the repetition functionality I wanted (I don't have a language to build Pegs anyway): I know it would be faster to repeat the patterns internally, but I chose more compact code, at what I assume is a small performance cost.
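
As a rough sketch of that trade-off, a bounded repetition such as "ab" repeated 2 to 3 times can be encoded as one looping opcode followed by a single copy of the body, instead of emitting the body several times. Everything here is an assumption made for illustration (OP_REPEAT, the min/max/len fields, the recursive run()); it is not the design from the specification.

```c
#include <stddef.h>

/* Hypothetical opcodes, named for illustration only. */
enum { OP_CHAR, OP_REPEAT, OP_END };

typedef struct {
    int  op;
    char c;         /* OP_CHAR: char to match */
    int  min, max;  /* OP_REPEAT: bounds on the iteration count */
    int  len;       /* OP_REPEAT: number of body instructions that follow */
} Instr;

/* Runs the instructions in [p, pend) on the subject [s, e).
   Returns a pointer just past the match, or NULL on failure. */
static const char *run(const Instr *p, const Instr *pend,
                       const char *s, const char *e)
{
    while (p < pend) {
        switch (p->op) {
        case OP_CHAR:
            if (s >= e || *s != p->c) return NULL;
            s++;
            p++;
            break;
        case OP_REPEAT: {
            const Instr *body = p + 1;
            int n = 0;
            /* Greedy, no backtracking into earlier iterations. */
            while (n < p->max) {
                const char *next = run(body, body + p->len, s, e);
                if (next == NULL) break;  /* body failed: keep s as is */
                n++;
                if (next == s) {          /* empty match: no progress, */
                    if (n < p->min) n = p->min;  /* so min is reachable */
                    break;
                }
                s = next;
            }
            if (n < p->min) return NULL;
            p = body + p->len;            /* skip over the body */
            break;
        }
        case OP_END:
            return s;
        default:
            return NULL;  /* unknown opcode */
        }
    }
    return s;
}
```

The body is stored once whatever the bounds are, which keeps the program compact; the price is one extra call and a counter test per iteration, compared with a program where the body is simply copied min times.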

Anyway, I am finishing the specification (which is also the user's manual, design document, etc.) and I will start coding soon. I still have to do more work on the capture design...

It is a bit early, but if anybody expresses an interest, I can make the document (some 1300 lines of plain text...) available. Having early remarks might help... :-)

Oh, well, it is there:
http://www.autohotkey.net/~PhiLho/Docs/PegTop.txt
Any comments should be sent privately, I think, unless they are on topic for Lua.

--
Philippe Lhoste
--  (near) Paris -- France
--  http://Phi.Lho.free.fr
--  --  --  --  --  --  --  --  --  --  --  --  --  --