- Subject: Re: lpeg v0.4
- From: "Duncan Cross" <duncan.cross@...>
- Date: Thu, 18 Jan 2007 21:05:19 +0000
All right, here's my latest attempt: a pattern that turns an arbitrary
string into XML text, encoding any problematic characters as
entities, and one that decodes entity-encoded text. (My first effort was
to write a whole XML parser, but that didn't quite work out, so I've
scaled back a bit.)
Is there any sort of "null pattern" I could initialize encodedEntity
and decodedEntity to (i.e. nullpattern + pattern == pattern) so I
wouldn't need the if block in the for loop? Also, is that #lpeg.P("&")
in xmlTextDecode good practice or unnecessary? Any other comments
about how I'm doing this?
-- named entities
local encodedEntity, decodedEntity
local entities = {amp="&", lt="<", gt=">", apos=[[']], quot=[["]]}
for entityname,character in pairs(entities) do
  local c2e = lpeg.P(character) / ("&"..entityname..";")
  local e2c = lpeg.P("&"..entityname..";") / character
  if (encodedEntity) then
    encodedEntity, decodedEntity = encodedEntity + c2e, decodedEntity + e2c
  else
    encodedEntity, decodedEntity = c2e, e2c
  end
end
-- characters that require no encoding/decoding
local directChar = lpeg.R(" ~") + lpeg.S("\n\r\t")
-- arbitrary characters encoded numerically (decimal &#N; or hexadecimal &#xN;)
local decdigit = lpeg.R("09")
local hexdigit = decdigit + lpeg.R("AF") + lpeg.R("af")
local function stringcharx(c) return string.char('0'..c) end
local encodedNumEntity = lpeg.P(1) / function(c) return "&#"..string.byte(c)..";" end
local decodedNumEntity = ("&#" * lpeg.C(decdigit^1 + "x"*hexdigit^1) * ";") / stringcharx
-- final patterns
xmlTextEncode = lpeg.Cs( (encodedEntity + directChar + encodedNumEntity)^0 )
xmlTextDecode = lpeg.Cs( (#lpeg.P("&")*(decodedEntity+decodedNumEntity) + 1)^0 )
-- test
print(lpeg.match(xmlTextEncode, "abcABC 'hello' 012345 "..string.char(0).." & < >"))
print(lpeg.match(xmlTextDecode, "&lt;&lt;&lt;Lua&gt;&gt;&gt;"))
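On the two questions above, here is a minimal sketch of one possible approach. It assumes a recent LPeg release in which lpeg.P(false) is a pattern that always fails (not confirmed for v0.4), so that seed + pattern behaves like pattern in an ordered choice. The names encode, decodeGuarded and decodePlain are made up for this illustration. The second half compares a decoder with and without the #lpeg.P("&") lookahead, which should give the same result because every entity alternative already starts with "&".

```lua
-- Sketch only: assumes a recent LPeg where lpeg.P(false) always fails.
local lpeg = require("lpeg")

local entities = {amp="&", lt="<", gt=">", apos="'", quot='"'}

-- Seed both choices with a pattern that never matches, so the loop
-- can extend them unconditionally (no if block needed).
local encodedEntity, decodedEntity = lpeg.P(false), lpeg.P(false)
for entityname, character in pairs(entities) do
  encodedEntity = encodedEntity + lpeg.P(character) / ("&"..entityname..";")
  decodedEntity = decodedEntity + lpeg.P("&"..entityname..";") / character
end

-- Hypothetical, cut-down versions of the final patterns (named entities only).
local encode = lpeg.Cs((encodedEntity + 1)^0)
local decodeGuarded = lpeg.Cs((#lpeg.P("&") * decodedEntity + 1)^0)
local decodePlain = lpeg.Cs((decodedEntity + 1)^0)

print(lpeg.match(encode, "a < b & c"))
print(lpeg.match(decodeGuarded, "&lt;Lua&gt; &amp; more"))
print(lpeg.match(decodePlain, "&lt;Lua&gt; &amp; more"))
```

If the two decoders agree, as they do here, the lookahead is not needed for correctness. Whether it makes any difference to speed is not something this sketch measures.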
On 1/17/07, Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:
Yet another version of LPeg. The main novelty is a static check for
loops, both in repetitions and in grammars, that avoids the creation of
patterns with infinite loops.
Besides that, as announced, I removed the label option in captures and
changed the order of arguments to match. The metatable was already
set, so we can write patt:match(subject).
The implementation of captures also uses less memory (half, in some
relevant cases).
I hope it will be more stable from now on.
-- Roberto