All right, here's my latest attempt: a pattern that turns an arbitrary
string into XML text, encoding any problematic characters as
entities, and one that decodes entity-encoded text. (My first effort was
to write a whole XML parser, but that didn't quite work out, so I've
scaled back a bit.)

Is there any sort of "null pattern" I could initialize encodedEntity
and decodedEntity to (i.e. nullpattern + pattern == pattern) so I
wouldn't need the if block in the for loop? Also, is that #lpeg.P("&")
in xmlTextDecode good practice or unnecessary? Any other comments
about how I'm doing this?



local lpeg = require "lpeg"

-- named entities
local encodedEntity, decodedEntity
local entities = {amp="&", lt="<", gt=">", apos=[[']], quot=[["]]}
for entityname,character in pairs(entities) do
	local c2e = lpeg.P(character) / ("&"..entityname..";")
	local e2c = lpeg.P("&"..entityname..";") / character
	if (encodedEntity) then
		encodedEntity, decodedEntity = encodedEntity + c2e, decodedEntity + e2c
	else
		encodedEntity, decodedEntity = c2e, e2c
	end
end

-- characters that require no encoding/decoding
local directChar = lpeg.R(" ~") + lpeg.S("\n\r\t")

-- arbitrary characters encoded numerically (e.g. &#x3f; &#64;)
local decdigit = lpeg.R("09")
local hexdigit = decdigit + lpeg.R("AF") + lpeg.R("af")
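-- c is "117" or "x4C"; prefixing "0" gives "0117" or "0x4C", both of which
-- tonumber accepts. string.char only handles values 0-255.
local function stringcharx(c) return string.char(tonumber('0'..c)) end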

local encodedNumEntity = lpeg.P(1) / function(c)
	return "&#"..string.byte(c)..";"
end
local decodedNumEntity = ("&#" * lpeg.C(decdigit^1 + "x"*hexdigit^1) * ";") / stringcharx

-- final patterns
xmlTextEncode = lpeg.Cs( (encodedEntity + directChar + encodedNumEntity)^0 )
xmlTextDecode = lpeg.Cs( (#lpeg.P("&")*(decodedEntity+decodedNumEntity) + 1)^0 )

-- test
print(lpeg.match(xmlTextEncode,
	"abcABC 'hello' 012345 "..string.char(0).." & < >"))
print(lpeg.match(xmlTextDecode, "&lt;&lt;&lt;&#x4C;&#117;&#97;&gt;&gt;&gt;"))
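
On the two questions above, here is a minimal sketch rather than a definitive answer. It assumes a current LPeg release, where lpeg.P(false) builds a pattern that always fails; that may not be available in the version this post was written against. Since a failing first alternative just falls through to the next one, P(false) + p behaves like p and can seed the loop. The names encode, decode and decodeWithPredicate are only illustrative. The last lines check that the #lpeg.P("&") predicate does not change the result here, because every entity alternative already starts with "&".

```lua
local lpeg = require "lpeg"

-- Seed both choices with a pattern that always fails (assumes lpeg.P(false)
-- is supported), so the loop body needs no special first-iteration case.
local encodedEntity, decodedEntity = lpeg.P(false), lpeg.P(false)
local entities = {amp="&", lt="<", gt=">", apos="'", quot='"'}
for entityname, character in pairs(entities) do
	local entity = "&" .. entityname .. ";"
	encodedEntity = encodedEntity + lpeg.P(character) / entity
	decodedEntity = decodedEntity + lpeg.P(entity) / character
end

-- Simplified encoder/decoder: named entities only, everything else copied.
local encode = lpeg.Cs((encodedEntity + 1)^0)
local decode = lpeg.Cs((decodedEntity + 1)^0)

-- Same decoder with the lookahead predicate from the post.
local decodeWithPredicate = lpeg.Cs((#lpeg.P("&") * decodedEntity + 1)^0)

local encoded = encode:match("a < b & 'c'")
print(encoded)
print(decode:match(encoded))
print(decodeWithPredicate:match(encoded) == decode:match(encoded))
```

With or without the predicate the decoder returns the same string, so in this sketch it is at most a hint to the matcher, not something correctness depends on.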


On 1/17/07, Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:
Yet another version of LPeg. The main novelty is a static check for
loops, both in repetitions and in grammars, that avoids the creation of
patterns with infinite loops.
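
A small sketch of what the loop check looks like from the caller's side, assuming a current LPeg release. Repeating a pattern that can match the empty string is rejected when the pattern is built, not when it is matched. The exact error text is not relied on here, and the variable names are only illustrative.

```lua
local lpeg = require "lpeg"

-- lpeg.P("") matches without consuming input, so repeating it could never
-- make progress; building the repetition is expected to raise an error.
local okEmpty, errEmpty = pcall(function() return lpeg.P("")^0 end)

-- A body that always consumes at least one character is accepted.
local okChar = pcall(function() return lpeg.P("a")^0 end)

print(okEmpty, errEmpty)
print(okChar)
```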

Besides that, as announced, I removed the label option in captures and
changed the order of arguments to match. The metatable was already
set, so we can write patt:match(subject).
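
To illustrate the two calling styles mentioned above, a minimal sketch assuming a current LPeg release; the pattern and subject are made up for the example.

```lua
local lpeg = require "lpeg"

-- One or more decimal digits.
local digits = lpeg.R("09")^1
local subject = "2007-01-17"

-- Function form: pattern first, then subject.
local viaFunction = lpeg.match(digits, subject)
-- Method form, available because patterns share a metatable.
local viaMethod = digits:match(subject)

-- With no captures, match returns the index just past the matched text.
print(viaFunction, viaMethod)
```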

The implementation of captures also uses less memory (half, in some
relevant cases).

I hope it will be more stable from now on.

-- Roberto