- Subject: Re: Can LPeg parse PNG?
- From: Sean Conner <sean@...>
- Date: Fri, 9 Dec 2016 22:36:56 -0500
It was thus said that the Great KHMan once stated:
> On 12/10/2016 9:46 AM, Duncan Cross wrote:
> >On Sat, Dec 10, 2016 at 12:10 AM, Soni L. wrote:
> >>I was told PNG is a regular language, in the sense that you can validate
> >>any PNG chunk with a VERY VERY VERY LONG regex.
> >
> >Including the 32-bit CRC at the end of the chunk? That's got to be a
> >hell of a regex.
>
> I'm really curious about the "I was told PNG is a regular
> language" though. It directs the blame for the idea on an
> anonymous someone else.
>
> Perhaps he has VERY VERY VERY LARGE amounts of free time this
> weekend, or perhaps he wants to trap some unsuspecting folks
> trying it. I think both.
LPeg isn't the right tool for this job though. Just plain Lua is good
enough:
  function png(name)
    local f = io.open(name,"rb")
    if not f then
      return nil
    end

    local hdr = f:read(8)
    if hdr ~= "\137PNG\13\10\26\10" then
      f:close()
      return nil
    end

    -- Iterator: reads one chunk per call (needs Lua 5.3 for string.unpack)
    local function next(png)
      local ch = {}

      ch.len = png:read(4)
      if ch.len == nil then
        png:close() -- end of file, no more chunks
        return nil
      end

      ch.len  = string.unpack(">I4",ch.len)
      ch.type = png:read(4)
      ch.data = png:read(ch.len)
      ch.crc  = string.unpack(">I4",png:read(4))

      -- ----------------------------------------------------
      -- The CRC check is left as an exercise for the reader
      -- ----------------------------------------------------

      ch.critical = ch.type:match("^%u")    ~= nil
      ch.private  = ch.type:match("^.%l")   ~= nil
      ch.ignore   = ch.type:match("^..%l")  ~= nil
      ch.safecopy = ch.type:match("^...%l") ~= nil
      return ch
    end

    return next,f
  end

  for chunk in png("somepngfile.png") do
    -- process each chunk
  end
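For the CRC check left as an exercise above, here is a minimal sketch,
assuming Lua 5.3 or later (for the integer bitwise operators). PNG uses the
standard CRC-32 (reflected polynomial 0xEDB88320), computed over the chunk
type and data but not the length field. The names crc_table, crc32 and
crc_ok are made up for this illustration; they are not part of the code
above.

```lua
-- Table-driven CRC-32 as used by PNG (sketch; requires Lua 5.3+).
local crc_table = {}
for n = 0, 255 do
  local c = n
  for _ = 1, 8 do
    if (c & 1) == 1 then
      c = 0xEDB88320 ~ (c >> 1)
    else
      c = c >> 1
    end
  end
  crc_table[n] = c
end

local function crc32(s)
  local c = 0xFFFFFFFF
  for i = 1, #s do
    c = crc_table[(c ~ s:byte(i)) & 0xFF] ~ (c >> 8)
  end
  return c ~ 0xFFFFFFFF
end

-- Hypothetical helper: true if the stored CRC matches type .. data.
-- Expects a table with the type, data and crc fields used above.
local function crc_ok(ch)
  return crc32(ch.type .. ch.data) == ch.crc
end
```

Inside the loop, something like "if not crc_ok(chunk) then break end" would
stop at the first damaged chunk.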
But yes, it *can* be done in LPeg (probably took me an hour or so):
  local lpeg = require "lpeg"

  local Cmt = lpeg.Cmt
  local Ct  = lpeg.Ct
  local Cg  = lpeg.Cg
  local Cb  = lpeg.Cb
  local Cc  = lpeg.Cc
  local C   = lpeg.C
  local R   = lpeg.R
  local P   = lpeg.P

  -- ---------------------------------------------------------------
  -- Convert 4-byte network ordered values. If using Lua 5.1 or Lua
  -- 5.2, change the call to string.unpack() as needed.
  -- ---------------------------------------------------------------

  local value = P(4)
              / function(len)
                  return string.unpack(">I4",len)
                end

  -- ----------------------------------------------------------------
  -- PNGs are comprised of chunks of typed data. The name of a chunk
  -- is four ASCII characters in the range of A-Z or a-z. The case of
  -- each letter also indicates some other features of that chunk (see
  -- https://en.wikipedia.org/wiki/Portable_Network_Graphics for
  -- details). First, we check the case of each letter in the ID for
  -- the flags.
  -- ----------------------------------------------------------------

  local type1 = R"AZ" * Cg(Cc(true),'critical')
              + R"az" * Cg(Cc(false),'critical')
  local type2 = R"AZ" * Cg(Cc(false),'private')
              + R"az" * Cg(Cc(true),'private')
  local type3 = R"AZ" * Cg(Cc(false),'ignore')
              + R"az" * Cg(Cc(true),'ignore')
  local type4 = R"AZ" * Cg(Cc(false),'safecopy')
              + R"az" * Cg(Cc(true),'safecopy')

  -- ----------------------------------------------------------------
  -- To check the type, we first scan the four bytes and set the
  -- flags, then, using a match-time capture, we grab the characters
  -- we just scanned and return them as well.
  -- ----------------------------------------------------------------

  local type = type1 * type2 * type3 * type4
             * Cg(Cmt(Cc(true),function(subject,pos,_)
                 return pos,subject:sub(pos - 4,pos - 1)
               end),'type')

  -- -----------------------------------------------------------------
  -- To obtain the data, we need the length. Fortunately, by the time
  -- this is called, we've already captured the length in a named
  -- capture. We use lpeg.Cb() to retrieve the previously calculated
  -- length, and by using a match-time capture, snarf the appropriate
  -- amount of data and return it.
  -- -----------------------------------------------------------------

  local data = Cmt(Cb('length'),function(subject,pos,capture)
    return pos + capture,subject:sub(pos,pos + capture - 1)
  end)

  local chunk = Cg(value,'length')
              * type
              * Cg(data,'data')
              * Cg(value,"crc")

  -- ----------------------------------------------------------------
  -- The CRC check is left as an exercise for the reader (and yes, it
  -- can be done via LPeg patterns and a function---hint: lpeg.Cmt()
  -- with lpeg.Cb())
  -- ----------------------------------------------------------------

  local header = P"\137PNG\13\10\26\10"
  local png    = header * Ct((Ct(chunk))^1)

  local f      = io.open("somepngfile.png","rb")
  local chunks = png:match(f:read("*a"))
  f:close()
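Following the hint in the last comment, one way the CRC check might be wired
into the grammar is sketched below. It is a cut-down, self-contained variant
rather than a drop-in patch: the case flags are omitted, and the names crc32,
ctype and crc_ok are invented for the example. It assumes Lua 5.3+ and an
installed lpeg module. The match-time capture pulls the type, data and stored
CRC back out with lpeg.Cb() and fails the match if they disagree.

```lua
local lpeg = require "lpeg"
local Cmt, Cb, Cg, Ct = lpeg.Cmt, lpeg.Cb, lpeg.Cg, lpeg.Ct
local C, R, P = lpeg.C, lpeg.R, lpeg.P

-- Bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320).
local function crc32(s)
  local c = 0xFFFFFFFF
  for i = 1, #s do
    c = c ~ s:byte(i)
    for _ = 1, 8 do
      c = (c >> 1) ~ (0xEDB88320 & -(c & 1))
    end
  end
  return c ~ 0xFFFFFFFF
end

-- 4-byte big-endian integer; the parentheses drop the second result
-- of string.unpack (the next read position).
local value = P(4) / function(b) return (string.unpack(">I4", b)) end

local letter = R("AZ", "az")
local ctype  = C(letter * letter * letter * letter)

-- Consume exactly 'length' bytes; fail if the subject is too short.
local data = Cmt(Cb("length"), function(subject, pos, len)
  if pos + len - 1 > #subject then return false end
  return pos + len, subject:sub(pos, pos + len - 1)
end)

-- Matches the empty string, but only if the stored CRC is correct.
local crc_ok = Cmt(Cb("type") * Cb("data") * Cb("crc"),
  function(_, _, t, d, crc)
    return crc32(t .. d) == crc
  end)

local chunk = Cg(value, "length")
            * Cg(ctype, "type")
            * Cg(data,  "data")
            * Cg(value, "crc")
            * crc_ok

local signature = "\137PNG\13\10\26\10"
local png       = P(signature) * Ct(Ct(chunk)^1) * -P(1)

-- Smallest possible test subject: the signature plus an IEND chunk.
local chunks = png:match(signature .. "\0\0\0\0IEND\174\66\96\130")
```

With this arrangement a single bad CRC makes the whole match return nil,
which may or may not be what you want; collecting per-chunk errors instead
would need the function to return a flag rather than fail.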
The major problem I see with using LPeg is that you need to load the entire
file into memory before processing it. That shouldn't be an issue these
days, but it's hard to do stream processing with LPeg. I'm not saying it
can't be done, but LPeg really doesn't buy you much over straight Lua.
I think the major reason a regex has difficulty here is that the amount of
data depends not on a pattern but on a length value that is part of the data
itself (which is why I used a match-time capture for the data).
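To make the length dependence concrete, here is a small sketch (Lua 5.3+
assumed) that builds two chunks in memory and walks them using nothing but
the length prefix. make_chunk and read_chunk are hypothetical helpers, and
the CRC is a dummy zero since it is not the point here.

```lua
-- Build a chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
-- The CRC is a placeholder (0); this only demonstrates the framing.
local function make_chunk(ctype, data)
  return string.pack(">I4", #data) .. ctype .. data .. string.pack(">I4", 0)
end

-- Read one chunk from buf starting at pos; returns the chunk and the
-- position of the next one. Where the data ends is decided by 'len',
-- not by anything recognizable in the data bytes themselves.
local function read_chunk(buf, pos)
  local len, p = string.unpack(">I4", buf, pos)
  local ctype  = buf:sub(p, p + 3)
  local data   = buf:sub(p + 4, p + 3 + len)
  local crc, nxt = string.unpack(">I4", buf, p + 4 + len)
  return { type = ctype, data = data, crc = crc }, nxt
end

local buf = make_chunk("tEXt", "hello") .. make_chunk("IEND", "")
local first, pos = read_chunk(buf, 1)
local second     = read_chunk(buf, pos)
```

The data could itself contain bytes that look like a chunk header, and the
reader would skip right over them, which is exactly what a plain pattern
has trouble expressing.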
-spc (Enhancements can include checking for required and known chunks, but
again, I'm leaving that as an exercise for the reader; I've spent
enough time on this already)
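As a starting point for the enhancement mentioned above, a minimal sketch of
a layout check over an array of chunk tables (each with a type field, as
produced by either version). check_chunks is a made-up name, and this covers
only a few rules I am sure of: IHDR comes first, IEND comes last, there is at
least one IDAT, and a critical chunk (uppercase first letter) other than
IHDR, PLTE, IDAT or IEND is rejected as unknown.

```lua
-- The four standard critical chunk types.
local known_critical = { IHDR = true, PLTE = true, IDAT = true, IEND = true }

-- Hypothetical helper: returns true, or false plus a reason.
local function check_chunks(chunks)
  if #chunks == 0 then
    return false, "no chunks"
  end
  if chunks[1].type ~= "IHDR" then
    return false, "IHDR must be the first chunk"
  end
  if chunks[#chunks].type ~= "IEND" then
    return false, "IEND must be the last chunk"
  end

  local idat = 0
  for _, ch in ipairs(chunks) do
    if ch.type == "IDAT" then
      idat = idat + 1
    elseif ch.type:match("^%u") and not known_critical[ch.type] then
      return false, "unknown critical chunk " .. ch.type
    end
  end
  if idat == 0 then
    return false, "no IDAT chunk"
  end
  return true
end
```

Unknown ancillary chunks (lowercase first letter) pass through untouched,
which is the whole reason the case bits exist.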