It was thus said that the Great KHMan once stated:
> On 12/10/2016 9:46 AM, Duncan Cross wrote:
> >On Sat, Dec 10, 2016 at 12:10 AM, Soni L. wrote:
> >>I was told PNG is a regular language, in the sense that you can validate 
> >>any
> >>PNG chunk with a VERY VERY VERY LONG regex.
> >
> >Including the 32-bit CRC at the end of the chunk? That's got to be a
> >hell of a regex.
> 
> I'm really curious about the "I was told PNG is a regular 
> language" though. It directs the blame for the idea on an 
> anonymous someone else.
> 
> Perhaps he has VERY VERY VERY LARGE amounts of free time this 
> weekend, or perhaps he wants to trap some unsuspecting folks 
> trying it. I think both.

  LPeg isn't the right tool for this job though.  Just plain Lua is good
enough:

        function png(name)
          local f = io.open(name,"rb")
          if not f then
            return nil
          end

          local hdr = f:read(8)
          if hdr ~= "\137PNG\13\10\26\10" then
            f:close() -- not a PNG: don't leak the file handle
            return nil
          end

          local function next(png)
            local ch = {}
            
            ch.len = png:read(4)
            if ch.len == nil then
              png:close() -- end of file: release the handle
              return nil
            end
            
            ch.len  = string.unpack(">I4",ch.len)
            ch.type = png:read(4)
            ch.data = png:read(ch.len)
            ch.crc  = string.unpack(">I4",png:read(4))
            
            -- ----------------------------------------------------
            -- The CRC check is left as an exercise for the reader
            -- ----------------------------------------------------
            
            ch.critical = ch.type:match("^%u")    ~= nil
            ch.private  = ch.type:match("^.%l")   ~= nil
            ch.ignore   = ch.type:match("^..%l")  ~= nil
            ch.safecopy = ch.type:match("^...%l") ~= nil
            
            return ch
          end

          return next,f
        end

        for chunk in png("somepngfile.png") do
          -- process each chunk
        end
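
  For the CRC check left as an exercise above, here is a minimal sketch,
assuming Lua 5.3 or later for the bitwise operators.  PNG uses the standard
CRC-32 (reflected polynomial 0xEDB88320), computed over the chunk type
followed by the chunk data; the length field is not included.  The names
crc32() and chunk_crc_ok() are made up for this illustration and are not
part of the code above.

```lua
-- Build the 256-entry lookup table for the reflected polynomial 0xEDB88320.
local crc_table = {}
for i = 0, 255 do
  local c = i
  for _ = 1, 8 do
    if (c & 1) == 1 then
      c = 0xEDB88320 ~ (c >> 1)
    else
      c = c >> 1
    end
  end
  crc_table[i] = c
end

-- Standard CRC-32 of a Lua string, returned as an unsigned 32-bit value.
local function crc32(s)
  local c = 0xFFFFFFFF
  for i = 1, #s do
    c = crc_table[(c ~ s:byte(i)) & 0xFF] ~ (c >> 8)
  end
  return c ~ 0xFFFFFFFF
end

-- Hypothetical helper: ch is a table with type, data and crc fields,
-- as produced by the chunk iterator.
local function chunk_crc_ok(ch)
  return crc32(ch.type .. ch.data) == ch.crc
end
```

  Inside the loop, something like "if not chunk_crc_ok(chunk) then ... end"
would be enough to flag a damaged chunk.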

But yes, it *can* be done in LPeg (probably took me an hour or so):

        local lpeg = require "lpeg"
        local Cmt  = lpeg.Cmt
        local Ct   = lpeg.Ct
        local Cg   = lpeg.Cg
        local Cb   = lpeg.Cb
        local Cc   = lpeg.Cc
        local C    = lpeg.C
        local R    = lpeg.R
        local P    = lpeg.P

	-- ---------------------------------------------------------------
	-- Convert 4-byte network ordered values.  If using Lua 5.1 or Lua
	-- 5.2, change the call to string.unpack() as needed.
	-- ---------------------------------------------------------------

        local value = P(4)
                     / function(len)
                         return string.unpack(">I4",len)
                       end

	-- ----------------------------------------------------------------
	-- PNGs are composed of chunks of typed data.  The name of a chunk
	-- is four ASCII characters in the range of A-Z or a-z.  The case of
	-- each letter also indicates some other features of that chunk (see
	-- https://en.wikipedia.org/wiki/Portable_Network_Graphics for
	-- details).  First, we check the case of each letter in the ID for
	-- the flags.
	-- ----------------------------------------------------------------
	
        local type1 = R"AZ" * Cg(Cc(true),'critical')
                    + R"az" * Cg(Cc(false),'critical')

        local type2 = R"AZ" * Cg(Cc(false),'private')
                    + R"az" * Cg(Cc(true),'private')

        local type3 = R"AZ" * Cg(Cc(false),'ignore')
                    + R"az" * Cg(Cc(true),'ignore')

        local type4 = R"AZ" * Cg(Cc(false),'safecopy')
                    + R"az" * Cg(Cc(true),'safecopy')
                    
	-- ----------------------------------------------------------------
	-- To check the type, we first scan the four bytes and set the
	-- flags, then, using a match-time capture, we grab the characters
	-- we just scanned and return them as well.
	-- ----------------------------------------------------------------

        local type  = type1 * type2 * type3 * type4
                    * Cg(Cmt(Cc(true),function(subject,pos,_)
                        return pos,subject:sub(pos - 4,pos - 1)
                      end),'type')

	-- -----------------------------------------------------------------
	-- To obtain the data, we need the length.  Fortunately, by the time
	-- this is called, we've already captured the length in a named
	-- capture.  We use lpeg.Cb() to retrieve the previously calculated
	-- length, and by using a match-time capture, snarf the appropriate
	-- amount of data and return it.
	-- -----------------------------------------------------------------

        local data  = Cmt(Cb('length'),function(subject,pos,capture)
                        if pos + capture > #subject + 1 then
                          return false -- truncated chunk: fail the match
                        end
                        return pos + capture,subject:sub(pos,pos + capture - 1)
                      end)

        local chunk = Cg(value,'length')
                    * type
                    * Cg(data,'data')
                    * Cg(value,"crc")
                    
	-- ----------------------------------------------------------------
	-- The CRC check is left as an exercise for the reader (and yes, it
	-- can be done via LPeg patterns and a function---hint: lpeg.Cmt()
	-- with lpeg.Cb())
	-- ----------------------------------------------------------------        

        local header = P"\137PNG\13\10\26\10"
        local png    = header * Ct((Ct(chunk))^1)

        local f = assert(io.open("somepngfile.png","rb"))
        local chunks = png:match(f:read("*a"))
        f:close()
	
  The major problem I see for using LPeg is that you need to load the entire
file first before processing.  That shouldn't be an issue these days, but
it's hard to do stream processing with LPeg.  I'm not saying it can't be
done, but LPeg really doesn't buy you much over straight Lua.

  I think the major reason for the difficulty in using a regex is that the
amount of data depends not upon a pattern but upon a value that is part of
the data itself (which is why I used a match-time capture for the data).
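
  To make that dependency concrete, here is a small sketch, assuming Lua 5.3
or later, that parses one chunk from a string already in memory.  The
function name read_chunk() is invented for this example.  The point is that
the length has to be decoded first, and only then do we know how many bytes
of data to take.

```lua
-- Parse one chunk from string s starting at byte position pos.
-- Returns a table describing the chunk and the position of the next chunk.
local function read_chunk(s, pos)
  -- Fixed part: 4-byte big-endian length, then the 4-byte type.
  local len, typ, p = string.unpack(">I4 c4", s, pos)

  -- Variable part: how much we take depends on the value just decoded.
  local data = s:sub(p, p + len - 1)
  assert(#data == len, "truncated chunk")

  -- The 4-byte big-endian CRC follows the data.
  local crc, next_pos = string.unpack(">I4", s, p + len)

  return { length = len, type = typ, data = data, crc = crc }, next_pos
end
```

  Calling it repeatedly with the returned position (starting at 9, just past
the 8-byte header) walks the whole file.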

  -spc (Enhancements can include checking for required and known chunks, but
	again, I'm leaving that as an exercise for the reader; I've spent
	enough time on this already)
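
  As a rough sketch of the "required and known chunks" enhancement, assuming
only the four critical chunk types defined by the PNG specification (IHDR,
PLTE, IDAT, IEND) and ignoring the finer ordering rules.  The function
check_chunks() and its error strings are invented for this example.

```lua
-- Chunk types the PNG specification defines as critical.
local critical = { IHDR = true, PLTE = true, IDAT = true, IEND = true }

-- types is an array of chunk type strings in file order.
-- Returns true, or false plus a message describing the first problem found.
local function check_chunks(types)
  if types[1] ~= "IHDR" then
    return false, "first chunk must be IHDR"
  end
  if types[#types] ~= "IEND" then
    return false, "last chunk must be IEND"
  end

  local seen_idat = false
  for _, t in ipairs(types) do
    if t == "IDAT" then
      seen_idat = true
    end
    -- An uppercase first letter marks a critical chunk; a decoder cannot
    -- safely skip one it does not recognise.
    if t:match("^%u") and not critical[t] then
      return false, "unknown critical chunk " .. t
    end
  end

  if not seen_idat then
    return false, "missing IDAT"
  end
  return true
end
```

  The list of types can be collected from either version above, for
instance by appending chunk.type to a table inside the loop.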