lua-l archive


It was thus said that the Great Egor Skriptunoff once stated:
> On Thu, Oct 12, 2017 at 10:16 PM, Martin wrote:
> 
> > On 10/12/2017 07:56 PM, Egor Skriptunoff wrote:
> > > function end_of_string_literal (text, start_pos, quote)
> > >    return text:gsub("\\?.", {[quote]="\0"}):match("%z()", start_pos)
> > > end
> > >
> > > print(end_of_string_literal ([[a="\"A",b="\\\"B\"",c="C"]], 4,  '"'))
> > --> 8
> > > print(end_of_string_literal ([[a="\"A",b="\\\"B\"",c="C"]], 12, '"'))
> > --> 20
> > > print(end_of_string_literal ([[a="\"A",b="\\\"B\"",c="C"]], 24, '"'))
> > --> 26
> >
> > What task this code solves?
> >
> >
> The task is to simplify Sony's code:
> 
> local m, pos
> repeat
>   m, pos = string.match(word_eol[2], "(\\*)"..cw:sub(1,1).."()", pos or 2)
> until m == nil or #m % 2 == 0
> 
> (cw:sub(1,1) being one of " or ')
> 
> What this code does?
> It is probably a part of some parser (or should I say *scanner*?)
> A text (in variable word_eol[2]) starts with quote-delimited string literal
> (the quote is cw:sub(1,1))
> This code finds the position where the string literal terminates.
> String literal syntax implied here is allowing backslash escaping.
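For comparison, Soni's loop can be wrapped in a function and tried on the same kind of input. This is a sketch under a few assumptions: the name end_of_quoted is hypothetical, the text is assumed to start with the opening quote at position 1 (hence the search starting at 2, as in the original), and the quote must be a single character with no special meaning in Lua patterns (both " and ' are fine). It returns the position just past the closing quote, or nil if there is none.

```lua
-- Soni's backslash-counting loop, wrapped in a hypothetical function.
-- text:  string whose first character is the opening quote
-- quote: single, non-magic quote character such as '"' or "'"
local function end_of_quoted(text, quote)
  local m, pos
  repeat
    -- find a run of backslashes followed by the quote; () captures the
    -- position just after the quote
    m, pos = text:match("(\\*)" .. quote .. "()", pos or 2)
    -- an odd number of backslashes means this quote is escaped: keep going
  until m == nil or #m % 2 == 0
  return pos   -- nil if the literal is unterminated
end

print(end_of_quoted([["\"A",b]], '"'))       --> 6
print(end_of_quoted([["\\\"B\"",c]], '"'))   --> 10
print(end_of_quoted([['it\'s' ok]], "'"))     --> 8
```

The even/odd count is what lets \\" (an escaped backslash followed by a real quote) end the literal while \" does not.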

  First off, %z was deprecated in Lua 5.2 (see section 8.2 of the 5.2
manual; patterns may now contain "\0" directly), and it's not mentioned at
all in the Lua 5.3 manual (although my version of Lua 5.3 does run the above
code).  Here is some LPeg code that works (and maybe even Soni would like it,
as it's not limited to '"' as the quote character: the quote can be any
string, and that string can appear escaped!):

local lpeg = require "lpeg"
local Carg = lpeg.Carg
local Cmt  = lpeg.Cmt
local P    = lpeg.P
local V    = lpeg.V

-- **********************************************************************
-- Compare the next bit of input with our quote character.  We can't use
-- string.find(), as that scans ahead in the string.  I don't use
-- string.match() because otherwise, I would have to scan the quote string
-- and escape any special characters.
-- **********************************************************************

local function mq(subject,position,quote)
  if quote == subject:sub(position,position + #quote - 1) then
    return position + #quote
  end
end

-- **********************************************************************
-- Our LPeg grammar.  It expects a <quotechar> (passed in to the grammar),
-- and a sequence of characters.  The <quotechar> (which can be any length)
-- can be escaped by '\'.  Try that using normal Lua patterns!
-- **********************************************************************

local qs = P {
  "string",
  char = P[[\]] * Cmt(Carg(1),mq) -- match \<quotechar>
       * V"qs"                    -- and qs (forward reference)
       * P[[\]] * Cmt(Carg(1),mq) -- and end with \<quotechar>
       + P[[\]] * P(1)            -- or \ followed by any char
       + (P(1) - (P[[\]] + Cmt(Carg(1),mq))), -- or character other than \ or <quotechar>
       
  qs = V"char"^0, -- any number of chars (see above)
  
  string = Cmt(Carg(1),mq)  -- match our quote character
         * V"qs"            -- plus a qs 
         * Cmt(Carg(1),mq), -- and finally our quote character
  
}

local function eosl(text,pos,quote)
  return qs:match(text,pos or 1,quote or '"')
end

print(eosl [["This" should return 7]])
print(eosl([[<q>This<q> should return 11]],1,"<q>"))
print(eosl([[<q>a\<q>b\<q>c<q>d]],1,"<q>")) -- returns 18
print(eosl [["This \"string\" here" should return 23]])
print(eosl [["This \"really \\\"embedded\\\" string\" here" should return 47]])
print(eosl([[<q>This \<q>embedded\<q> string<q> returns 35]],1,"<q>"))
print(eosl([[This here "string" should return 19]],11))
print(eosl [["This""is" 7]])
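Going back to Egor's one-liner: on Lua 5.2 and later it doesn't need %z at all, since a pattern may contain a literal "\0". Here is a sketch of his function with just that change, assuming Lua 5.2+ and a single-character quote (the table lookup only ever sees one- or two-character matches). With the test string from his message it should print 8, 20 and 26.

```lua
-- Egor's function with "%z" replaced by a literal "\0" in the pattern.
-- Assumes Lua 5.2 or later (patterns may contain embedded zeros) and a
-- single-character quote.
local function end_of_string_literal(text, start_pos, quote)
  -- "\\?." consumes "\x" escape pairs whole, so only unescaped quote
  -- characters hit the table and become "\0"; each replacement is one
  -- byte, so positions in the result match positions in the original text
  local marked = text:gsub("\\?.", { [quote] = "\0" })
  -- position just past the first unescaped quote at or after start_pos
  return marked:match("\0()", start_pos)
end

local s = [[a="\"A",b="\\\"B\"",c="C"]]
print(end_of_string_literal(s, 4,  '"'))  --> 8
print(end_of_string_literal(s, 12, '"'))  --> 20
print(end_of_string_literal(s, 24, '"'))  --> 26
```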

  -spc (But I again failed to read Soni's mind so this is probably incorrect
	somehow ...)