- Subject: Re: Parsing strings written by string.format("%q")
- From: Sean Conner <sean@...>
- Date: Wed, 23 Nov 2016 17:36:27 -0500
It was thus said that the Great David Given once stated:
> I want to write out strings to a file and read them back in again, in an
> ASCII-safe way. I'm writing them out with string.format("%q"), because
> it's cheap and easy[*], but now I need to read them back in again.
>
> Actually going through the string and parsing the escapes seems like a
> lot of work. A faster but much scarier alternative is to wrap the string
> with 'return (<string>)' and compile and execute it in a restricted
> environment.
>
> Given that the interpreter already contains code to safely turn an
> escaped string into a Lua string, there must be a better way --- what is it?
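If you do take the "compile it" route, you can make it much less scary by refusing to compile anything that is not provably a single double-quoted literal. Here is a minimal sketch. It assumes Lua 5.2 or later (for the four-argument load()), and unquote is a hypothetical helper name:

```lua
-- Decode a %q literal by letting the Lua compiler do the work (Lua 5.2+).
-- `unquote` is a hypothetical helper, not a standard function.
local function unquote(literal)
  -- Remove every backslash escape pair, left to right as the lexer reads
  -- them; what is left must be one quoted run with no other quote or
  -- backslash in it, so the text cannot contain any code.
  local stripped = literal:gsub("\\.", "")
  if not stripped:match('^"[^"\\]*"$') then
    return nil, "not a single double-quoted string literal"
  end
  -- The chunk can only be `return "<literal>"`; the empty environment is a
  -- second precaution, so no globals would be reachable anyway.
  local chunk, err = load("return " .. literal, "=unquote", "t", {})
  if not chunk then
    return nil, err -- e.g. an out-of-range \ddd escape
  end
  return chunk()
end
```

The syntactic check does the real work here. An empty environment alone would not stop something like "a" .. ("x"):rep(1e9), because string methods stay reachable through the string metatable.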
I turn to LPeg for any parsing job. Here's an example that does what you
want:
local lpeg = require "lpeg"

local digit  = lpeg.R"09"
local escape = lpeg.P[[\]] / ""        -- drop the backslash that starts an escape
             * (
                   (digit * digit^-2)  -- \d, \dd or \ddd: return the actual byte
                     / function(c)
                         return string.char(tonumber(c))
                       end
                 + lpeg.P"\n" / "\n"   -- backslash-newline ('continue to next line')
                 + lpeg.P"\\" / "\\"   -- escaped backslash (%q writes \ as \\)
                 + lpeg.P'"'  / '"'    -- double quote
                 + lpeg.P"'"  / "'"    -- single quote
                 + lpeg.P"a"  / "\a"   -- other escapes
                 + lpeg.P"b"  / "\b"
                 + lpeg.P"t"  / "\t"
                 + lpeg.P"n"  / "\n"
                 + lpeg.P"v"  / "\v"
                 + lpeg.P"f"  / "\f"
                 + lpeg.P"r"  / "\r"
               )

local line = lpeg.P'"'                 -- must start with a quote
           * lpeg.Cs( ( escape + (lpeg.P(1) - lpeg.S[["\]]) )^0 )
           * lpeg.P'"'                 -- must end with a quote

local test = [["o\"ne\1two\3th\"ree\
four\9five\127hello\\"]]
local x = line:match(test)
print(test) print()
print(x) print()

-- re-encode with %q; decoding that again must give x back
local y = string.format("%q", x)
print(y) print()
assert(line:match(y) == x)
The real work happens in the lpeg.Cs() call, which walks the string and
replaces each escape sequence with the character it stands for. I expect
this to be similar in speed to using the Lua parser, since LPeg compiles
patterns into code for its own virtual machine.
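Without LPeg, you can approximate the same one-pass substitution with string.gsub. This swaps in a different technique from the LPeg grammar. The names unescape and simple are hypothetical. The sketch handles \ddd, backslash-newline, \\, \", \' and the single-letter escapes a b f n r t v. Any other escape raises an error:

```lua
-- A sketch of a %q decoder built on string.gsub instead of LPeg.
-- `unescape` and `simple` are hypothetical names, not part of any library.

-- Single-character escapes and what they stand for; "\n" as a key is the
-- backslash-newline that %q writes for an embedded newline.
local simple = {
  a = "\a", b = "\b", f = "\f", n = "\n", r = "\r", t = "\t", v = "\v",
  ["\\"] = "\\", ['"'] = '"', ["'"] = "'", ["\n"] = "\n",
}

local function unescape(literal)
  -- Take the text between the outer quotes; anything else is rejected.
  local body = literal:match('^"(.*)"$')
  if not body then return nil end
  -- With escape pairs removed, no bare quote or dangling backslash may remain.
  if (body:gsub("\\.", "")):find('["\\]') then return nil end
  -- One left-to-right pass: each backslash takes the next character plus up
  -- to two following digits, so numeric and single-character escapes share
  -- one pattern and nothing is decoded twice.
  return (body:gsub("\\(.)(%d?)(%d?)", function(c, d2, d3)
    if c:find("%d") then
      return string.char(tonumber(c .. d2 .. d3)) -- \d, \dd or \ddd
    end
    -- Not a numeric escape: any digits grabbed are ordinary text.
    local s = simple[c] or error("unknown escape \\" .. c)
    return s .. d2 .. d3
  end))
end
```

Grabbing up to two optional digits after the escaped character lets one pattern cover both numeric and single-character escapes. Because gsub never revisits text it has already replaced, an escaped backslash followed by digits (\\12 in the source) comes out as a backslash and "12" rather than byte 12.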
-spc