On Mon, Jan 23, 2012 at 9:35 AM, Tony Finch <> wrote:
> Jay Carlson <> wrote:
>> This README and source at . The talk
>> about shell stuff from a while back had me panicking about compact
>> syntax choices for quasi-quoted lists (never mind the jargon, that's
>> what a shell language *is*). A slow panic I guess.
> Interesting, thanks for posting. Reminds me a bit of
> though in that article I was
> thinking about embedding other languages rather than quasiquotation.

I'm not sure there's much difference if we are not allowed a macro system.

> I am in two minds about interpolation.

I'm not. Any place where two strings are concatenated is likely to have
at least one bug. That didn't use to matter so much, except that some
bugs can now be turned into execution of arbitrary code.

> It is very convenient, but it
> leads to injection bugs in stringly-typed systems (where all data are
> undifferentiated strings)

I had not heard that term before (and it's a year and a half old).
It's a great meme for getting people to rethink how they code. Thanks.

Original coinage apparently at

There's a lucid description at .

> where there isn't enough type checking to spot
> when you failed to correctly escape a string before interpolating.

You've already lost when the problem is stated in terms of escaping,
at least as a task that anything other than specialist library code
has to perform. Types are the problem. Even the duck typing meme does
not work on strings. It's just as bad as BCPL.

Let's back up a minute. There's an actual theory 101 issue here,
almost one you'd see in a problem set.

You have a formal language L. You extend it to Lprime by adding $1,
$2, etc symbols; template t is in Lprime. You have other formal
languages L1, L2, ... which strings s1, s2, ... are members of. Can
you prove that transforming t by replacing $1,$2,... with
s1,s2,... results in a string r which is in language L? Under what
conditions? (Some L,L1,L2... are much easier than others.) This is the
check for syntactic correctness.

Consider the case when L is in the class of context-free grammars.
Eyeballing it, I'd say the question looks undecidable if L1,L2 are
CFGs as well (but my intuition is a little rusty.) But I'm too lazy to
read up on this, because there's a more pressing subproblem:

L is a context-free grammar and one with a useful parse tree
(handwave). Can L1,L2... be designed or proved to be such that the
substitution operation on all s1,s2,etc results in a string s(t) in L
with a parse tree with "similar" shape? For example the shape of
"print($1)" looks rather different when $1 is "7" or "7) os.exit(0".
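
To make the shape change concrete, here is a minimal sketch in Lua. The helper names fill and compile are made up for illustration; the chunks are only compiled, never run, so os.exit is never actually called.

```lua
-- Lua 5.1 has loadstring; 5.2 and later accept a string in load.
local compile = loadstring or load

-- Naive substitution: splice s verbatim into the template at $1.
-- (Hypothetical helper; a function replacement keeps gsub from
-- interpreting "%" inside s.)
local function fill(template, s)
  return (template:gsub("%$1", function() return s end))
end

local benign  = fill("print($1)", "7")
local hostile = fill("print($1)", "7) os.exit(0")

print(benign)   -- print(7)
print(hostile)  -- print(7) os.exit(0)

-- Both results are members of L (they compile), so a purely syntactic
-- check cannot tell them apart. The first is one call statement; the
-- second is two.
assert(compile(benign))
assert(compile(hostile))
```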

Escaping values makes reasoning about the languages tractable because
(ideally) L1,L2,... are very simple languages, often in the class of
regular languages. Running string.format("%q", s1) is intended to give
you a string in a much smaller language class, in particular one which
will parse only as the String terminal in the Lua language syntax
(which means if $1 is in an "exp" nonterminal, the substitution can't
change the shape of the tree.)
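
A small sketch of that claim, assuming a made-up helper fill_quoted: once the value goes through %q, the hostile input from the print($1) example can only come back out as a string.

```lua
-- Lua 5.1 has loadstring; 5.2 and later accept a string in load.
local compile = loadstring or load

-- Hypothetical helper: quote s with %q before splicing it in at $1.
local function fill_quoted(template, s)
  local quoted = string.format("%q", s)
  return (template:gsub("%$1", function() return quoted end))
end

local src = fill_quoted("return ($1)", "7) os.exit(0")
print(src)  -- return ("7) os.exit(0")

-- The chunk parses as "return (String)", so evaluating it hands the
-- original text back as data instead of running it.
local value = compile(src)()
assert(value == "7) os.exit(0")
```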

So, we'd have a tractable escaping problem if the formal language Lua
was the only language in the universe, or if its Strings were
identical to terminals in other languages.

And if everybody stopped using ".." and instead always used
format("%q"). Which they won't, because .. is often a lot less typing
and thinking. (For starters, format("%q %q is %q", a, c, b) means your
eyes flit back and forth between the argument list and the unnamed
format designators. Oh and you have to count them too.)

My desire is a meta-mechanism that makes ".." more annoying than doing
the correct thing.

> I wonder if Terrence Parr's "Strict Model-View Separation in Template
> Engines" provides enough structure to make these bugs easier to handle.

I think Parr is solving some interesting problem but it's not this
one. More like E4X (except the typing disaster), Comega. And yeah, the
people who already mostly solved it are using Haskell, and therefore
are too weird to pay attention to. Quasiquotes, typed strings, etc. (and cmon, you've
*got* to love a Web framework named Yesod.)

> Along those lines I quite like the idea of compiling a $ string to a
> function that takes the values to interpolate, e.g.
>        $"$1: $2 lines"(total, count)

Don't need a token filter for that because there are no lexicals.

-- Build source code; executing this is just as efficient as table.concat:
> = positional_parser[[The $2 is $1...$1.]]
return function(a1,a2,a3,a4,a5,a6,a7,a8,a9)
    return "The "..a2.." is "..a1.."..."..a1..".".."" end

-- loadstring() and hopefully memoize it
> = pfmt[[The $2 is $1...$1.]]("still", "sun")
The sun is still...still.
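
For readers without the source at hand, a minimal sketch of how such a pair could be written. This is a guess at the idea, not the actual code: only the names positional_parser and pfmt come from the transcript above, and the generated source differs slightly from the output shown there.

```lua
-- Lua 5.1 has loadstring; 5.2 and later accept a string in load.
local compile = loadstring or load

-- Turn a template containing $1..$9 into Lua source for a function
-- of nine arguments. Literal text is emitted through %q.
local function positional_parser(template)
  local parts, pos = {}, 1
  while true do
    local s, e, digit = template:find("%$([1-9])", pos)
    if not s then break end
    parts[#parts + 1] = string.format("%q", template:sub(pos, s - 1))
    parts[#parts + 1] = "a" .. digit
    pos = e + 1
  end
  parts[#parts + 1] = string.format("%q", template:sub(pos))
  return "return function(a1,a2,a3,a4,a5,a6,a7,a8,a9)\n    return "
      .. table.concat(parts, "..") .. " end"
end

-- Compile each distinct template once and memoize the function.
local cache = {}
local function pfmt(template)
  local f = cache[template]
  if not f then
    f = assert(compile(positional_parser(template)))()
    cache[template] = f
  end
  return f
end

print(pfmt[[The $2 is $1...$1.]]("still", "sun"))
-- The sun is still...still.
```

Note that the arguments are still concatenated raw in this sketch, so it is convenient but does nothing about injection; a nil argument raises an error at the "..".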

pfmt() returns a function, so there are lots of other games to play
with it. I guess your point is that you would like short syntax in
core to do something about this so you can always count on pfmt being there.

> But then it's questionable if this is much better than a Python-style
> overloading of the % operator to string.format...

None of it really helps if strings don't get more type-y. Python has
this problem too.
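
As a rough illustration of what "more type-y" could look like in plain Lua, here is a sketch using a metatable with __concat. Everything in it (Code, code, lift, is_code) is hypothetical and not taken from any existing library; the idea is that ".." on a tagged fragment quotes any untagged operand instead of splicing it raw.

```lua
-- Hypothetical "Lua source fragment" type: a table tagged by metatable.
local Code = {}
Code.__index = Code

local function is_code(v) return getmetatable(v) == Code end

-- Trusted text, written by the programmer.
local function code(src)
  return setmetatable({ src = src }, Code)
end

-- Anything untagged is treated as data and becomes a String literal.
-- (Numbers end up as string literals too in this sketch.)
local function lift(v)
  if is_code(v) then return v end
  return code(string.format("%q", tostring(v)))
end

-- ".." between a fragment and anything else yields a fragment.
Code.__concat = function(a, b)
  return code(lift(a).src .. lift(b).src)
end

local user_input = "7) os.exit(0"
local chunk = code("print(") .. user_input .. code(")")
print(chunk.src)  -- print("7) os.exit(0")
```

Two plain strings concatenated with ".." never reach the metamethod, so this only helps once at least one operand is already a tagged fragment.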