lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Reuben Thomas wrote:

Almost all of the code is written purely in terms of the standard library
(one or two routines use standard Unix command-line utilities via
os.execute).

os.execute and io.popen are scary functions. When using them, you have to be very certain about the contents of the strings being passed in. I'm going to pick on your code, since you were brave enough to post it, but this is an underlying problem many of us have run into. Me, I ran into it the hard way. It would be nice if we could solve it just once.

For instance, in std.io.env.lua you have:

  shell ("ls -aU " .. d .. " 2>/dev/null")))

If d has a space in it, this will not do the right thing. (Of course, d=".;rm -rf /" is worse.)

This can be fixed up a little:

  shell ("ls -aU '" .. d .. "' 2>/dev/null")

but now we can't use directories with single quotes in them. For correctness we can then reject directories with single quotes. Well, that's bad, but not as bad as letting the shell interpret parts of d as shell special characters.

That's as far as I got in my code. You can do fancier quoting to allow single-quotes as well.

It is rare that you want to use shell special characters outside certain stereotyped ways, and common that you already have arguments divided up into separate elements. So in my code I had:

  execute{"mkdir", d}

which would quote all its arguments and then call os.execute. The way I'd write that shell command is

  shell{"ls", "-aU", d, stderr=false}

I don't think I dealt with pipelines in this code, but it seems like something like this would make sense:

  execute("find", d, "-type", "f", "-print", PIPE, "xargs", "wc"}

where PIPE is a unique table. In fact, we can use tables to flag sequences of arguments we know are safe:

  execute("find", d, {"-type f -xdev ! -empty"}, PIPE, "xargs", "wc"}

or with some other kind of sugar like argl"-type f etc etc etc". What this marks is that "-type f" is of a different type than normal strings---the author is really writing out a *list* of strings in a more convenient surface syntax.

It would also be nice to set the environment. With a tool like /usr/bin/env, that's possible too; possible syntax:

  shell{env={TZ="GMT"}, "date"}

It's possible to write this all portably on POSIX systems, since we know what the shell is like. Well, we think we do---ISTR some old security problems from underescaping in bash. In some sense the right thing is to do this in C, using fork and execve. That way we don't have to depend on shell unquoting of things, and we have complete control over the environment. In C, we can also support doing similar things on Windows using a different implementation (albeit a subset).

We can steal design from other systems. For example, here's a recent Python approach to this:

http://www.python.org/peps/pep-0324.html

with a discussion at

http://mail.python.org/pipermail/python-dev/2004-January/041606.html

I'll post the code I have once I find it.

Now, the usual rant: strings are the root of all evil, because they are untyped. When building aggregate data structures out of them through concatenation, your language can't help you get the well-formedness and semantics right. This makes them dangerous. Because they are used as the (often ill-specified) communication channel with other applications there can be even more bugs lurking in their use. When possible, strings should be moved to the edges of your program in such a way that you don't have to worry about quoting or typing issues in your normal application code. However, strings are quite convenient, since most modern languages and their libraries have good support for them. So in the quest for structural purity, don't make it annoying to do the right thing or else people will go back to exec("ls "..d) :-)

On that note, specialized interpolation could do some good. Consider the imaginary

  local d="My Library"
  my_execute($"ls $d")

where $"ls $d" is a constructor for {"ls $d", ["d"]=d}. my_execute could split that on spaces into {"ls", "$d", d="My Library"}, then perform substitution of variables to {"ls", "My Library"}. I think this has to be syntax, because you can't get a capture of the d local, even using the debug library, if d is in an outer scope.

The current Lua solution would be

  my_execute(interpolate{"ls $d", d=d})

which isn't so bad except that it's annoying and feels like the old lame Python lambda upvalues workaround.

I don't know whether I think adding more syntax is a good idea or not.

Jay