On Mon, Feb 23, 2009 at 2:16 PM, John Hind <john.hind@zen.co.uk> wrote:
> I'm with Luiz on this one and I think your objections are misguided:

Please note that I'm not proposing luabins as a *replacement* to
anything -- or even as a "default" serialization library in any sense.

Also, I'm all for bytecode serialization. It just can't be used
*everywhere*, as it has its issues (as any solution would).

In my opinion, the main issue with bytecode serialization in our case
is that one can't use a whitelist approach with it -- allow only
whatever data kinds are needed, disallow everything else.
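
To make the whitelist idea concrete, here is a minimal sketch in Python
(chosen only for brevity). The one-byte tags and the little-endian
layout are assumptions made up for this illustration; they are not the
luabins wire format. The point is only that the loader checks every tag
against an explicit allow-list before decoding anything.

```python
import struct

# Hypothetical one-byte type tags for illustration; NOT the luabins format.
TAG_NIL, TAG_FALSE, TAG_TRUE = b"-", b"0", b"1"
TAG_NUMBER, TAG_STRING = b"N", b"S"
DEFAULT_ALLOWED = frozenset(
    {TAG_NIL, TAG_FALSE, TAG_TRUE, TAG_NUMBER, TAG_STRING}
)


def load_value(data, pos=0, allowed=DEFAULT_ALLOWED):
    """Decode one value starting at data[pos]; return (value, next_pos)."""
    tag = data[pos:pos + 1]  # empty if the input is exhausted
    if tag not in allowed:
        raise ValueError(f"tag {tag!r} at offset {pos} is not whitelisted")
    pos += 1
    if tag == TAG_NIL:
        return None, pos
    if tag in (TAG_FALSE, TAG_TRUE):
        return tag == TAG_TRUE, pos
    if tag == TAG_NUMBER:
        if len(data) - pos < 8:
            raise ValueError("truncated number")
        return struct.unpack_from("<d", data, pos)[0], pos + 8
    if tag == TAG_STRING:
        if len(data) - pos < 4:
            raise ValueError("truncated string length")
        (length,) = struct.unpack_from("<I", data, pos)
        pos += 4
        if len(data) - pos < length:
            raise ValueError("truncated string body")
        return data[pos:pos + length], pos + length
    # Whitelisted by the caller, but this sketch has no decoder for it.
    raise ValueError(f"no decoder for tag {tag!r}")
```

A caller that expects only numbers can pass
allowed=frozenset({TAG_NUMBER}), and every other kind of value is
refused up front.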

Also please keep in mind that the input to any load function is not
necessarily created by the corresponding save function. It may be
created by a malicious hand (or just damaged in transit, or be random
garbage).

Now to answer your points in detail:

> 1. Lua will already read "bytecodes" through the same channel as source
> files, so it had better be robust! If there are known problems with this,
> hopefully they will be fixed before too long.

Unfortunately, Lua's bytecode handling is much less robust than its
handling of text sources -- specifically in the "bytecode not
generated by the Lua compiler" case. There were a few bugs recently.
Please see 5.1.3 bugs 5, 6, 7, 8 and 5.1.2 bug 7:

http://www.lua.org/bugs.html

Also note this post:

http://article.gmane.org/gmane.comp.lang.lua.general/46493/

AFAIR, one such known bug is still there in 5.1.4. Note that the
current bugs page does not mention anything. It has merely been
promising to be "updated soon" for more than two months already. :-(

Granted, these bugs are usually fixed rather quickly (thank you, Lua
people!). But I do not see how it could be proved that Lua bytecode
handling does not contain any more crash issues. It is much easier to
prove such a thing for a specially crafted binary data loader.

Also keep in mind that runtime validation to catch such bugs does cost
extra performance. IMO, Lua bytecode is too generic to store data
effectively. Granted, it is convenient to use and does not require
extra libraries to be loaded -- (for me) that is the main reason it
*should* exist. But sometimes you'd want more than this.

> 2. You can introduce an infinite loop through source code as easily as
> through "bytecodes".

Indeed. But not through (correctly handled) binary serialized data.
A loader for such data does not have to be a VM for arbitrary code
execution.
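
As a rough illustration (Python again, with a made-up format that is
not luabins's), a data-only loader can keep its work bounded by the
input: every step consumes at least one byte, declared element counts
are checked against the bytes that actually remain, and nesting depth
is capped. MAX_DEPTH and the "N"/"T" tags are assumptions for this
sketch.

```python
import struct

MAX_DEPTH = 16  # hypothetical nesting limit for this sketch


def decode(data, pos=0, depth=0):
    """Decode one value at data[pos]; return (value, next_pos)."""
    if depth > MAX_DEPTH:
        raise ValueError("nesting too deep")
    tag = data[pos:pos + 1]
    pos += 1
    if tag == b"N":  # 8-byte little-endian double
        if len(data) - pos < 8:
            raise ValueError("truncated number")
        return struct.unpack_from("<d", data, pos)[0], pos + 8
    if tag == b"T":  # array-like table: uint32 count, then the items
        if len(data) - pos < 4:
            raise ValueError("truncated table header")
        (count,) = struct.unpack_from("<I", data, pos)
        pos += 4
        # Each item needs at least one byte, so a count larger than
        # the remaining input can be rejected before looping at all.
        if count > len(data) - pos:
            raise ValueError("declared count exceeds remaining input")
        items = []
        for _ in range(count):
            item, pos = decode(data, pos, depth + 1)
            items.append(item)
        return items, pos
    raise ValueError(f"unknown tag {tag!r}")
```

There is no instruction in such a format that can jump backwards, so a
hostile input can at worst make the loader fail early.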

> 3. The serialisation process would generate the "bytecodes" algorithmically
> and so the round trip should be just as robust as using an "arbitrary"
> binary format - the process should not write any "bytecode" sequences that
> are invalid or dangerous. Luiz's code looks dangerous because it exposes
> internals to Lua and does the "heavy lifting" there. However this was just a
> proof-of-concept prototype - in final code the serialisation would be done
> internally in "C".

I'm not scared by Luiz's (or your) bytecode serialization code. I'm
"scared" by feeding arbitrary user-supplied bytecode to loadstring().
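
One mitigation on the receiving side is to refuse anything that looks
like a precompiled chunk before it ever reaches the loader. The sketch
below (Python, for brevity) assumes the Lua 5.1 behaviour where a chunk
whose first byte is ESC, the start of the "\033Lua" signature, is
treated as bytecode. The function names are hypothetical.

```python
LUA_SIGNATURE = b"\x1bLua"  # leading bytes of a precompiled Lua chunk


def looks_precompiled(chunk: bytes) -> bool:
    """True if the chunk would be treated as bytecode, not source."""
    # Only the first byte (ESC) is compared: no valid Lua source text
    # starts with it, so rejecting on that byte alone loses nothing.
    return chunk[:1] == LUA_SIGNATURE[:1]


def require_source(chunk: bytes) -> bytes:
    """Return the chunk unchanged, or raise if it looks like bytecode."""
    if looks_precompiled(chunk):
        raise ValueError("precompiled chunks are not accepted")
    return chunk
```

This only keeps bytecode out. A source chunk that passes the check can
still loop forever, as noted in point 2.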

> 4. "Bytecodes" are the only practical way of serialising functions (unless
> you decompile them to source code strings). Most complete serialisation
> schemes embed "bytecodes" within the "arbitrary" binary format and so
> inherit any problems with them.

Indeed. However, serialization of functions is outside of luabins's scope.

> One real issue however is that the "bytecode" format is not particularly
> compact and I guess a "arbitrary" format could be made a good deal more
> efficient in storage requirement. It is also worth pointing out that to do
> bytecode serialisation at present you need to modify Lua internals (or
> duplicate a lot of code).

I agree.

> I am not disparaging your good work, which I am sure has its niche, but if
> serialisation is to be added to "Lua the language" as opposed to "Lua the
> ecosystem", I think serious consideration should be given to using the
> "bytecode" format, possibly improved with some compression, to get maximum
> leverage of the existing code base.

I agree that Lua needs built-in serialization. However, I do not see
how we would get such a thing in the next several years. Meanwhile I'm
trying to do what I can to deal with the problems I have at hand. :-)

> See my paper here:

> http://lua-users.org/wiki/EngramProposal

> Which builds on Luiz's work (although I have not written the code for table
> serialisation yet).

I've seen it and I want to congratulate you on the good work. I'm sure
what you do is very good for Lua itself. Thank you!

I even intend to use engrams someday (but they must have table
serialization then.) :-)

However, as I tried to explain above, for my task at hand, bytecode
serialization is currently "outside of niche".

> ** NB I keep parenthesising "bytecode" because I find it a misleading term,
> especially as Lua uses 32-Bit "bytecodes". "VM Codes" would be a better term
> to use.

Perhaps. I hope other readers will bear with us for not using it so far.

Alexander.