Although at times it is "convenient" to be able to load malicious
bytecode, I think that for the language, it is better to never be able
to load malicious bytecode. I've shown that 5.1's verifier is not good
enough, and as a result, there are two obvious options for 5.2:
1) Get rid of the verifier entirely, educate users of the language
as to what bytecode is and when it can be bad, and make the users
responsible for rejecting bytecode.
2) Improve and/or rewrite the verifier.
In terms of ease of implementation, the first option wins outright. As
for which is safer, I do not see that as so clear-cut: option one
allows malicious bytecode to be loaded if the users of the language
forget to reject bytecode or base their acceptance of bytecode on some
flawed system of trust, whereas option two allows malicious bytecode
to be loaded if there are any bugs in the verifier. Option two has the
significant advantage that (the existence of) bytecode can go back to
being an implementation detail rather than leaking out into the
standard library. For this reason, I'd like to see Lua 5.2 use option
2 rather than option 1.
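
To make option 1 concrete, here is a minimal sketch of the kind of check
a user of the language would have to remember to perform before handing
a chunk to lua_load. It relies on the fact that precompiled chunks begin
with LUA_SIGNATURE ("\033Lua") and that the loader picks the bytecode
path based on the first byte. The helper name looks_like_bytecode is
hypothetical and is not part of the Lua API.

```c
#include <assert.h>
#include <stddef.h>

/* First byte of LUA_SIGNATURE ("\033Lua"), which starts every
   precompiled chunk. */
#define BINARY_CHUNK_FIRST_BYTE 0x1B

/* Hypothetical helper (not part of Lua): returns 1 if the buffer would
   be treated as precompiled bytecode, 0 if it would be parsed as
   source. An empty buffer is treated as source. */
static int looks_like_bytecode(const char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    return len > 0 && (unsigned char)buf[0] == BINARY_CHUNK_FIRST_BYTE;
}
```

A caller would refuse the chunk when this returns 1. A loader that
preprocesses its input before calling lua_load (for example, by skipping
a first line) would need to apply the check to what lua_load actually
sees, which is exactly the sort of detail that is easy to forget.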

I've previously written the LBCV library [1], which acts as an
external verifier for 5.2. This library addresses all of the
shortcomings of 5.1's verifier that I'm aware of, but has some
shortcomings of its own:
1) Being external, it is slightly clunky to use, as you have to
remember to use it in place of every potentially unsafe call to
anything which uses lua_load.
2) Being external, it has the bytecode format dictated to it by the
Lua source, and has to perform its own decoding of the bytecode
alongside lua_load's decoding of it.
3) It is slow and memory hungry.

I'm currently investigating the idea of making some changes to
5.2-alpha in order to make bytecode verification easier, and then
adding a verifier to 5.2-alpha. I've got to a stage where the verifier
is lenient enough to accept everything out of the code generator (or at
least it does for every input to the code generator that I've tested),
and I feel that the verifier is strict enough to reject whatever I can
throw at it, but I haven't yet spent much time trying to break it. I
won't be confident about it until I have, but my feeling is that it is
a potential candidate for going with option 2 for 5.2.

The changes I've made to 5.2-alpha are the following:
* Change the bytecode format to make startpc and endpc values for
every local compulsory information rather than optional debug
information (the name of the local remains as optional debug
information).
* Reintroduce some checks which were present in 5.1: the
non-negativity check in lundump.c's LoadInt, and the table type check
in lvm.c's OP_SETLIST (as efficiently verifying the latter seems
non-trivial).
* Two minor tweaks to the code generator: when an OP_CLOSE is
generated upon leaving a block, shift the endpc values of expiring
locals to include the OP_CLOSE, and when the "local function" sugar is
being compiled, shift the startpc value of the new local to not
include the OP_CLOSURE (which makes it consistent with the non-sugared
behaviour).
* Some changes to the table in lopcodes.c: mark OP_TEST as neither
assigning to RA nor using B, and declare OP_LOADNIL as just using B
rather than using it as a register, as its usage of RB is different to
how every other instruction uses RB.
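
As an illustration of why compulsory startpc and endpc values help, here
is a minimal sketch of one sanity check a verifier could run on them
once they can no longer be stripped. This is not the proposed verifier's
code. LocalRange is a hypothetical, simplified stand-in for the
per-local record in a prototype, and the sketch assumes that endpc names
the first instruction at which the local is dead, so it may equal the
code size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified record for one local variable; not Lua's
   actual struct. */
typedef struct {
    int startpc;  /* first instruction at which the local is live */
    int endpc;    /* first instruction at which the local is dead */
} LocalRange;

/* Returns 1 if every local's range is well ordered and lies within a
   function of sizecode instructions, 0 otherwise. */
static int local_ranges_ok(const LocalRange *locals, size_t nlocals,
                           int sizecode) {
    size_t i;
    assert(locals != NULL || nlocals == 0);
    if (sizecode < 0)
        return 0;
    for (i = 0; i < nlocals; i++) {
        const LocalRange *l = &locals[i];
        if (l->startpc < 0 || l->startpc > l->endpc || l->endpc > sizecode)
            return 0;
    }
    return 1;
}
```

A real verifier would need far more than this (for instance, relating
the ranges to register usage), but range checks of this kind are only
possible when the information is guaranteed to be present.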

These changes then permit the introduction of my new, relatively
efficient bytecode verifier:
* In terms of code size, this is about a KLOC of new code, though a
chunk of that is comment.
* In terms of memory usage, for tracking state during verification of a
prototype, 2 bytes are needed for each instruction, and 3 words are
needed for each local variable (compare this against LBCV, which needs
3 words per instruction plus 1 byte per stack slot per instruction).
* In terms of speed, loading plus verifying bytecode is slower than in
5.1, but still much faster than parsing source code.

For the case of loading every .lua file in LuaDist, I'm currently
looking at the following numbers:
  Loading bytecode with 5.2 alpha:                      52.5ms
  Loading bytecode with 5.1.4:                          61.0ms
  Loading bytecode with 5.2 alpha + proposed verifier:  80.5ms
  Loading bytecode with 5.2 alpha + LBCV:              122.7ms
  Parsing source code in 5.2 alpha:                    290.5ms

For the smaller case of loading every .lua file in Prosody, the
numbers are:
  Loading bytecode with 5.2 alpha:                       4.4ms
  Loading bytecode with 5.1.4:                           5.1ms
  Loading bytecode with 5.2 alpha + proposed verifier:   7.4ms
  Loading bytecode with 5.2 alpha + LBCV:               11.3ms
  Parsing source code in 5.2 alpha:                     28.3ms
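
The per-prototype memory figures quoted above can be turned into a rough
estimate. The following sketch just encodes the two formulas. The
function names are hypothetical, and the word size is passed in as a
parameter because the figures are stated in words.

```c
#include <assert.h>
#include <stddef.h>

/* Proposed verifier: 2 bytes per instruction plus 3 words per local. */
static size_t proposed_bytes(size_t ninstr, size_t nlocals,
                             size_t word_bytes) {
    assert(word_bytes > 0);
    return 2 * ninstr + 3 * word_bytes * nlocals;
}

/* LBCV: 3 words per instruction plus 1 byte per stack slot per
   instruction. */
static size_t lbcv_bytes(size_t ninstr, size_t nslots,
                         size_t word_bytes) {
    assert(word_bytes > 0);
    return 3 * word_bytes * ninstr + nslots * ninstr;
}
```

For a made-up prototype with 1000 instructions, 20 locals and 30 stack
slots on a machine with 4-byte words, this gives 2240 bytes for the
proposed verifier against 42000 bytes for LBCV.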

I'm now looking for feedback on the proposal: Do you think it is a
good proposal? Do the Lua authors think that inclusion of such a
verifier is possible? Should I go ahead with spending time trying to
break it? If the general consensus is positive, then I shall go ahead,
and subsequently post a patch once I'm happy with it.

Regards,
Pete

[1] http://code.google.com/p/lbcv/