lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Norman Ramsey wrote:
> I have need to provide a mapping of "names" to integer values on a > reasonably large scale (hundreds of thousands). The name to number > mapping is known at the compile time of my host program (C++). My basic > naive approach is to generate a lua source file with the mapping in a > table and load that script file into the host environment at run-time. > This approach has a substantial time cost to it and seems like it would > be more or less unnecessary since the data is static, at least for that > particular version of the program.

A long time back I think lhf did something similar along the lines of
'self-executing Lua'.  The idea is to compile to bytecodes then from
the bytecodes generate a C program that puts the bytecodes into
initialized read-only data.   These days the bytecodes can actually be
larger than the original source, but you avoid the overhead of
compilation.  However you still have run-time overhead.

The other alternative I see is to roll your own data structure as
static C code, define it as userdata, and define the __index method.
Obvious structures to try would be a so-called 'perfect hashing
function' (http://en.wikipedia.org/wiki/Perfect_hash_function) or the
Bentley/Sedgewick ternary search trees:

   http://www.cs.princeton.edu/~rs/strings/

If your C programming skills are a bit rusty, the perfect hash
function is probably easier to set up as static data.

Using this method, the only dynamic overhead is creating an
initializing the metatable and associating it with the pointer to your
static structure.


Norman

Thank you all for the responses so far. I had considered the route of compiling the script to bytecode and then embedding it as a buffer that I could then load. I'd not yet compared the timing difference between the two methods. I had not thought of setting the metatable __index to a C function that would then lookup the id. That would probably be the fastest as far as script execution time would be concerned. Any good ideas how I could handle nested tables which are not known at run time? I guess I could receive the field name as a period delimited string ala "record.subrecord1.subrecord2.field" and pay the cost to parse and lookup the field at runtime. I don't know if that would be optimal though.

My largest concern across the board is that of speed and not size. For the application for which this would be a part of, the start up time has been a major focus. I would gladly trade size for start-up speed in just about any way, shape, or form. Run-time speed is less of an issue.

I do appreciate everyone taking the time to brainstorm with me.