[Date Prev][Date Next][Thread Prev][Thread Next]
- Subject: Re: Fast loading of very large "symbol" tables
- From: Canute Bigler <biglerc@...>
- Date: Sun, 12 Oct 2008 19:57:46 -0400
Norman Ramsey wrote:
> I have need to provide a mapping of "names" to integer values on a
> reasonably large scale (hundreds of thousands). The name to number
> mapping is known at the compile time of my host program (C++). My basic
> naive approach is to generate a lua source file with the mapping in a
> table and load that script file into the host environment at run-time.
> This approach has a substantial time cost to it and seems like it would
> be more or less unnecessary since the data is static, at least for that
> particular version of the program.
Thank you all for the responses so far. I had considered the route of
compiling the script to bytecode and then embedding it as a buffer that
I could then load. I'd not yet compared the timing difference between
the two methods.
I had not thought of setting the metatable __index to a C function that
would then lookup the id. That would probably be the fastest as far as
script execution time would be concerned. Any good ideas how I could
handle nested tables which are not known at run time? I guess I could
receive the field name as a period delimited string ala
"record.subrecord1.subrecord2.field" and pay the cost to parse and
lookup the field at runtime. I don't know if that would be optimal though.
A long time back I think lhf did something similar along the lines of
'self-executing Lua'. The idea is to compile to bytecodes then from
the bytecodes generate a C program that puts the bytecodes into
initialized read-only data. These days the bytecodes can actually be
larger than the original source, but you avoid the overhead of
compilation. However you still have run-time overhead.
The other alternative I see is to roll your own data structure as
static C code, define it as userdata, and define the __index method.
Obvious structures to try would be a so-called 'perfect hashing
function' (http://en.wikipedia.org/wiki/Perfect_hash_function) or the
Bentley/Sedgewick ternary search trees:
If your C programming skills are a bit rusty, the perfect hash
function is probably easier to set up as static data.
Using this method, the only dynamic overhead is creating an
initializing the metatable and associating it with the pointer to your
My largest concern across the board is that of speed and not size. For
the application for which this would be a part of, the start up time has
been a major focus. I would gladly trade size for start-up speed in
just about any way, shape, or form. Run-time speed is less of an issue.
I do appreciate everyone taking the time to brainstorm with me.