Re: Fast loading of very large "symbol" tables

lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

Subject: Re: Fast loading of very large "symbol" tables
From: Canute Bigler <biglerc@...>
Date: Sun, 12 Oct 2008 19:57:46 -0400

Norman Ramsey wrote:

> I have need to provide a mapping of "names" to integer values on a> reasonably large scale (hundreds of thousands). The name to number> mapping is known at the compile time of my host program (C++). My basic> naive approach is to generate a lua source file with the mapping in a> table and load that script file into the host environment at run-time.> This approach has a substantial time cost to it and seems like it would> be more or less unnecessary since the data is static, at least for that> particular version of the program.
A long time back I think lhf did something similar along the lines of
'self-executing Lua'.  The idea is to compile to bytecodes then from
the bytecodes generate a C program that puts the bytecodes into
initialized read-only data.   These days the bytecodes can actually be
larger than the original source, but you avoid the overhead of
compilation.  However you still have run-time overhead.

The other alternative I see is to roll your own data structure as
static C code, define it as userdata, and define the __index method.
Obvious structures to try would be a so-called 'perfect hashing
function' (http://en.wikipedia.org/wiki/Perfect_hash_function) or the
Bentley/Sedgewick ternary search trees:

   http://www.cs.princeton.edu/~rs/strings/

If your C programming skills are a bit rusty, the perfect hash
function is probably easier to set up as static data.

Using this method, the only dynamic overhead is creating an
initializing the metatable and associating it with the pointer to your
static structure.


Norman

Thank you all for the responses so far. I had considered the route ofcompiling the script to bytecode and then embedding it as a buffer thatI could then load. I'd not yet compared the timing difference betweenthe two methods.I had not thought of setting the metatable __index to a C function thatwould then lookup the id. That would probably be the fastest as far asscript execution time would be concerned. Any good ideas how I couldhandle nested tables which are not known at run time? I guess I couldreceive the field name as a period delimited string ala"record.subrecord1.subrecord2.field" and pay the cost to parse andlookup the field at runtime. I don't know if that would be optimal though.

My largest concern across the board is that of speed and not size. Forthe application for which this would be a part of, the start up time hasbeen a major focus. I would gladly trade size for start-up speed injust about any way, shape, or form. Run-time speed is less of an issue.


I do appreciate everyone taking the time to brainstorm with me.

Follow-Ups:
- Re: Fast loading of very large "symbol" tables, David Given
- Re: Fast loading of very large "symbol" tables, Javier Guerra Giraldez
- Re: Fast loading of very large "symbol" tables, KHMan
- Re: Fast loading of very large "symbol" tables, Klaus Ripke

References:
- Fast loading of very large "symbol" tables, Canute Bigler
- Re: Fast loading of very large "symbol" tables, Norman Ramsey

Prev by Date: Re: Fast loading of very large "symbol" tables
Next by Date: Re: Fast loading of very large "symbol" tables
Previous by thread: Re: Fast loading of very large "symbol" tables
Next by thread: Re: Fast loading of very large "symbol" tables
Index(es):
- Date
- Thread