lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Some time ago, I had a similar need and have used pseudo-recursive C macros to compute the hash value at compile time: https://gist.github.com/1048415. In the case of Lua 5.1, I think the hash function for small strings is (lstring.c#80):

for (l1=l-1; l1>=0; l1-=1)
  h = h ^ ((h<<5)+(h>>2)+cast(unsigned char, str[l1]));

Which means... it won't work for two reasons:

1. I am not sure how to unroll this loop without knowing the string length (relying on '\0')
2. The hash function uses the previous value "h" 3 times which means that for a maximum of 8 characters long strings, the macro expansion becomes really big (3**8 = 6561) which is *bad* if not impossible to compile.

Another alternative is the old parse and compile. You use HASH("foobar") in your code and have a regexp + Lua preprocessor replace this by during the compilation process (with some proper makefiles, this is not as painful as it looks):

83243 /* HASH("foobar") */

Finally, some assertions to validate the hash on code init is recommended...

While we are on the subject, does anyone know why this particular hashing function has been chosen ? I do not seem to recognize it (http://home.comcast.net/~bretm/hash/) and since it's such a crucial part of Lua, I bet there is some explanation somewhere...

Any pointers ?

Gaspard

On Sun, Jun 26, 2011 at 11:29 PM, Andre Leiradella <andre@leiradella.com> wrote:
On 26/06/2011 14:05, Rebel Neurofog wrote:
But that takes a table lookup for the key...
I'm not sure, but it is probably faster than sequential string
comparison (especially for lots of strings).
But of course it is slower than hashing.
It's kind of simple intermediate workaround.

There's also another trick (which works fast for single Lua state per
executable):

static const char *points_string;

static void func ()
{
    lua_State *L = lua_newstate (...);

    lua_pushliteral (L, "points"); /* There is a simpler way in Lua 5.2 */
    points_string = lua_tostring (L, -1);
    lua_setfield (L, LUA_REGISTRYINDEX, "points_string"); /* to
prevent garbage collection */
}

static void some_lua_func (lua_State *L)
{
    const char *st = lua_tostring (L, 1);
    if (points_string == st) { /* That's the magic */
        /* We've got a match here */
    }
    return 0;
}

If you have several Lua states, then you may do something like that:

typedef struct some_ud_t {
    const char *points_string;
} some_ud_t;

static void func ()
{
    some_ud_t *ud = malloc (sizeof (some_ud_t));
    lua_State *L = lua_newstate (allocator, ud);

    lua_pushliteral (L, "points"); /* There is a simpler way in Lua 5.2 */
    ud->points_string = lua_tostring (L, -1);
    lua_setfield (L, LUA_REGISTRYINDEX, "points_string"); /* to
prevent garbage collection */
}

static void some_lua_func (lua_State *L)
{
    some_ud_t *ud;
    lua_getallocf (L,&ud);
    const char *st = lua_tostring (L, 1);
    if (ud->points_string == st) { /* That's the magic */
        /* We've got a match here */
    }
    return 0;
}

Once again, I didn't test all these mad things

But then I'd have to convert my switch to a series of ifs since the pointers to the literals are computed during runtime.

I know switches are not great with sparse constant spaces such as when using hashes (i.e. only a few tenths of used values from 2^32 possible values) but I like the compiler doing the binary search for me (gcc does) instead of having to figure it out myself. I'm also experimenting hashing the hash (just shifts and xors, nothing fancy or costly) to make the space dense enough so the switch will (hopefully) become a goto table + one if for each case to see if it's really a match. Better than the hard-coded binary tree...

Cheers,

Andre