lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


On Sat, Jan 7, 2012 at 1:00 AM, HyperHacker <hyperhacker@gmail.com> wrote:
> On Fri, Jan 6, 2012 at 12:17, Roberto Ierusalimschy
> <roberto@inf.puc-rio.br> wrote:
>>> If I am not mistaken current Lua implementation computes hash value by
>>> iterating through a string one character at a time. Would it be faster
>>> to iterate one word at a time (4 bytes at a time in 32bit
>>> architecture)?  It will require changing hash function implementation
>>> but it could speed up the calculations. Hopefully, we can still treat
>>> 127-byte long strings as short strings.
>>
>> I do not see a simple way to write this in a portable way (respecting
>> alignment requirements for word access). For strings already internal to
>> Lua we know about their alignment, but external strings (the ones to be
>> internalized) may start anywhere. We would need to do some arithmetic
>> with addresses, something invalid in ANSI C.
>
> That seems terribly amusing, given that C is supposed to be low-level.
> There's no way to convert the address to an integer and slow-copy the
> first few bytes until we're at an aligned address?

No. Pointers are weird. Doing this depends on implementation-specific
details. As it happens, a complete model for memory given as union {
int n; unsigned char b[sizeof(int)]; } is true almost everywhere.
However, (n & 0xff) may still be b[0] or b[3] or b[7], and I don't
believe there is any guarantee (intptr_t)&n takes on any value of
(intptr_t)&b[i].

Anyway, assuming that union, how do you implement a good hash based on
that int n? This is an honest question.

> --
> Sent from my toaster.

Appropriate enough. A design goal of ANSI C was to be reasonably
implementable on toasters....even if they use sign-magnitude 18-bit
ints. Or 286 protected mode segment descriptors....

Jay