lua-l archive

On 1/7/2012 9:40 AM, Jay Carlson wrote:
[snip snip]
On Fri, Jan 6, 2012 at 10:41 AM, Roberto Ierusalimschy wrote:
Yes, it would be great to hear "official" opinion from both Lua team
and Mike.

Well, this is not an official opinion, but one possible approach would
be to break strings into two variants: short strings and long strings,
where long strings would not be interned any more.  More details later
(too late to write it now).

If I understood the problem correctly, a variable hash seed would
solve it, except for the skipping of characters. So, one approach
is to implement strings using two variants: short strings (up to
32 bytes) and long strings.

Consider /usr/bin/md5sum and sha1sum. Their hex output is 32 and 40
characters. I've indexed tables that way, and would have been
surprised if that length crossed a line. But I'm not quite sure what
the consequences would be for boxed strings.

FWIW, Intel and AMD processor cache lines are mostly 64 bytes these days, and (I think) that width stays consistent all the way down to DDR3 burst modes. Raising the short-string limit to somewhere above 32 bytes but no more than 64 bytes might not impact performance much (best measured, of course, and it will also depend on the alignment of string memory allocations.)

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia