- Subject: Re: Unpooled strings?
- From: "Alex Davies" <alex.mania@...>
- Date: Tue, 29 Apr 2008 15:42:59 +0800
----- Original Message -----
From: "Jerome Vuarand"
Sent: Tuesday, April 29, 2008 11:41 AM
Subject: Re: Unpooled strings?
One last note (because I thought of that afterward): if that's the
cost of hashing the strings that is bothering you (rather than the
memory usage of copying your long strings), the hash is not computed
on the whole string but just on a few tens of characters at its
beginning (I don't remember exactly how many), so the cost of hashing
a 100kB string is the same as for a 100B string.
You should also check Mike Pall's faster string hashing algorithm. It
generates a better hash from 3 ints in the string, but unfortunately isn't
ANSI C (because it requires unaligned loads).
Jerome is right though. Any function in the string library except for
string.len is likely to [greatly] exceed the time taken to intern the
string, making the optimization moot.
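
To illustrate the sampling Jerome describes, here is a rough sketch
modelled on the string hash in Lua 5.1's lstring.c, written from memory.
The name sampled_hash and the "sampled" counter are my own additions, not
part of Lua. As far as I recall the stock hash strides across the whole
string rather than reading only its beginning, but the effect is the same:
the number of bytes touched is capped at roughly 32.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of a sampled string hash, modelled on Lua 5.1's luaS_newlstr.
   sampled_hash and the 'sampled' out-parameter are illustrative names,
   not part of the Lua API. */
static unsigned int sampled_hash(const char *str, size_t len, size_t *sampled)
{
    unsigned int h = (unsigned int)len;   /* seed the hash with the length */
    size_t step = (len >> 5) + 1;         /* stride grows with the length */
    size_t count = 0;
    size_t i;

    assert(str != NULL || len == 0);
    for (i = len; i >= step; i -= step) { /* walk backwards from the end */
        h ^= (h << 5) + (h >> 2) + (unsigned char)str[i - 1];
        count++;
    }
    if (sampled != NULL)
        *sampled = count;                 /* number of bytes actually read */
    return h;
}
```

With this stride a 100-byte string has 25 of its bytes read and a
102400-byte string has 31, so the hashing cost is effectively constant.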
Also there's quite a bit of complexity there, and features lost. E.g., without
modifying the core, you could never get equality to work correctly between
the strings (which, not being interned, would always require a full
memory compare anyway), or table hashing for that matter. Mind you, this may
not be an issue for very-likely-to-be-unique multi-hundred-kB strings.
One thing that puzzles me, though: if they're constant, why not just
push them at the start of the program? If it's an embedded processor I can
see the problem there, but otherwise memcpy isn't -that- slow. If they're
not constant, I'm intrigued as to how you can guarantee their safety in the
case that they aren't "forgotten". The only real use I can see for
non-interned strings is to make them fully mutable, which would bring with
it large performance increases for some applications.
If you're still keen, though, check
http://lua-users.org/wiki/SpeedingUpStrings by Rici Lake. It provides
something similar to what you want, but modifies internals instead of using
userdata (i.e., it should be faster). Do note, though, that the project was
abandoned because the gains were not as large as expected.