- Subject: Re: proposal for reading individual characters from strings faster
- From: Coroutines <coroutines@...>
- Date: Mon, 5 May 2014 13:13:32 -0700
On Mon, May 5, 2014 at 11:47 AM, Tom N Harris <telliamed@whoopdedo.org> wrote:
> Never make assumptions without benchmarks. (I fell into that trap during the
> recent discussion about metatables on lightuserdata.) In a quick test running
> 1e9 single-byte string comparisons versus number comparisons in Lua 5.2, the
> two took roughly the same time. In fact, the string test was very slightly
> faster.
I could be misunderstanding you here. My proposal is not about
comparing whole strings byte-by-byte or in lua_Number-sized units.
I'm trying to compare characters without creating single-byte
substrings, which seems like it would cost more than comparing
char-to-char in C. This matters wherever you must switch() at
character granularity, as in parsers.
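Lua has no switch statement, so in Lua code the character-level dispatch described above usually becomes an if/elseif chain. A minimal sketch, dispatching on numeric byte codes so that no one-character substrings are created; the function name count_punct and the chosen characters are hypothetical, for illustration only.

```lua
local sbyte = string.byte

-- Byte codes for the characters we "switch" on (illustrative choice).
local PERCENT = sbyte('%')
local LPAREN  = sbyte('(')
local RPAREN  = sbyte(')')

-- Hypothetical helper: classify every character by its byte code.
-- Each step reads one integer; no single-character string is created.
local function count_punct(s)
  local counts = { percent = 0, lparen = 0, rparen = 0, other = 0 }
  for i = 1, #s do
    local c = sbyte(s, i)
    if c == PERCENT then
      counts.percent = counts.percent + 1
    elseif c == LPAREN then
      counts.lparen = counts.lparen + 1
    elseif c == RPAREN then
      counts.rparen = counts.rparen + 1
    else
      counts.other = counts.other + 1
    end
  end
  return counts
end

local r = count_punct('(a%b)%')
print(r.percent, r.lparen, r.rparen, r.other) -- percent=2, lparen=1, rparen=1, other=2
```

The substring style would instead compare s:sub(i, i) against literals such as '%'; the benchmark in this post measures the difference between those two styles.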
> Your initial premise is flawed. Comparing character codes as numbers is not
> more advantageous than comparing the strings themselves. Thus there is no
> advantage to your proposal.
Below is my benchmark. Both loops run in about the same time, so I
am inclined to believe you are right: the running time is the same
whichever way you compare individual characters. My focus has been on
avoiding small-string creation and eliminating the call overhead of
string.byte(), but I think too much would have to change to make this
meaningfully cheaper. If strings did not have a metatable, then []
could be syntactic sugar (not an __index lookup) for fetching the byte
at that index, a sort of fast-tracked C call.
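For comparison, the s[i] syntax can already be emulated by replacing the __index field of the shared string metatable (obtained with getmetatable('')). This is only a sketch: it changes behavior for every string in the Lua state, and each s[i] still pays for a metamethod call, so it is the opposite of the fast-tracked path described above.

```lua
-- All strings share one metatable; its __index is normally the string table.
local string_mt = getmetatable('')

-- Numeric keys return the byte code at that index (nil if out of range);
-- any other key falls back to the string library so s:upper() still works.
string_mt.__index = function(s, k)
  if type(k) == 'number' then
    return string.byte(s, k)
  end
  return string[k]
end

local s = 'a%b'
print(s[2] == string.byte('%'))  --> true
print(s:upper())                 --> A%B
```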
So, for some reason, creating a single-byte substring and comparing
it costs about the same as extracting a byte from the string as an
integer and comparing that. I guess I'll put this obsession to rest;
I am defeated :-)
PS: Next I need to benchmark whether comparing single-byte substrings
uses more memory.
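One plausible explanation for the equal timings, assuming the reference implementation: Lua 5.2 interns short strings (up to LUAI_MAXSHORTLEN, 40 bytes), so ssub(s, i, i) for a character that already exists as a string returns the existing object, and equality between short strings is a pointer comparison. That is not verified against the benchmark below. A minimal sketch of the memory check from the PS, using collectgarbage('count') with the collector stopped so that any garbage produced by the loop is still counted; the helper name heap_growth_kb is hypothetical.

```lua
local ssub = string.sub

-- Hypothetical helper: heap growth (in KB) while comparing 1-byte substrings.
local function heap_growth_kb(s)
  collectgarbage('collect')
  collectgarbage('stop')                  -- keep garbage so it shows up in 'count'
  local before = collectgarbage('count')  -- KB currently in use
  local hits = 0
  for i = 1, #s do
    if ssub(s, i, i) == '%' then
      hits = hits + 1
    end
  end
  local after = collectgarbage('count')
  collectgarbage('restart')
  return after - before, hits
end

local growth, hits = heap_growth_kb(('%'):rep(2 ^ 14))
print(('%d matches, heap grew by %.2f KB'):format(hits, growth))
```

Running the same helper with the string.byte() comparison in place of ssub() gives the other half of the comparison.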
----
local clock = os.clock
local sbyte = string.byte
local ssub = string.sub

-- 16K string (first 8K built in a stack buffer, remaining 8K on the heap)
local long_string = ('%'):rep(2 ^ 14)
local iterations = 20000

local printf = function (fmt, ...)
  return print(fmt:format(...))
end
-- test #1: extract a one-character substring and compare it with '%'
local time = clock()
for _ = 1, iterations do
  for y = 1, #long_string do
    if ssub(long_string, y, y) == '%' then
    end
  end
end
time = clock() - time
printf('test #1: for %d iterations: "ssub()" local call within loop, '
  .. 'compared against single-byte string -> %.3f seconds', iterations, time)
-- test #2: read the byte code and compare it with a numeric constant
time = clock() -- re-init
local PERCENT = sbyte('%')
for _ = 1, iterations do
  for y = 1, #long_string do
    if sbyte(long_string, y, y) == PERCENT then
    end
  end
end
time = clock() - time
printf('test #2: for %d iterations: "sbyte()" local call within loop, '
  .. 'compared against integer constant PERCENT -> %.3f seconds', iterations, time)
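One way to attack the string.byte() call overhead mentioned earlier, not measured here: string.byte(s, i, j) returns one value per byte, so a loop can fetch many bytes per call. This hypothetical test #3 trades one call per byte for one call plus one small table per chunk; CHUNK = 64 and the reduced iteration count are assumptions (set iterations to 20000 to compare with the tests above), and whether it actually wins would have to be benchmarked.

```lua
local clock = os.clock
local sbyte = string.byte

local CHUNK = 64            -- assumption: bytes fetched per string.byte() call
local PERCENT = sbyte('%')

-- Count '%' characters, fetching CHUNK bytes per call. string.byte(s, i, j)
-- returns one integer per byte and clamps j to the string length.
local function count_percent_chunked(s)
  local hits = 0
  for i = 1, #s, CHUNK do
    local bytes = { sbyte(s, i, i + CHUNK - 1) } -- one small table per chunk
    for k = 1, #bytes do
      if bytes[k] == PERCENT then
        hits = hits + 1
      end
    end
  end
  return hits
end

local long_string = ('%'):rep(2 ^ 14)
local iterations = 100      -- assumption: far fewer than 20000, for a quick run

local hits
local time = clock()
for _ = 1, iterations do
  hits = count_percent_chunked(long_string)
end
time = clock() - time
print(string.format('test #3: for %d iterations: chunked "sbyte()" -> %.3f seconds',
  iterations, time))
```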