Re: proposal for reading individual characters from strings faster

lua-l archive


Subject: Re: proposal for reading individual characters from strings faster
From: Coroutines <coroutines@...>
Date: Mon, 5 May 2014 13:13:32 -0700

On Mon, May 5, 2014 at 11:47 AM, Tom N Harris <telliamed@whoopdedo.org> wrote:

> Never make assumptions without benchmarks. (I fell into that trap during the
> recent discussion about metatables on lightuserdata.) A quick test of running
> 1e9 single-byte string comparisons versus the number comparisons using Lua
> 5.2, they were roughly the same. In fact, the string test was very slightly
> faster.

I could be misunderstanding you here.  My proposal is not about
comparing whole strings, whether byte-by-byte or in lua_Number-sized
chunks.  I'm trying to compare characters without creating single-byte
substrings, which would seemingly be costlier than just comparing
char-to-char in C.  This matters wherever you must switch() with
character granularity, i.e. in parsers.

> Your initial premise is flawed. Comparing character codes as numbers is not
> more advantageous than comparing the strings themselves. Thus there is no
> advantage to your proposal.

Below is my benchmark.  These both seem to run for equal time, so I am
inclined to believe that you are right -- the running time is the same
for either way of comparing individual characters.  My focus has been
on avoiding small string creation and eliminating the call overhead of
string.byte(), but I think too much would have to change to really
make this cheaper.  If strings did not have a metatable then [] could
be sugar (not __index) for looking up the byte at that index (a sort
of fast-tracked C call).
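
To make the indexing idea concrete, here is a rough pure-Lua emulation of s[i] returning a byte.  It only illustrates the semantics: it still goes through __index and string.byte(), so it is not the fast-tracked path described above.  The name install_byte_index is hypothetical, and the sketch assumes the string metatable's __index is still the default string library table.

```lua
local sbyte = string.byte

-- Hypothetical helper: make s[i] return the byte at position i.
-- Returns a function that restores the previous behaviour.
local function install_byte_index()
    local string_mt = getmetatable('')
    local original_index = string_mt.__index  -- assumed to be the string table

    string_mt.__index = function (s, key)
        if type(key) == 'number' then
            return sbyte(s, key)        -- byte value, or nil when out of range
        end
        return original_index[key]      -- keep s:rep(), s:sub(), etc. working
    end

    return function ()
        string_mt.__index = original_index
    end
end

local restore = install_byte_index()

local s = 'a%b'
print(s[2] == sbyte('%'))  -- numeric index yields the byte code
print(s:upper())           -- methods still resolve through the string library

restore()
```

Because every lookup now passes through a Lua function, this is expected to be slower than calling string.byte() directly; it is only useful for trying out the syntax.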

So it costs about the same to create a single-byte substring and
compare it as it does to create an integer from a byte within the
string and compare that, for some reason.  I guess I'll put this
obsession to rest, I am defeated :-)

PS: Now I must benchmark to see if memory use is worse by comparing
single-byte substrings.
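
One possible way to run that memory comparison is sketched below, using collectgarbage('count') with the collector paused so that nothing is reclaimed mid-measurement.  The helper name heap_delta is hypothetical, and the figures cover only the Lua heap as the interpreter reports it.  Since Lua interns short strings, the repeated one-byte substring may well not allocate anything new per iteration, but that is something to measure rather than assume.

```lua
local ssub, sbyte = string.sub, string.byte

-- Hypothetical helper: change in Lua heap size (in KB) caused by running fn().
local function heap_delta(fn)
    collectgarbage('collect')            -- start from a freshly collected heap
    collectgarbage('stop')               -- pause the collector while measuring
    local before = collectgarbage('count')
    fn()
    local after = collectgarbage('count')
    collectgarbage('restart')
    return after - before
end

local long_string = ('%'):rep(2 ^ 14)
local PERCENT = sbyte('%')

local sub_kb = heap_delta(function ()
    for y = 1, #long_string do
        if ssub(long_string, y, y) == '%' then
        end
    end
end)

local byte_kb = heap_delta(function ()
    for y = 1, #long_string do
        if sbyte(long_string, y) == PERCENT then
        end
    end
end)

print(('string.sub: %.2f KB, string.byte: %.2f KB'):format(sub_kb, byte_kb))
```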

----

local clock = os.clock
local sbyte = string.byte
local ssub = string.sub

-- 16K string (8K-start allocated on stack, 8K in heap)
local long_string = ('%'):rep(2 ^ 14)

local iterations = 20000

local printf =
    function (fmt, ...)
        return print(fmt:format(...))
    end

local time = clock()

for x = 1, iterations do
    for y = 1, #long_string do
        if ssub(long_string, y, y) == '%' then
        end
    end
end

time = clock() - time

-- os.clock() returns fractional seconds, so format with %.2f rather than %d
printf('test #1: for %d iterations: "ssub()" local call within loop ' ..
    'compared against single-byte string -> %.2f seconds', iterations, time)

time = clock() -- re-init

local PERCENT = sbyte('%')

for x = 1, iterations do
    for y = 1, #long_string do
        if sbyte(long_string, y, y) == PERCENT then
        end
    end
end

time = clock() - time

-- os.clock() returns fractional seconds, so format with %.2f rather than %d
printf('test #2: for %d iterations: "sbyte()" local call within loop ' ..
    'compared against local integer constant PERCENT -> %.2f seconds',
    iterations, time)

Follow-Ups:
- Re: proposal for reading individual characters from strings faster, Thomas Jericke

References:
- proposal for reading individual characters from strings faster, Coroutines
- Re: proposal for reading individual characters from strings faster, Philipp Janda
- Re: proposal for reading individual characters from strings faster, Coroutines
- Re: proposal for reading individual characters from strings faster, Tom N Harris
