Optimized Str Rep

lua-users home

[!] VersionNotice: The below code pertains to an older Lua version, Lua 4. It does not run as is under Lua 5.

Here follows a lua version of the strrep function faster than the original, written in C. In the average, it is about 3 times faster.

All the tests were performed on a Pentium II, running linux in single-user mode, with enough memory to prevent swapping and Lua version 4.0.

The algorithm was inspired on LTN 9, by RobertoIerusalimschy and was written by LuizCarlosSilveira.

The essence of the algorithm is to do the minimum number of concatenations possible.

Just for curiosity: compared to the original strrep, lua strrep has a questionably lower O() function, but this has to be studied better.

Plotting their "repetition number" x "time" curve, we can observe the behavior of both functions
(time is in seconds and repetition number is in bytes):

Here follows a program with the algorithm. It was used for both testing the correctness of the implementation and generating the data to plot the curve.

function log2(n)
    local _n = 2
    local x = 1
    if (_n < n) then
            x = x + 1
            _n = _n + _n
        until (_n >= n)
    elseif (_n > n) then
        if (n == 1) then
            return 0
            return nil
    if (_n > n) then
        return x-1
        return x
function get_bits(n)
    local bits = {}
    local rest = n
        local major_bit = log2(rest)
        rest = rest - 2^major_bit
        bits[major_bit] = 1
        if (bits.count == nil) then
            bits.count = major_bit
    until (rest == 0)
    return bits

function fast_strrep(str, times)
    local bits = get_bits(times)
    local strs = {[0] = str}

    local count = bits.count

    for i = 1, count do
        strs[i] = strs[i-1] .. strs[i-1]

    local result = ''
    for i = 0, count do
        if (bits[i]) then
            result = result .. strs[i]

    return result


for numreps = 1024, 30*1024*1024, 1024*64 do

    a = nil
    b = nil

    start = clock()
    a = fast_strrep("a", numreps)
    print("L:"..numreps.." "..(clock() - start))
    start = clock()
    b = strrep("a", numreps)
    print("C:"..numreps.." "..(clock() - start))

    if (a~=b) then
        print("the algorithm is wrong!")




It doesn't surprise me that your version is faster, actually; the lua library version of strrep (v 4.0 at least) has a very heavy overhead in function calls (one per character) which is not shared by the concatenation function. I find it curious that the library version doesn't simply work out how many characters it will need, malloc them, and then repetitively memcpy the source string, which I would think would be an order of magnitude faster and O(MN + M) (M copies of a string of length N). But I suppose that making concatenations of megabyte size wasn't really contemplated.

The algorithm you use is reminiscent of the minimal multiplication algorithm for computing exponentials. Here is a version which takes advantage of the fact that Lua optimises a .. b .. c into a single operation, and which also avoids creating a temporary vector. I think you'll find it to be about twice as fast as yours (and a lot shorter). It might also be interesting as an example of how to do stuff that looks like bit manipulation without too much pain. -- RiciLake

  -- Suppose that x = b[n]*2^n + b[n-1]*2^(n-1) + ... + b[0]*2^0
  --   (where every b[i] is either 0 or 1)
  -- This is exactly equivalent to:
  --    b[0] + 2 * (b[1] + 2 * (b[2] + (... + b[n])))
  -- So we've effectively eliminated all the multiplications, replacing them with doubling.

  -- Now, x * y (for any y) can be calculated by distributing multiplication over the
  -- above, which effectively replaces every b[i] with b[i] * y. However, every b[i]
  -- is either 0 or 1, so the product is either 0 or y.

  -- Now, if k is an integer and str1 and str2 are strings, and we write:
  --   str1 + str2       for the concatenation of str1 and str2
  --   k * str1          for "k copies of str1 concatenated"
  -- we can see that we have + and * are "just like" integer arithmetic in the sense that
  -- + and * are commutative and associative, and * distributes over +. So the equivalence
  -- continues to work, except that every term must be either "" (for 0) or y (the string).

  -- All that is left is to compute the expression from the inside out: each step is
  -- either 2 * r or y + 2 * r, where r is the cumulated value and y is the original string.
  -- In string terms, we can write these as result .. result (2 * r) and
  -- result .. result .. str (2 * r + y)

  -- We could use the same idea to compute integer exponents in the minimum number of
  -- multiplications, using * and ^ instead of + and * (which is where this algorithm
  -- comes from.)

  -- This makes use of the fact that Lua optimises a .. b .. c into a single concatenation.
  -- With a bit more work, we could use any base we wanted to, not just base 2. But it would
  -- require more options in the if statement.

function another_strrep(str, times)
  local result = ""
  local high_bit = 1
  while high_bit < times do high_bit = high_bit * 2 end

  -- at this point, high_bit is the largest (integral) power of 2 smaller than times
  -- (unless times < 1 in which case high_bit is 1)
  -- The computation of highbit could be:
  --   local high_bit = 2 ^ floor(log(times) / log(2))
  -- which is probably faster but requires the math library

  -- we are now going to work through times, bit by bit, making use of the above formula:

  while high_bit >= 1 do
    if high_bit <= times then           -- the bit is 1 if times is >= high_bit
      times = times - high_bit          -- we "turn it off" for the next iteration
      result = result .. result .. str  -- and the next step is 2 * r + y
    else                                -- the bit is 0
      result = result .. result         -- so the next step is 2 * r
    high_bit = high_bit / 2             -- Now go for the next bit
  return result

Your algorithm is really good. Thank you. The idea of them both (yours and mine) are almost the same, right? You did the excellent optimization that I couldn't realize how to do it by the time I wrote it, which is preventing the creation of auxiliary pieces. I plotted the curve for the 3 algorithms, that you can see below. For the data plotted in this curve, the exact average relations of the 3 functions are:
luiz/rici = 1.41  (rici is  1.41  times faster than luiz)
c/luiz    = 2.98  (luiz is  2.98  times faster than c)
c/rici    = 4.19  (rici is  4.19  times faster than c)

Since your algorithm is faster (and appears to use less memory), it is the one that belongs to this page. Unless you dislike the idea, I'll remove mine and give the place to your's. But, before, let me ask you something: could you please put some comments on your code in order to make the algorithm clearer? Due to your optimization, the idea behind it has been obscured... --LuizCarlosSilveira

OK, I compulsively commented it. I hope that it is clear; sometimes I think the code itself is clearer. The function is not as optimal as it could be because it does one more concatenation than is necessary... I was trying to make the code short rather, and counting on Lua to do "" .. "" .. str very rapidly.

Just for fun, and to demonstrate something (I'm not sure what), I throw in the base 10 version of the above algorithm. Rather than using a case statement and doing the bit-by-bit computation, I use gsub to do the loop and convert the repetition count to a string to work out the digits. Table lookup is (probably) a lot faster than a string of if statements, so I used that, too. %state is a standard trick for getting around the fact that Lua 4.0 doesn't have real closures.

I do not claim that this function is easy to read, but my tests indicate that it is even faster. (Sorry, no other comments, but the idea is the same, so you should be able to work it out. :-) ) Just shows what you can do if you're warped enough. I have another example of this sort of thing somewhere: a join function which I wrote which lazily compiles subroutines to do the same sort of exponential decomposition of the problem; even though it has to compose and compile functions, it turns out to be much faster than the naive join. Of course, the functions have to be memoised to take advantage of that. I'll try to post that one, too. -- RiciLake

  local concats = {
    ["0"] = function(a, b) return a .. a .. a .. a .. a .. a .. a .. a .. a .. a end,
    ["1"] = function(a, b) return a .. a .. a .. a .. a .. a .. a .. a .. a .. a
                                    .. b end,
    ["2"] = function(a, b) return a .. a .. a .. a .. a .. a .. a .. a .. a .. a
                                    .. b .. b end,
    ["3"] = function(a, b) return a .. a .. a .. a .. a .. a .. a .. a .. a .. a
                                    .. b .. b .. b end,
    ["4"] = function(a, b) return a .. a .. a .. a .. a .. a .. a .. a .. a .. a
                                    .. b .. b .. b .. b end,
    ["5"] = function(a, b) return a .. a .. a .. a .. a .. a .. a .. a .. a .. a
                                    .. b .. b .. b .. b .. b end,
    ["6"] = function(a, b) return a .. a .. a .. a .. a .. a .. a .. a .. a .. a
                                    .. b .. b .. b .. b .. b .. b end,
    ["7"] = function(a, b) return a .. a .. a .. a .. a .. a .. a .. a .. a .. a
                                    .. b .. b .. b .. b .. b .. b .. b end,
    ["8"] = function(a, b) return a .. a .. a .. a .. a .. a .. a .. a .. a .. a
                                    .. b .. b .. b .. b .. b .. b .. b .. b end,
    ["9"] = function(a, b) return a .. a .. a .. a .. a .. a .. a .. a .. a .. a
                                    .. b .. b .. b .. b .. b .. b .. b .. b .. b end,

  function decimal_strrep(str, times)
    local state = {r = "" }
    local concats = %concats
    times = tostring(times)
    if strfind(times, "^[0-9]+$") then
      gsub(times, "(.)",
             %state.r = %concats[digit](%state.r, %str)
    return state.r


By the way, the test is not all that great because the time will vary with the number of 1 bits in the binary expansion of the repetition count. (I think my version is slightly less sensitive to that, but it will still be a factor.) The test counts you use all have mostly 0's in their binary expansion. -- RiciLake

I belive this is why you told your algorithm should be twice as fast as mine. What really happens is that when I use the repetition number formed by only 1 bit turned on, it is the best case for memory consumption. In the worse case, my algorithm appears to use twice as memory as yours. I agree that the measurements I did should be reviewed, but this page is only starting... --LuizCarlosSilveira

Fair enough. It would also be interesting to try the test with a longer string than "a". I've never quite figured out how to adequately benchmark Lua programs, because garbage collection times vary with the total heap size. It would probably be best to run everything a few times to get the heap size to stabilise, and then do the timing. --RiciLake

RecentChanges · preferences
edit · history
Last edited September 21, 2017 8:29 pm GMT (diff)