lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


On Thu, Mar 27, 2014 at 12:41 PM, Sean Conner <sean@conman.org> wrote:
> It was thus said that the Great Coroutines once stated:
>> On Thu, Mar 27, 2014 at 1:30 AM, Oliver Kroth <oliver.kroth@nec-i.de> wrote:
>> > It would use one single buffer, possibly with precomputed sufficient size
>> > (if, e.g. all operands are strings).
>> > The .. operators use a buffer each plus the memory for the (intermediate)
>> > result strings.
>> > So string.concat() could save memory allocation operations and the multiple
>> > copy operations from one buffer into the next.
>> >
>> > At least this is my understanding, correct me if I am wrong...
>> >
>> > -- Oliver
>> >
>> > Am 27.03.2014 00:26, schrieb Luiz Henrique de Figueiredo:
>> >>>
>> >>> Why not create such a function as a member of the string library?
>> >>> string.concat( "there ", "I", " would look ",1, "st")
>> >>
>> >> How is that different from just this?
>> >>         "there " .. "I" .. " would look ",1 .. "st"
>> >>
>> >
>> >
>>
>> I don't know within what sort of structure Lua strings eventually wind
>> up (where is the hash stored?) but this is the luaL_Buffer structure:
>> http://www.lua.org/source/5.2/lauxlib.h.html#luaL_Buffer
>>
>> I kind of wish another member were added for referencing a string
>> after this one.  So when you concatenate strings (especially long
>> strings) they instead form a linked list that can be walked invisibly
>> later, rather than actually moving the strings together in memory.
>> The hash would be recomputed from the starting segment to include all
>> segments it can iterate to (the string in total).  If intermediate
>> string segments become unreferenced (collectable) they then get
>> realloc'd together with the nearest segment before that is still
>> reachable/referenced during a GC collection (for little gain though --
>> all you'd be doing away with is the "header" of the string structure.
>>
>> If the 'next' ptr is NULL the string ends at that segment, if not you
>> continue forward.  A linked list of Lua strings has the potential to
>> contain many smaller, intermediate strings and avoid moving strings in
>> memory.
>
>   I think the overhead of tracking these partial strings will swamp any
> savings in ... what?
>
>   Take, for instance, the function here [1]
>
> static int strcore_wrap(lua_State *const L)
> {
>   const char *src;
>   size_t      ssz;
>   size_t      margin;
>   size_t      breakp;
>   const char *lead;
>   size_t      lsize;
>   luaL_Buffer buf;
>
>   src    = luaL_checklstring(L,1,&ssz);
>   margin = luaL_optinteger(L,2,DEF_MARGIN);
>   lead   = lua_tolstring(L,3,&lsize);
>   breakp = margin;
>
>   luaL_buffinit(L,&buf);
>
>   while(true)
>   {
>     if (ssz < breakp)
>       break;
>
>     if (find_break_point(&breakp,src))
>     {
>       if (lead) luaL_addlstring(&buf,lead,lsize);
>       luaL_addlstring(&buf,src,breakp - 1);
>       luaL_addchar(&buf,'\n');
>       src    += breakp;
>       ssz    -= breakp;
>       breakp  = margin;
>     }
>     else
>       break;
>   }
>
>   if (lead) luaL_addlstring(&buf,lead,lsize);
>   luaL_addlstring(&buf,src,ssz);
>   luaL_addchar(&buf,'\n');
>   luaL_pushresult(&buf);
>   return 1;
> }
>
> What this does is take a potentially long string, and inserts line breaks at
> around the given margin point (it finds the closest bit of whitespace at the
> margin to break at) and adds an optional string to begin each new line.  So,
> for instance,
>
>         x = "The quick brown fox jumps over the lazy dog"
>         y = wrap(x,1,"> ")
>         print(y)
>
>         The
>         > quick
>         > brown
>         > fox
>         > jumps
>         > over
>         > the
>         > lazy
>         > dog
>
> (I had a very particular use case for this)
>
> So, using your method, the resulting ... um ... thing, would look I guess
> something like:
>
>         x = "The quick brown fox jumps over the lazy dog"
>         pre = "> "
>         nl  = "\n"
>
>         y =
>         [ &x[0]:3 ]     -- pointer to x[0], length of 3
>         [ &nl:1 ]
>         [ &pre:2 ]
>         [ &x[4]:5 ]     -- quick
>         [ &nl:1 ]
>         [ &pre:2 ]
>         [ &x[10]:5 ]    -- brown
>         [ &nl:1 ]
>         [ &pre:2 ]
>         [ &x[16]:3 ]    -- fox
>         [ &nl:1 ]
>         [ &pre:2 ]
>         [ &x[20]:5 ]    -- jumps
>         [ &nl:1 ]
>         [ &pre:2 ]
>         [ &x[26]:3 ]    -- the
>         [ &nl:1 ]
>         [ &pre:2 ]
>         [ &x[31]:4 ]    -- lazy
>         [ &nl:1 ]
>         [ &pre:2 ]
>         [ &x[36]:3 ]    -- dog
>         [ &nl:1 ]
>
> Even assuming a minimal overhead of a next pointer, string pointer and
> length, on a 32 bit system that's 276 bytes of overhead (ignoring any
> "string structure" overhead inherent in Lua) vs. 53 bytes (if I added
> correctly) for the string copy (ignoring any "string structure" overhead
> inherent in Lua).  And the discrepency only gets worse on a 64-bit system
> (552 bytes vs 53 bytes).
>
>   You will have to have a lot of very long strings for your method to win
> over the current method.
>
>   -spc
>
> [1] from https://github.com/spc476/lua-conmanorg/blob/master/src/strcore.c
>

I was making a point that sometimes large realloc()'s to truly move
strings together in memory (concatenation) can be bad :>  Seems like
for the usual small string the complexity and potential for cache
misses are worse, though.