[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: [ANN] Lua 5.3.0 (work2) now available (number coercion holy crusade)
- From: Coroutines <coroutines@...>
- Date: Thu, 27 Mar 2014 12:56:18 -0700
On Thu, Mar 27, 2014 at 12:41 PM, Sean Conner <sean@conman.org> wrote:
> It was thus said that the Great Coroutines once stated:
>> On Thu, Mar 27, 2014 at 1:30 AM, Oliver Kroth <oliver.kroth@nec-i.de> wrote:
>> > It would use one single buffer, possibly with precomputed sufficient size
>> > (if, e.g. all operands are strings).
>> > The .. operators use a buffer each plus the memory for the (intermediate)
>> > result strings.
>> > So string.concat() could save memory allocation operations and the multiple
>> > copy operations from one buffer into the next.
>> >
>> > At least this is my understanding, correct me if I am wrong...
>> >
>> > -- Oliver
>> >
>> > Am 27.03.2014 00:26, schrieb Luiz Henrique de Figueiredo:
>> >>>
>> >>> Why not create such a function as a member of the string library?
>> >>> string.concat( "there ", "I", " would look ",1, "st")
>> >>
>> >> How is that different from just this?
>> >> "there " .. "I" .. " would look ",1 .. "st"
>> >>
>> >
>> >
>>
>> I don't know within what sort of structure Lua strings eventually wind
>> up (where is the hash stored?) but this is the luaL_Buffer structure:
>> http://www.lua.org/source/5.2/lauxlib.h.html#luaL_Buffer
>>
>> I kind of wish another member were added for referencing a string
>> after this one. So when you concatenate strings (especially long
>> strings) they instead form a linked list that can be walked invisibly
>> later, rather than actually moving the strings together in memory.
>> The hash would be recomputed from the starting segment to include all
>> segments it can iterate to (the string in total). If intermediate
>> string segments become unreferenced (collectable) they then get
>> realloc'd together with the nearest segment before that is still
>> reachable/referenced during a GC collection (for little gain though --
>> all you'd be doing away with is the "header" of the string structure.
>>
>> If the 'next' ptr is NULL the string ends at that segment, if not you
>> continue forward. A linked list of Lua strings has the potential to
>> contain many smaller, intermediate strings and avoid moving strings in
>> memory.
>
> I think the overhead of tracking these partial strings will swamp any
> savings in ... what?
>
> Take, for instance, the function here [1]
>
> static int strcore_wrap(lua_State *const L)
> {
> const char *src;
> size_t ssz;
> size_t margin;
> size_t breakp;
> const char *lead;
> size_t lsize;
> luaL_Buffer buf;
>
> src = luaL_checklstring(L,1,&ssz);
> margin = luaL_optinteger(L,2,DEF_MARGIN);
> lead = lua_tolstring(L,3,&lsize);
> breakp = margin;
>
> luaL_buffinit(L,&buf);
>
> while(true)
> {
> if (ssz < breakp)
> break;
>
> if (find_break_point(&breakp,src))
> {
> if (lead) luaL_addlstring(&buf,lead,lsize);
> luaL_addlstring(&buf,src,breakp - 1);
> luaL_addchar(&buf,'\n');
> src += breakp;
> ssz -= breakp;
> breakp = margin;
> }
> else
> break;
> }
>
> if (lead) luaL_addlstring(&buf,lead,lsize);
> luaL_addlstring(&buf,src,ssz);
> luaL_addchar(&buf,'\n');
> luaL_pushresult(&buf);
> return 1;
> }
>
> What this does is take a potentially long string, and inserts line breaks at
> around the given margin point (it finds the closest bit of whitespace at the
> margin to break at) and adds an optional string to begin each new line. So,
> for instance,
>
> x = "The quick brown fox jumps over the lazy dog"
> y = wrap(x,1,"> ")
> print(y)
>
> The
> > quick
> > brown
> > fox
> > jumps
> > over
> > the
> > lazy
> > dog
>
> (I had a very particular use case for this)
>
> So, using your method, the resulting ... um ... thing, would look I guess
> something like:
>
> x = "The quick brown fox jumps over the lazy dog"
> pre = "> "
> nl = "\n"
>
> y =
> [ &x[0]:3 ] -- pointer to x[0], length of 3
> [ &nl:1 ]
> [ &pre:2 ]
> [ &x[4]:5 ] -- quick
> [ &nl:1 ]
> [ &pre:2 ]
> [ &x[10]:5 ] -- brown
> [ &nl:1 ]
> [ &pre:2 ]
> [ &x[16]:3 ] -- fox
> [ &nl:1 ]
> [ &pre:2 ]
> [ &x[20]:5 ] -- jumps
> [ &nl:1 ]
> [ &pre:2 ]
> [ &x[26]:3 ] -- the
> [ &nl:1 ]
> [ &pre:2 ]
> [ &x[31]:4 ] -- lazy
> [ &nl:1 ]
> [ &pre:2 ]
> [ &x[36]:3 ] -- dog
> [ &nl:1 ]
>
> Even assuming a minimal overhead of a next pointer, string pointer and
> length, on a 32 bit system that's 276 bytes of overhead (ignoring any
> "string structure" overhead inherent in Lua) vs. 53 bytes (if I added
> correctly) for the string copy (ignoring any "string structure" overhead
> inherent in Lua). And the discrepency only gets worse on a 64-bit system
> (552 bytes vs 53 bytes).
>
> You will have to have a lot of very long strings for your method to win
> over the current method.
>
> -spc
>
> [1] from https://github.com/spc476/lua-conmanorg/blob/master/src/strcore.c
>
I was making a point that sometimes large realloc()'s to truly move
strings together in memory (concatenation) can be bad :> Seems like
for the usual small string the complexity and potential for cache
misses are worse, though.
- References:
- Re: [ANN] Lua 5.3.0 (work2) now available (number coercion holy crusade), Roberto Ierusalimschy
- Re: [ANN] Lua 5.3.0 (work2) now available (number coercion holy crusade), Hisham
- Re: [ANN] Lua 5.3.0 (work2) now available (number coercion holy crusade), Rena
- Re: [ANN] Lua 5.3.0 (work2) now available (number coercion holy crusade), William Ahern
- Re: [ANN] Lua 5.3.0 (work2) now available (number coercion holy crusade), Andrew Starks
- Re: [ANN] Lua 5.3.0 (work2) now available (number coercion holy crusade), Dirk Laurie
- Re: [ANN] Lua 5.3.0 (work2) now available (number coercion holy crusade), Oliver Kroth
- Re: [ANN] Lua 5.3.0 (work2) now available (number coercion holy crusade), Luiz Henrique de Figueiredo
- Re: [ANN] Lua 5.3.0 (work2) now available (number coercion holy crusade), Oliver Kroth
- Re: [ANN] Lua 5.3.0 (work2) now available (number coercion holy crusade), Coroutines
- Re: [ANN] Lua 5.3.0 (work2) now available (number coercion holy crusade), Sean Conner