Roberto Ierusalimschy wrote:
> > > The main reasons we are changing Lua
> > > are that we like this idea of long non-internalized strings
> > 
> > That, I'm not so sure about. I'll freely admit: it does have a
> > certain appeal. But I'd want to see convincing, quantitative proof
> > that the resulting increase in complexity does not adversely
> > affect the performance of the common case. The simplicity of the
> > current solution and the resulting stability is a very convincing
> > argument against such a big change, too.
> 
> Certainly it is. The really interesting thing is that non-internalized
> strings open the door for "external" buffers, that is, buffers decoupled
> from the TString structure. That would allow things like luaL_Buffer
> and even lua_concat to use the intermediate buffer as the resulting
> string. It would allow large literal strings to point directly to their
> fixed version. But simplicity is always a compelling argument.

Well, string data co-location is actually one of the strengths of
the current approach. Java strings were originally designed
around segregated, shared string data. The following paper shows
that moving to unshared, co-located string data (i.e. what Lua
does) is faster for the JVM, too:

http://ssw.jku.at/Research/Papers/Haeubl08Master/

[It also makes a compelling case _against_ ropes. That's one of my
old pet peeves: IMHO ropes look nice on paper, but they increase
complexity and rarely pay off outside of targeted benchmarks.]

One would have to evaluate whether the trade-offs are better if
only longer strings have segregated string data. Not sure. We need
to measure, not guess.
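
To make the two layouts concrete, here is a rough sketch in C. The
names CoStr, SegStr, costr_new, segstr_new and segstr_free are made
up for illustration; this is not Lua's actual TString definition nor
any proposed patch. It only shows that co-located data lives in the
same allocation as the header, while segregated data costs a second
allocation and a pointer dereference on every access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Co-located: header and bytes share one allocation. */
typedef struct CoStr {
    size_t len;
    char data[];          /* flexible array member, follows the header */
} CoStr;

/* Segregated: the header points at a separately allocated buffer. */
typedef struct SegStr {
    size_t len;
    char *data;           /* extra indirection on every access */
} SegStr;

static CoStr *costr_new(const char *s, size_t len) {
    CoStr *p = malloc(sizeof(CoStr) + len + 1);   /* one allocation */
    if (p == NULL) return NULL;
    p->len = len;
    memcpy(p->data, s, len);
    p->data[len] = '\0';
    return p;
}

static SegStr *segstr_new(const char *s, size_t len) {
    SegStr *p = malloc(sizeof(SegStr));           /* first allocation */
    if (p == NULL) return NULL;
    p->data = malloc(len + 1);                    /* second allocation */
    if (p->data == NULL) { free(p); return NULL; }
    p->len = len;
    memcpy(p->data, s, len);
    p->data[len] = '\0';
    return p;
}

static void segstr_free(SegStr *p) {
    if (p != NULL) { free(p->data); free(p); }
}
```

The segregated form is what would let an existing buffer be adopted
as the string body without copying; the co-located form keeps header
and data adjacent in memory. Which one wins for long strings is
exactly the thing that has to be measured.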

My ancient 'fast string' patch (look in the archives) did sort of
the opposite: store small string values entirely in the tagged
value slot. This reduced overhead for string interning, but that
turned out not to be the real bottleneck (at least for small
strings). The increase in complexity for handling two different
string types everywhere was noticeable, though.
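
For readers who have not seen the idea, a minimal sketch of storing
small strings inside the value slot might look like the following.
Everything here (Value, INLINE_MAX, the TAG_* constants, the value_*
helpers) is hypothetical and is not taken from the patch itself; the
size limit of 7 bytes is an arbitrary assumption. Note how even the
trivial helpers have to branch on the string kind, which is the
complexity cost mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TAG_NIL, TAG_SHORTSTR, TAG_LONGSTR };
#define INLINE_MAX 7      /* assumed limit, chosen for illustration */

typedef struct Value {
    unsigned char tag;
    union {
        struct { unsigned char len; char bytes[INLINE_MAX]; } s;
        const char *heap; /* borrowed, NUL-terminated long string */
    } u;
} Value;

/* Returns 1 if the string was stored inline, 0 if it was referenced. */
static int value_set_str(Value *v, const char *s) {
    size_t len = strlen(s);
    if (len <= INLINE_MAX) {
        v->tag = TAG_SHORTSTR;
        v->u.s.len = (unsigned char)len;
        memcpy(v->u.s.bytes, s, len);
        return 1;
    }
    v->tag = TAG_LONGSTR;
    v->u.heap = s;
    return 0;
}

/* Every accessor needs to handle both representations. */
static size_t value_strlen(const Value *v) {
    assert(v->tag == TAG_SHORTSTR || v->tag == TAG_LONGSTR);
    return v->tag == TAG_SHORTSTR ? v->u.s.len : strlen(v->u.heap);
}

static const char *value_strdata(const Value *v) {
    assert(v->tag == TAG_SHORTSTR || v->tag == TAG_LONGSTR);
    return v->tag == TAG_SHORTSTR ? v->u.s.bytes : v->u.heap;
}

/* Content comparison; no interning table is consulted. */
static int value_streq(const Value *a, const Value *b) {
    size_t la = value_strlen(a);
    return la == value_strlen(b) &&
           memcmp(value_strdata(a), value_strdata(b), la) == 0;
}
```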

--Mike