I am re-reading LTN9, which describes a solution to a problem with Lua:
if I understand correctly, Lua strings are immutable, i.e. every operation
on a string creates a new string holding the new state and leaves the old
one in limbo, waiting to be garbage collected.
I fear that even gsub(s, 'a', 'b'), i.e. a simple substitution of
characters that does not change the size, will generate a new string.
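
The concern can be checked directly. This is a minimal sketch, assuming a
Lua 5.x interpreter, where the 4.0 global gsub is spelled string.gsub: the
substitution returns a separate string and the original keeps its value.

```lua
-- Assumes Lua 5.x, where the 4.0 global gsub lives in the string library.
local s = "banana"
local r, count = string.gsub(s, "a", "b")

print(s)      -- the original value is unchanged
print(r)      -- the result is a separate string of the same length
print(count)  -- number of substitutions made
```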

I suppose there is no way to do this without allocating memory and copying
strings, otherwise the Lua team (or the experts who are playing with
alternative Lua solutions) would have implemented it.

Now, judging from the proposed solution, inserting a string in a table has
almost no cost, i.e. I suppose it is a reference to the string that is put
in the table, not a copy of the string. Otherwise, the solution would be
worse than the problem...

I propose to take the solution one step further by implementing this
algorithm in C. Several approaches are possible; I will suggest some.

E.g., we can just manually tinsert the strings into a given table, and call
a function to collate them into a resulting string.
t = {}
tinsert(t, s1)
tinsert(t, s2)
finalstring = collate(t)
The drawback of this solution is that you can call the collate function with
any table, including one that contains tables or functions, or hash
(non-array) entries, which is inappropriate. The behavior in this case
remains to be defined: either just ignore the incorrect entries, or make
the function fail and return nil.
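
A possible shape for this first approach, written in pure Lua rather than C
so that it can be tried as is. It assumes Lua 5.1 or later (table.insert,
table.concat and the # operator instead of the 4.0 names), and it picks the
"fail and return nil" behavior for non-string entries; hash entries are
simply never visited. The collate function here is hypothetical, not an
existing library function.

```lua
-- Hypothetical collate(): joins the array part of t, or returns nil plus a
-- message if an array entry is not a string. Assumes Lua 5.1 or later.
local function collate(t)
  local n = #t
  for i = 1, n do
    if type(t[i]) ~= "string" then
      return nil, "entry " .. i .. " is a " .. type(t[i]) .. ", not a string"
    end
  end
  -- table.concat does the joining inside the standard library
  return table.concat(t, "", 1, n)
end

local t = {}
table.insert(t, "Hello, ")
table.insert(t, "world")
local finalstring = collate(t)
print(finalstring)

-- A table holding a non-string entry is rejected rather than silently joined.
local bad, err = collate({ "a", {}, "b" })
print(bad, err)
```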

We can also create an opaque table (userdata?) that can't be manipulated by
tinsert/tremove and the like, call a function to add the strings to this
special table, and call another one to do the final concatenation and
destroy the table. E.g.:
st = makestringtable()
addtostringtable(st, s1)
addtostringtable(st, s2)
finalstring = collate(st)
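
The opaque variant can be approximated in pure Lua as well. In this sketch
(assuming Lua 5.1 or later) the handle is an empty table and the pieces are
kept in a private registry, which stands in for the userdata a C
implementation would use. The function names are the placeholders from
above; the registry and its weak keys are assumptions of this sketch.

```lua
-- Pure-Lua stand-in for the opaque "string table". The handle is an empty
-- table; the pieces live in a private registry, so table.insert on the
-- handle cannot reach them. Assumes Lua 5.1 or later.
local pieces = setmetatable({}, { __mode = "k" })  -- weak keys: unused handles can be collected

local function makestringtable()
  local st = {}
  pieces[st] = {}
  return st
end

local function addtostringtable(st, s)
  local list = assert(pieces[st], "not a string table")
  assert(type(s) == "string", "string expected")
  list[#list + 1] = s
end

local function collate(st)
  local list = assert(pieces[st], "not a string table")
  pieces[st] = nil  -- the handle is dead after collation
  return table.concat(list)
end

local st = makestringtable()
addtostringtable(st, "foo")
addtostringtable(st, "bar")
local finalstring = collate(st)
print(finalstring)
```

Only strings can get in, so collate never has to decide what to do with a
bad entry, and using the handle after collation raises an error.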

Of course, the names are to be improved! And this solution, although quite
safe, is quite clumsy...
In all cases, the programmer must remember to use these functions instead
of the more intuitive .. operator. But this is mostly useful in loops (so
my examples are stupid ;-), so knowing when to use it can be quite
natural.
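
To make the loop case concrete, here is a small comparison, assuming Lua
5.1 or later, with table.concat standing in for the proposed collation
function. Both loops produce the same string, but the first one creates a
new, longer string on every iteration.

```lua
-- Assumes Lua 5.1 or later (table.concat and the # operator).
local n = 1000

-- Naive version: every iteration creates a new, longer string.
local slow = ""
for i = 1, n do
  slow = slow .. "x"
end

-- Buffered version: the pieces are collected, then joined once at the end.
local buf = {}
for i = 1, n do
  buf[#buf + 1] = "x"
end
local fast = table.concat(buf)

print(#slow, #fast, slow == fast)
```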

As I understand it, using a C function to do the final collation has the
advantage that it can allocate exactly the memory needed and concatenate
the strings internally, without creating intermediate data to be garbage
collected, except the table of strings, which will be destroyed (or not).

Other designs can be proposed, but if this idea is not so stupid (it saves
memory and increases speed), I propose to integrate such a solution into
the string library of the 4.1 release.

What do you think?

-- 
--._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.--
Philippe Lhoste (Paris -- France)
Professional programmer and amateur artist
http://jove.prohosting.com/~philho/
--´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`--
