2008/4/28 Chip Salzenberg <chip@pobox.com>:
> I'm interested in using Lua in an environment in which large C strings will
>  already have been allocated.  I do not want to copy these large strings into
>  Lua's string pool - this would be wasteful.  (We're talking multi-hundred-K
>  strings that will be used briefly and then forgotten.)
>
>  I know that I can use userdata to hold pointers into external strings, and
>  I'm prepared to do that if necessary.  But I'd prefer a way to make Lua see
>  normal strings, while also 'cheating' on the memory management.  (The
>  userdata approach involves duplicating lots of the string library, or at
>  least lots of the string library interface.)
>
>  I think what I'm asking for is a way to create a Lua string object that has
>  all the normal string behaviors *except* with regard its target data, which
>  I want to escape Lua's normal memory management -- such that Lua will not
>  try to free that data when the object is GC'd.
>
>  Any clues?  Perhaps a feature I hadn't noticed?  adTHANKSvance

Keep in mind that Lua strings are immutable, so string library
functions that return altered copies of the input string (e.g. sub,
gsub, the captures of match and gmatch) will allocate new strings.
If you're going to keep your long strings intact, you can just put
them in a userdata and pass them around to your own libraries. If
you're going to manipulate them a lot with the Lua string library, an
initial duplication won't cost much compared to the manipulations
themselves.
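
To make the userdata option a bit more concrete, here is a minimal sketch
in plain C of what such a userdata could hold: a pointer and a length
into memory that somebody else owns. The names ExtString, ext_wrap and
ext_sub are hypothetical, not part of Lua, and the Lua binding itself is
left out.

```c
#include <stddef.h>

/* Hypothetical payload for a userdata: a view onto externally owned
   bytes. Nothing here allocates, copies, or frees the data. */
typedef struct {
    const char *data;  /* borrowed pointer; the owner must outlive the view */
    size_t len;        /* number of bytes visible through this view */
} ExtString;

/* Wrap an existing buffer without copying it. */
ExtString ext_wrap(const char *data, size_t len) {
    ExtString s = { data, len };
    return s;
}

/* Zero-copy substring over the half-open byte range [start, end).
   Out-of-range bounds are clamped, so the result is always valid
   (possibly empty). */
ExtString ext_sub(ExtString s, size_t start, size_t end) {
    ExtString r;
    if (end > s.len) end = s.len;
    if (start > end) start = end;
    r.data = s.data + start;
    r.len = end - start;
    return r;
}
```

In a binding you would presumably allocate the struct with
lua_newuserdata and only call lua_pushlstring for the small pieces you
really want as Lua strings. The catch is lifetime: the external buffer
has to stay alive for as long as any view still points into it.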

If you're going to have mutable strings, you need to have them held by
reference, just like userdata, and in that case the Lua string library
API is not a good model, since its input parameters are input only
(modified strings are returned as new values). So if you want mutable
strings you're better off just using userdata and creating a string
manipulation library for it.

So to summarize, here is how I'd choose:

- immutable strings, heavy manipulations -> convert to string
- immutable, no manipulation -> userdata
- mutable (manipulation or not) -> userdata

In all cases modifying the core to introduce a special type of string
seems to me to be overkill.

One last note (because I thought of that afterward): if it's the
cost of hashing the strings that is bothering you (rather than the
memory usage of copying your long strings), the hash is not computed
on the whole string but only on a sample of at most about 32
characters taken at regular intervals across it (at least in Lua
5.1), so the cost of hashing a 100kB string is about the same as for
a 100B string.
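
For reference, here is a standalone sketch of the hashing loop as I
recall it from Lua 5.1's lstring.c (luaS_newlstr). The function name
lua51_style_hash and the sampled_out counter are my own additions for
illustration; other Lua versions may hash differently.

```c
#include <stddef.h>

/* Sketch of the Lua 5.1 string hash. The step grows with the length,
   so the loop reads at most about 32 bytes, spread over the string.
   sampled_out (hypothetical, may be NULL) receives the number of
   bytes actually read. */
unsigned int lua51_style_hash(const char *str, size_t l, size_t *sampled_out) {
    unsigned int h = (unsigned int)l;  /* seed the hash with the length */
    size_t step = (l >> 5) + 1;        /* long strings: skip bytes */
    size_t sampled = 0;
    size_t l1;
    /* walk backward from the last byte, one step at a time */
    for (l1 = l; l1 >= step; l1 -= step) {
        h = h ^ ((h << 5) + (h >> 2) + (unsigned char)str[l1 - 1]);
        sampled++;
    }
    if (sampled_out) *sampled_out = sampled;
    return h;
}
```

With this loop a 100-byte string has 25 of its bytes read and a
100000-byte string has 31, so the hashing cost is essentially flat.
Note that this only covers the hash: creating the Lua string still
copies the bytes.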