- Subject: Re: Lua string handling
- From: Jonathan Castello <twisolar@...>
- Date: Wed, 15 Sep 2010 02:09:04 -0700
On Wed, Sep 15, 2010 at 1:55 AM, Joshua Phillips
<jp.sittingduck@gmail.com> wrote:
> On 15/09/10 00:19, David Given wrote:
>>
>> I'm working on a new (yet another...) programming language, and I'm
>> about to start working on the string system. I'm looking with interest
>> at Lua's string handling, as it works very well.
>>
>> I'm already sold on having immutable strings, as there are lots of
>> advantages with regard to sharing of string data etc, but what are the
>> advantages to having atomised strings (i.e., each string has one and
>> precisely one copy in memory)? Is it to allow strings to be compared for
>> equality by just comparing their pointers? My language isn't based
>> around key/value pairs the way Lua is, so that may not be as important
>> to me; are there any other benefits?
>>
>> Also, what are the performance characteristics of using atomised strings?
>> In particular, I'm wondering what the amortised performance of adding a
>> new string to the system is like; when using atomised strings you need
>> to do a lookup of the string table to see if the string is there, plus
>> an optional memory allocation if it turns out that it's not. A
>> non-atomised implementation doesn't do the lookup but has to always do
>> the memory allocation.
>>
>> What hash function does Lua use for strings?
>
> Having only one copy of each string is called string interning. In Lua I
> assume (I haven't looked!) that, as well as making string comparison cheap,
> it means looking up a string key in a table requires only a modulus of the
> atom, rather than hashing the whole string content, to find the main position.
>
>
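To make the interning idea concrete, here is a minimal Python sketch. The `InternTable` class and its `intern` method are made up for illustration; this is not Lua's actual string table, just the general shape of one.

```python
class InternTable:
    """Hypothetical intern table: at most one stored copy per distinct string."""

    def __init__(self):
        self._table = {}

    def intern(self, s: str) -> str:
        # setdefault stores s if it is new, otherwise returns the existing copy.
        return self._table.setdefault(s, s)


strings = InternTable()
a = strings.intern("".join(["he", "llo"]))  # built at runtime
b = strings.intern("".join(["hel", "lo"]))  # equal contents, separate object
same_object = a is b  # identity check stands in for a pointer comparison
```

The table lookup (and the hashing behind it) is paid once when the string is created; after that, equality is just an identity check.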
I can't find my source (and that really irks me), but I remember
reading on this list that a certain maximum number of characters of a
string was used to generate the hash, and if the string was longer, it
would space out the indices of the characters being used for the hash.
Or something. Please don't quote me.
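
A rough Python sketch of that kind of scheme, assuming it matches the loop I believe is in Lua 5.1's lstring.c (that is an assumption, not checked against the source here). The function names are my own. The hash is seeded with the length, and the step between sampled characters grows with the length, so only about 32 characters are ever read.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned wraparound


def sampled_positions(length: int) -> list:
    """1-based positions read by the hash, from the end toward the start."""
    step = (length >> 5) + 1  # step 1 for short strings, larger for long ones
    positions = []
    i = length
    while i >= step:
        positions.append(i)
        i -= step
    return positions


def lua51_style_hash(s: bytes) -> int:
    """Hash seeded with the length, mixing in only the sampled bytes."""
    h = len(s) & MASK32
    for i in sampled_positions(len(s)):
        h ^= ((h << 5) + (h >> 2) + s[i - 1]) & MASK32
    return h


long_a = b"x" * 1000
# Byte at 0-based index 998 is not sampled when the step is 32.
long_b = long_a[:998] + b"y" + long_a[999:]
collide = lua51_style_hash(long_a) == lua51_style_hash(long_b)
```

With a scheme like this, long strings that differ only in unsampled positions hash identically (`collide` is true above), which is the price for not hashing every character.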
~Jonathan