lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


> icon's string allocation strategy, described in

>    http://www.cs.arizona.edu/icon/docs/ipd277.htm

> seems to argue against string interning.  not sure how relevant that
strategy
> is to lua though (different domains, different tradeoffs ???)

Different domains, indeed. It is true that you cannot intern strings
without doing at least one string comparison at the end. This is
particularly slow if the string is long *and* present. It is quite fast if
the string is not interned, because you are likely to find a difference in
the first few characters. It is reasonably fast if the string is already
interned and shortish.

So if you generate an enormous string, it will intern rapidly because it is
probably unique. If you generate a small string, it will intern rapidly
because it is short. But the real win is if you don't generate a string at
all; if you just copy an existing one. In this case, it is already interned
and there is no cost whatsoever. Furthermore, if the string comes from a
Lua script (as opposed to a C program), it is interned once when the script
is loaded and never needs to be interned again, which is a huge saving.

So for all the respect I have for Ralph Griswold, I don't think the
analysis fits Lua, and I'm not sure if it would even fit Icon if Icon were
reimplemented today: the world has changed a lot since 1991, and 64K of
string space is no longer a huge cost.

Also, Icon is a language, albeit extendable. Icon-style strings, which are
not null-terminated (to support overlaying substrings), and which move
around when garbage collected, are not really suitable for integrating with
C code, which expects strings to end with nulls and to have fixed
addresses. The Icon-C interface involves copying the string; Lua, on the
other hand, null-terminates internal strings even though this is not
necessary for its implementation, and does not use a compacting garbage
collector, so it can just hand a C program an internal string pointer (with
the implicit contract that the C program does not modify the string). For
importing a string from the C environment into the internal environment,
both languages do the copy but Lua only does the copy if the string is not
present. So I think that Lua probably has a much faster C interface.

The most relevant comparison I could find on the "Great Computer Language
Shootout" page is the spell-check example: here are Lua/Icon results:

Measurement of CPU as N varies
                N
        1    4    7   10
lua  0.71 1.42 2.12 2.82
icon 1.15 2.89 4.65 6.37

(Icon 9.3.2, Lua 4.0)

The test consists of reading a dictionary and then matching an input file
against it. Both the dictionary and the input file consist of lines with a
single word on each line. The input file is generated by making N copies of
a file (it is, in fact, the same file as the dictionary with one addition);
consequently the CPU time could be thought of as time(read-dict) +
N*time(read_file).

Based on that we have:

read-file = (t(N10)-t(N7))/3
read-dict = t(N1) - read-dict

     read-dict  read-file
lua:    0.57       0.23
icon:   0.57       0.57

I think this clearly shows the benefit of string interning: Lua and Icon
take exactly the same amount of time to read the unique words in the
dictionary, but Lua does not have to create the words from the file, and so
reading the input file takes 60% less time, whereas Icon takes exactly the
same amount of time.

Lua outperforms Icon on most tests but Icon shines on string concatenation,
outperforming Lua ten to one. However, it it important to observe that the
test is to create a single string which consists of n copies of an original
string (and using builtin primitives to do this is not allowed.) Icon
optimises the case of concatenating to the string most recently created; I
suspect that had the test required interleaved concatenations, Icon would
have faired much worse. That's the trouble with benchmarks.

R.