tl;dr: boxed strings could be auto-interned/unboxed at the first
moment they are compared, including use as a key. This preserves value
semantics while allowing slurping in files 1M at a time.

On Mon, Jan 9, 2012 at 10:38 PM, Xavier Wang <weasley.wx@gmail.com> wrote:

> I really think that a programming language should have two kinds of
> string: One for Symbol and One for really string.

Smalltalk-80 never really recovered from this choice. In large systems
you end up

  a) obsessively documenting who controls the string, the same way we
handle freeing memory in C,
  b) obsessively cloning strings other people hand to you in order to avoid
  c) tearing your hair out debugging problems caused by disagreements about a).

Come to think of it, I've seen this happen in C too, except that const
helps a little, and string manipulation sucks so much that nobody does
it casually. And you're already documenting who owns the memory.

Perhaps some of these problems could be solved by transforming
writable-strings into strings at module boundaries. But then I start
to wonder: why exactly did we want to work so hard to get writable
strings in the first place?

> [Symbols would be] compared with
> pointer, and has a pre-calculate hash value. its just like identify in
> a language: we don't care its content, but only use it as a signature.

You're focusing on the implementation. The important thing about Lua
strings, what gives them their symbol-nature, is that they have value
semantics. Just as you can't distinguish one 73 from another 73, you
can't distinguish one "73" from another "73".
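To make that concrete, here is a minimal sketch in stock Lua. The variable names are only for illustration; nothing here goes beyond the standard language.

```lua
-- Three strings built by different routes are the same value.
local a = "73"
local b = tostring(70 + 3)      -- constructed at run time
local c = "7" .. "3"            -- built by concatenation

assert(a == b and b == c)       -- equal, and nothing can tell them apart

-- As table keys they all name the same slot.
local t = {}
t[a] = "first"
t[b] = "second"                 -- overwrites the same entry
assert(t[c] == "second")
```

There is no operation in the language that asks "is this the same "73" object?", which is exactly what makes a string usable as a symbol.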

In Java you have the worst of both worlds: Strings are immutable, but
"73" may or may not be == to "73". Perhaps this is fixable by
auto-interning strings the moment they are compared. I wonder if that
could be done in the LambdaMOO implementation, actually.
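A rough sketch of what "intern at the moment of comparison" might look like, written in Lua rather than Java. The `box` constructor and `unbox` method are hypothetical names invented for this illustration, and an ordinary Lua string stands in for the interned form since it already has value semantics.

```lua
-- Hypothetical boxed string: holds unjoined pieces until someone compares it.
local Boxed = {}
Boxed.__index = Boxed

local function box(...)
  return setmetatable({ pieces = { ... } }, Boxed)
end

-- Collapse the pieces into an ordinary Lua string, once, and drop the pieces.
function Boxed:unbox()
  if not self.value then
    self.value = table.concat(self.pieces)
    self.pieces = nil
  end
  return self.value
end

-- Comparison forces unboxing, so equality is always by value.
Boxed.__eq = function(x, y)
  return x:unbox() == y:unbox()
end

local s1 = box("7", "3")
local s2 = box("73")
assert(s1 == s2)            -- different boxes, same value
assert(s1.pieces == nil)    -- the comparison unboxed s1

-- Lua has no hook for table keys, so this sketch unboxes explicitly there.
local t = {}
t[s1:unbox()] = true
assert(t[s2:unbox()] == true)
```

A real implementation would do the unboxing inside the VM, so that use as a key triggers it too; the explicit `unbox()` call is a limitation of doing this in plain Lua.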

> String is a real byte array,

I think Lua is missing real strings too, but real strings are Unicode.

> It doesn't need using
> as hash-table key (but it could),

No, it can't. Not by value at least.

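s1 = buffer"plane"   -- buffer: the proposed mutable string type
s2 = buffer"plans"
t = {}; t[s1] = true; t[s2] = false

s1[5] = "s"          -- s1 now holds "plans", the same bytes as s2
print(t[s2])         -- true or false? Both keys now have the same value.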
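For anyone who wants to run it, here is a sketch that simulates the problem in stock Lua. Everything in it is an assumption made for illustration: `buffer` is modelled as a table of characters, and `put`/`get` are a hypothetical table that hashes a buffer by its content at the moment of the call.

```lua
-- Hypothetical mutable buffer: a table of single characters.
local function buffer(s)
  local b = {}
  for i = 1, #s do b[i] = s:sub(i, i) end
  return b
end

local function content(b) return table.concat(b) end

-- Hypothetical table keyed by buffer content at the time of the call.
local function put(t, b, v) t[content(b)] = { key = b, value = v } end
local function get(t, b)
  local e = t[content(b)]
  return e and e.value
end

local s1 = buffer("plane")
local s2 = buffer("plans")
local t = {}
put(t, s1, true)
put(t, s2, false)

s1[5] = "s"                 -- s1 now reads "plans", same as s2

print(get(t, s1))           -- false: s1 now finds s2's entry
print(get(t, s2))           -- false
print(t["plane"].key == s1) -- true: s1's entry is stranded under its old hash
```

The table ends up with two entries whose keys have identical content, and the value stored under s1 can no longer be reached through s1. Rehashing on every mutation would avoid the stranded entry, but then the two keys collide outright.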

> I really like string implement in nowadays lua, simple, and beautiful.
> but a language should have a really mutable string.

Why? Strings are almost always the wrong answer to "how should I
represent a composite data structure". IMO string processing is
something that only happens at the edges of systems; most processing
should be done with real data structures, not the untyped morass of
strings. It's like declaring every function as

  void *interpolate_query(void *sql_template, void *title_text);

Any type safety is dead. There is no duck typing for "Jay's book's price".
-- 
Jay

Strong authentication just proves which chump is in front of the keyboard.