Mark Hamburg wrote:
The proposed syntax seems a bit odd to me as well. Particularly using
an
existing infix operator as a prefix operator.
Dito. There are many more programmers with a C heritage than an Icon
heritage. Prefix *t smells like 'dereference' or 'deep-copy' to me.
And certainly not like 'length' at all.
But I think we really have more than one issue here:
- How to express an append (auto-post-increment assignment) easily in
the
language:
I think t[*t] = a is too restrictive. It is quite useful to have an
append operator for arbitrary objects. But for many of them an
assignment
to the index that is equivalent to the length would be a complication
(e.g. trees, ordered lists or even streams). I would like to avoid
hardcoding such a requirement as a language idiom.
In fact we don't need a new operator for an append operation. We may
use
a nil table index assignment plus syntactic sugar for omitting nil:
t[] = o -- equivalent to t[nil] = o which is currently invalid
This is a nice orthogonal extension and it already has a well defined
metatable mapping: __newindex(t, nil, o). And it's faster, too.
- How to get the length:
There are two possibilities:
a. Introduce a new operator:
I do not like *t. While #t has an equivalent in perl, it would
waste
one of the few characters left. A prefix colon (:t) looks
strange, too.
But maybe we don't need a new operator for this one -- what about:
b. Reuse an existing operator:
Having a t[] lvalue it's obviously tempting to define semantics
for
a t[] rvalue, too. And in fact it's quite orthogonal to define
__index(t, nil) to return the length (and not nil):
n = t[]
- The reuse of __index and __newindex has nice implications for
orthogonality when it comes to generic container types built upon
tables or userdata:
Using a dynamic mutable string buffer library could look like:
s = buffer.new("foo") -- initial value
s[] = "bag" -- append string
s[6] = "r" -- modify a character
for k,v in pairs(t) do s[] = v end -- append strings in a loop
for c in s:chars() do f(c) end -- character iterator
for i=1,s[] do g(s[i]) end -- manual character iteration
print(s[]) -- get the length
print(s) -- convert buffer to string
BTW: Yes, I want s[] and s[i] for the builtin immutable strings, too.
What about a single, global metatable that applies to all
strings?
- Combining this with the new varargs syntax makes sense:
nargs = ...[]
arg1 = ...[1]
arg2 = ...[2]
Certainly more elegant than nargs = select("#", ...).
- How to set the length:
I'm not sure this is needed. I think that plain tables and all other
container objects (tables with metamethods or userdata) should manage
this on their own. It is best defined to be a read-only property.
If your particular container allows to set this property you may just
add a specific setter method.
The two uses Mark mentioned for table.setn() are better served by
having a function to efficiently clear (parts of) a table and
avoiding
the hole problem (see below).
- How to efficiently manage a length that is associated with the array
part of a table:
Storing it in the beginning of the array part (as Mark proposed)
solves
this nicely. I think it should have lazy semantics and work like a
cache:
init: t->array[0] = -1; /* when the array part is created */
search: {search for the last non-nil in array part}; return n+1;
getn: if (t->array[0] < 0) t->array[0] = search(L, t);
return t->array[0];
store with nil index (append):
int n = t->array[0];
if (n < 0) n = search(L, t);
t->array[++n] = o;
t->array[0] = n;
store with numeric index:
t->array[idx] = o;
t->array[0] = -1; /* invalidate the cache */
- The 'hole' problem:
I have a *strong* dislike to the 'a hole (nil) means stop' behaviour
of getn(), ipairs() and unpack(). This is just an annoying
side-effect
of the implementation and has no use of its own. I really would like
to
avoid burying this any deeper into the language than it already is.
IMHO:
- I don't want arrays where the length indicates the first nil
element
or the end.
- I want arrays with nils in them.
- I want arrays where the length indicates the last non-nil element.
- I want ipairs() to continue iterating over nils.
All of this is easy to do when we have a well-defined way to get the
length associated with an array (see above).
Here is my generic solution to the 'clear part of a table' problem:
- I propose table.move(t, [dst, [start, [stop]]]) with some nice
default
semantics: dst = 0, start = 1, stop = -1
Here 0 refers to an imaginary empty slice and -1 is the array length.
table.move(t) -- clear the whole array part
table.move(t, 0, start) -- delete slice [pos,*)
table.move(t, 0, start, stop) -- delete slice [pos,stop)
table.move(t, start, 0, len) -- insert empty slice at
[pos,pos+len)
table.move(t, 1, pos) -- move slice [pos,*) down
table.move(t, start, 1) -- move whole array up to pos
table.move(t, dst, start, stop) -- move slice [start,stop) to dst
Overlapping moves are invalid, of course. The cached length is
cleared
or readjusted for this operation (whatever is easier). An extension
to
lapi.c would be preferrable for speed (offering to write a patch if
anyone thinks this is a good idea).
Bye,
Mike