[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: table insertion [was: Lua 5.1 (work2) now available]
- From: Mike Pall <mikelu-0411@...>
- Date: Thu, 4 Nov 2004 21:25:34 +0100
Hi,
Mark Hamburg wrote:
> The proposed syntax seems a bit odd to me as well. Particularly using an
> existing infix operator as a prefix operator.
Dito. There are many more programmers with a C heritage than an Icon
heritage. Prefix *t smells like 'dereference' or 'deep-copy' to me.
And certainly not like 'length' at all.
But I think we really have more than one issue here:
- How to express an append (auto-post-increment assignment) easily in the
  language:
  
  I think t[*t] = a is too restrictive. It is quite useful to have an
  append operator for arbitrary objects. But for many of them an assignment
  to the index that is equivalent to the length would be a complication
  (e.g. trees, ordered lists or even streams). I would like to avoid
  hardcoding such a requirement as a language idiom.
  In fact we don't need a new operator for an append operation. We may use
  a nil table index assignment plus syntactic sugar for omitting nil:
    t[] = o    -- equivalent to t[nil] = o which is currently invalid
  This is a nice orthogonal extension and it already has a well defined
  metatable mapping: __newindex(t, nil, o). And it's faster, too.
- How to get the length:
  There are two possibilities:
  a. Introduce a new operator:
     I do not like *t. While #t has an equivalent in perl, it would waste
     one of the few characters left. A prefix colon (:t) looks strange, too.
     But maybe we don't need a new operator for this one -- what about:
  b. Reuse an existing operator:
     Having a t[] lvalue it's obviously tempting to define semantics for
     a t[] rvalue, too. And in fact it's quite orthogonal to define
     __index(t, nil) to return the length (and not nil):
       n = t[]
- The reuse of __index and __newindex has nice implications for
  orthogonality when it comes to generic container types built upon
  tables or userdata:
  
  Using a dynamic mutable string buffer library could look like:
    s = buffer.new("foo")               -- initial value
    s[] = "bag"                         -- append string
    s[6] = "r"                          -- modify a character
    for k,v in pairs(t) do s[] = v end  -- append strings in a loop
    for c in s:chars() do f(c) end      -- character iterator
    for i=1,s[] do g(s[i]) end          -- manual character iteration
    print(s[])                          -- get the length
    print(s)                            -- convert buffer to string
  BTW: Yes, I want s[] and s[i] for the builtin immutable strings, too.
       What about a single, global metatable that applies to all strings?
- Combining this with the new varargs syntax makes sense:
  nargs = ...[]
  arg1  = ...[1]
  arg2  = ...[2]
  Certainly more elegant than nargs = select("#", ...).
- How to set the length:
  I'm not sure this is needed. I think that plain tables and all other
  container objects (tables with metamethods or userdata) should manage
  this on their own. It is best defined to be a read-only property.
  If your particular container allows to set this property you may just
  add a specific setter method.
  
  The two uses Mark mentioned for table.setn() are better served by
  having a function to efficiently clear (parts of) a table and avoiding
  the hole problem (see below).
- How to efficiently manage a length that is associated with the array
  part of a table:
  Storing it in the beginning of the array part (as Mark proposed) solves
  this nicely. I think it should have lazy semantics and work like a cache:
  init:    t->array[0] = -1;  /* when the array part is created */
  search:  {search for the last non-nil in array part}; return n+1;
  getn:    if (t->array[0] < 0) t->array[0] = search(L, t);
           return t->array[0];
  store with nil index (append):
           int n = t->array[0];
           if (n < 0) n = search(L, t);
           t->array[++n] = o;
           t->array[0] = n;
  store with numeric index:
           t->array[idx] = o;
           t->array[0] = -1;  /* invalidate the cache */
- The 'hole' problem:
  I have a *strong* dislike to the 'a hole (nil) means stop' behaviour
  of getn(), ipairs() and unpack(). This is just an annoying side-effect
  of the implementation and has no use of its own. I really would like to
  avoid burying this any deeper into the language than it already is.
  IMHO:
  - I don't want arrays where the length indicates the first nil element
    or the end.
  
  - I want arrays with nils in them.
  - I want arrays where the length indicates the last non-nil element.
  - I want ipairs() to continue iterating over nils.
  All of this is easy to do when we have a well-defined way to get the
  length associated with an array (see above).
Here is my generic solution to the 'clear part of a table' problem:
- I propose table.move(t, [dst, [start, [stop]]]) with some nice default
  semantics: dst = 0, start = 1, stop = -1
  Here 0 refers to an imaginary empty slice and -1 is the array length.
  
    table.move(t)                   -- clear the whole array part
    table.move(t, 0, start)         -- delete slice [pos,*)
    table.move(t, 0, start, stop)   -- delete slice [pos,stop)
    table.move(t, start, 0, len)    -- insert empty slice at [pos,pos+len)
    table.move(t, 1, pos)           -- move slice [pos,*) down
    table.move(t, start, 1)         -- move whole array up to pos
    table.move(t, dst, start, stop) -- move slice [start,stop) to dst
  Overlapping moves are invalid, of course. The cached length is cleared
  or readjusted for this operation (whatever is easier). An extension to
  lapi.c would be preferrable for speed (offering to write a patch if
  anyone thinks this is a good idea).
Bye,
     Mike