lua-l archive


On 19-Dec-06, at 11:44 AM, Andrew Wilson wrote:

Typo: in the second timing loop, the time function should use strsplit.

          time(splitstr, ("a"):rep(i), ",")     should be      time(strsplit, ("a"):rep(i), ",")

Quite right, sorry.

By the way, great code. How many versions of this function can this list produce?

OK, here's another one. This one works even if the pattern has captures; on each iteration except the last it returns:
  <segment>, <captures>...
(if there are no captures, it returns the full separator)

The last segment is returned as a single value (its separator slot is nil), making it easy to identify the last segment in a loop. After a certain amount of experimentation, I'm pretty convinced that this is the best interface for 'split', at least with my coding style.

The function, now only 11 lines:

function string:split(pat)
  local st, g = 1, self:gmatch("()("..pat..")")
  local function getter(self, segs, seps, sep, cap1, ...)
    st = sep and seps + #sep  -- start of next segment; nil after the last
    return self:sub(segs, (seps or 0) - 1), cap1 or sep, ...
  end
  local function splitter(self)
    if st then return getter(self, st, g()) end
  end
  return splitter, self
end
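A quick way to check the interface is a round trip: with a capture-free pattern the second value is the full separator, and it is nil only for the last segment, so gluing segments and separators back together should reproduce the input. The sketch below repeats the function above so it runs on its own; the sample strings and the names parts, seg, sep and op are just made up for illustration.

```lua
-- Copy of string:split from above, repeated so this block is self-contained.
function string:split(pat)
  local st, g = 1, self:gmatch("()("..pat..")")
  local function getter(self, segs, seps, sep, cap1, ...)
    st = sep and seps + #sep  -- start of next segment; nil after the last
    return self:sub(segs, (seps or 0) - 1), cap1 or sep, ...
  end
  local function splitter(self)
    if st then return getter(self, st, g()) end
  end
  return splitter, self
end

-- No captures: the second value is the whole separator, nil on the last segment.
local parts = {}
for seg, sep in ("a, b,c"):split",%s*" do
  parts[#parts+1] = seg
  if sep then parts[#parts+1] = sep end  -- sep == nil marks the last segment
end
print(table.concat(parts) == "a, b,c")  --> true

-- One capture: the captured text is returned in place of the separator.
for seg, op in ("1+2-3"):split"([+-])" do
  print(seg, op)  --> 1 +, then 2 -, then 3 nil
end
```

Since the loop only stops when the iterator returns nothing, the nil in the separator slot is a free "this is the last one" flag; no lookahead is needed.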

As an example of how this might be used for a slightly non-trivial
splitting problem, consider the problem of parsing IRC protocol lines,
which consist of some number of whitespace-separated words,
possibly ending with an argument whose first character is ':'
and which extends to the end of the line (if I remember all
the details correctly).

Here's an implementation using the above split interface:

function ircsplit(cmd)
  local t = {}
  for word, colon, start in cmd:split"%s+(:?)()" do
    t[#t+1] = word
    if colon == ":" then
      t[#t+1] = cmd:sub(start)  -- trailing argument: rest of the line
      break
    end
  end
  return t
end

The pattern captures the leading colon, if there is any, as
well as the string index of the character following the
separator. The loop body uses this information to store the
rest of the line as the final argument and then terminate
the loop.
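To see exactly what the loop body gets, you can call string.gmatch directly with the wrapped pattern that split builds, "()(" .. pat .. ")", here "()(%s+(:?)())". Each match yields the separator's start index, the whole separator, the (possibly empty) colon, and the index just past the separator. The IRC line is a made-up example.

```lua
-- Raw matches for the IRC pattern as split's internal gmatch sees them.
local line = "PRIVMSG #lua :hello world"
for seps, sep, colon, start in line:gmatch("()(%s+(:?)())") do
  print(seps, ("%q"):format(sep), ("%q"):format(colon), start)
end
--> 8   " "   ""   9
--> 13  " :"  ":"  15
--> 20  " "   ""   21
print(line:sub(15))  --> hello world
```

The third match exists, but ircsplit never asks for it: the break after the colon match stops the loop, so the space inside the trailing argument is never treated as a separator.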