On Thu, Dec 17, 2009 at 2:33 AM, steve donovan wrote:
> E.g., a candidate would be a split() function
> which takes a string and a regular expression and returns a table;
> nearly every non-trivial Lua package has one of these and it is
> actually a more delicate operation than it looks at first.
On Thu, Dec 24, 2009 at 3:08 AM, steve donovan wrote:
> What's interesting is how much interpretation is possible with a
> simple set of functions. Mark H would expect split() to return an
> iterator, Yuri makes it return the unpacked values, etc. The latter is
> easily enough done with string.gmatch, although the former is useful
> enough to be given a new name (like splitv)

The split function obviously belongs, but it is also an easy function
to get wrong.  I don't generally trust the code in [1] as is.  The
behavior in corner cases (e.g. empty patterns, and empty leading and
trailing matches) needs to be fully specified and confirmed by test
cases.  The ECMA JavaScript [2] and Perl implementations of split
provide such specifications.
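
To illustrate why empty fields need an explicit specification, here is
a minimal sketch of a common gmatch idiom.  It is not taken from [1];
naive_split is a hypothetical name, and the comma separator is
hard-coded only for the example.  The idiom silently drops empty
leading, trailing, and interior fields, which may or may not be what a
caller expects.

```lua
-- Hypothetical helper for illustration only: split on commas by
-- collecting every maximal run of non-comma characters.
local function naive_split(s)
  local fields = {}
  for field in s:gmatch("[^,]+") do
    fields[#fields + 1] = field
  end
  return fields
end

-- The input has an empty leading field, an empty interior field, and
-- an empty trailing field.  None of them survive.
local fields = naive_split(",a,,b,")
print(#fields, table.concat(fields, "|"))
```

Whether those empty fields should be kept is exactly the kind of
decision a specification has to spell out.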

Take, for example, the split code in [3].  It fails in part because
self:gmatch("()("..pat..")") becomes self:gmatch("()()") when pat is
empty, and "()" has a special meaning in Lua patterns: it is a position
capture, so the second capture yields a string position rather than
the matched separator.  (Incidentally, this "exception to the rule", as
described by Fabien, occurs here even in Lua, to our detriment.)
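
A minimal sketch of that failure mode, using only the pattern
construction quoted above.  probe is a hypothetical helper written for
this message, not the code from [3].

```lua
-- Hypothetical helper: build the pattern the same way and record both
-- captures for every match.
local function probe(s, pat)
  local out = {}
  for pos, sep in s:gmatch("()(" .. pat .. ")") do
    out[#out + 1] = { pos = pos, sep = sep }
  end
  return out
end

-- Non-empty separator: the second capture is the separator text.
local with_comma = probe("a,b", ",")
print(type(with_comma[1].sep))   -- a string capture

-- Empty separator: the pattern is "()()", so BOTH captures are
-- position captures and sep is a number, not a string.
local with_empty = probe("ab", "")
print(type(with_empty[1].sep))   -- a position capture
```

Code that later passes sep to string functions, or measures #sep, will
misbehave or raise an error in the second case.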

BTW, split functions often provide a "limit" parameter.  This can
control whether something like "a,b,c," splits into {"a", "b,c,"},
{"a", "b", "c"}, or {"a", "b", "c", ""}.
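
As a sketch of how a limit parameter could select among those three
results, here is one possible convention, loosely modeled on Perl's
split.  The function name, argument order, and the choice to raise an
error on an empty separator match are all assumptions made for this
example; they are not taken from [1] or [3].

```lua
-- Sketch only.  sep is a Lua pattern.
--   limit > 0     : return at most `limit` fields; the rest of the
--                   string stays unsplit in the last field
--   limit == 0/nil: split fully, then drop trailing empty fields
--   limit < 0     : split fully and keep trailing empty fields
local function split(s, sep, limit)
  limit = limit or 0
  local fields = {}
  local init = 1
  while limit <= 0 or #fields < limit - 1 do
    local first, last = s:find(sep, init)
    if not first then break end
    if last < first then
      -- An empty match would never advance; refuse rather than guess.
      error("separator pattern matched the empty string", 2)
    end
    fields[#fields + 1] = s:sub(init, first - 1)
    init = last + 1
  end
  fields[#fields + 1] = s:sub(init)
  if limit == 0 then
    while #fields > 0 and fields[#fields] == "" do
      fields[#fields] = nil
    end
  end
  return fields
end

print(table.concat(split("a,b,c,", ",", 2), "|"))
print(table.concat(split("a,b,c,", ","), "|"))
print(table.concat(split("a,b,c,", ",", -1), "|"))
```

Other conventions are equally defensible; the point is that the choice
has to be written down and tested.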

[1] http://lua-users.org/wiki/SplitJoin
[2] http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-262.pdf
[3] http://lua-users.org/lists/lua-l/2006-12/msg00414.html