lua-l archive

On Sun, Dec 27, 2009 at 12:46 PM, steve donovan wrote:
> Here's a first draft of what that split function looks like with a few
> sanity checks:...
> The default separator is spaces, and an empty separator means do no
> splitting (Python regards this as an error condition). The original
> behaves very badly with an empty separator.

Note that in Python, there are two implementations of split, depending
on whether the separator is a plain string or a pattern:

  http://docs.python.org/library/re.html#re.split
  http://docs.python.org/library/stdtypes.html#str.split

> It's hard to write a split function which meets all our expectations.
> This function ignores delimiters at the ends of the string, which is
> often what we want
>
> split(" one two"," ") => {"one","two"}   -- cool
> split(",one,two",",") => {"one","two"}   -- not what expected?
>
> Feels that it needs yet another optional parameter, dont_ignore_ends

I think I usually want split(table.concat(t, sep), sep) to be
structurally equal to t, even when t contains empty strings.  Example:
parsing a delimited file whose columns may contain empty strings.
However, t={""} and t={} both concatenate to the same string, so we
need to at least assume #t > 0.  One rarely has a delimited text file
containing zero columns, and you can always add a dummy column if
this becomes a problem.
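
To make the round-trip property concrete, here is a minimal sketch of a
split that keeps empty fields, including those at the ends.  The name
split_plain is hypothetical, and it is not Steve's or Rici's function:
it assumes a non-empty separator and treats it as a plain string, not a
pattern.

```lua
-- Hypothetical helper (not from the thread): split s on a plain,
-- non-empty separator, keeping empty fields, including at the ends.
local function split_plain(s, sep)
  assert(sep ~= "", "separator must be non-empty")
  local fields, start = {}, 1
  while true do
    -- The fourth argument 'true' disables pattern matching in string.find.
    local i, j = string.find(s, sep, start, true)
    if not i then
      -- No more separators: the rest of the string is the last field.
      fields[#fields + 1] = string.sub(s, start)
      return fields
    end
    fields[#fields + 1] = string.sub(s, start, i - 1)
    start = j + 1
  end
end

-- Round trip: splitting the concatenation gives back the same fields.
local t = {"a", "", "c", ""}
local back = split_plain(table.concat(t, ","), ",")
assert(#back == #t)
for i = 1, #t do assert(back[i] == t[i]) end
```

With this sketch, split_plain("", ",") returns {""} rather than {}, which
is the #t > 0 assumption mentioned above.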

Having sep = "" can provide an idiomatic way of converting a character
array to and from a string (as in Perl).  However, again,
table.concat(t, "") is not uniquely invertible.  There is an argument,
for example, that it is most uniform for split("a", "") to return {"",
"a", ""}.  That's what Rici's implementation does if you make sep a
pattern that evaluates to "" (such as ".-").  Perhaps it also goes
with the Lua philosophy noted by Fabien.  It may be that the split
function is not the best way in Lua to split a string into individual
characters.  s:gmatch'.' may serve that purpose fine.
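
For illustration, a small sketch of the gmatch approach.  The helper
name chars is hypothetical.  Note that '.' matches one byte at a time,
so this assumes single-byte characters and would split a multi-byte
UTF-8 character into its bytes.

```lua
-- Hypothetical helper: collect the characters (bytes) of s into a table.
local function chars(s)
  local t = {}
  for c in s:gmatch(".") do
    t[#t + 1] = c
  end
  return t
end

local t = chars("abc")
assert(#t == 3 and t[1] == "a" and t[3] == "c")
-- table.concat with an empty separator reverses the conversion.
assert(table.concat(t, "") == "abc")
```

Unlike the sep = "" case for split, there is no ambiguity about leading
or trailing empty strings here: the empty string simply yields {}.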