- Subject: Re: Standard Libraries: (was: and Lua marches on)
- From: David Manura <dm.lua@...>
- Date: Mon, 28 Dec 2009 23:14:14 -0500
On Sun, Dec 27, 2009 at 12:46 PM, steve donovan wrote:
> Here's a first draft of what that split function looks like with a few
> sanity checks:...
> The default separator is spaces, and an empty separator means do no
> splitting (Python regards this as an error condition). The original
> behaves very badly with an empty separator.
Note that Python has two implementations of split, depending on
whether the separator is a plain string or a pattern:
http://docs.python.org/library/re.html#re.split
http://docs.python.org/library/stdtypes.html#str.split
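
The same plain-versus-pattern distinction shows up in Lua through the
optional fourth argument of string.find. A minimal illustration (the
sample string is only an assumption for the example):

```lua
local s = "a.b.c"

-- As a pattern, "." matches any character, so the first match is at position 1.
assert(s:find(".") == 1)

-- With plain = true, "." is a literal dot, so the first match is at position 2.
assert(s:find(".", 1, true) == 2)
```

A split function that takes patterns and one that takes plain strings
would therefore disagree on a separator such as ".".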
> It's hard to write a split function which meets all our expectations.
> This function ignores delimiters at the ends of the string, which is
> often what we want
>
> split(" one two"," ") => {"one","two"} -- cool
> split(",one,two",",") => {"one","two"} -- not what expected?
>
> Feels that it needs yet another optional parameter, dont_ignore_ends
I think I usually want split(table.concat(t, sep), sep) to be
structurally equal to t, even when t contains empty strings. Example:
parsing a delimited file whose columns may contain empty strings.
However, t={""} and t={} both concatenate to the same string, so we
need to at least assume #t > 0. One rarely has a delimited text file
containing zero columns, and you can always add a dummy column if
this becomes a problem.
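
A rough sketch of a split with that round-trip property, assuming a
plain (non-pattern), non-empty separator. The name split and its
behaviour here are only an illustration, not the function from the
earlier draft:

```lua
-- Hypothetical helper: split `s` on the plain separator `sep`, keeping
-- empty fields so that split(table.concat(t, sep), sep) rebuilds t
-- whenever #t > 0 and no element of t contains sep.
local function split(s, sep)
  assert(sep ~= "", "empty separator is not handled in this sketch")
  local fields = {}
  local pos = 1
  while true do
    -- The fourth argument `true` asks string.find for a plain-text search.
    local first, last = string.find(s, sep, pos, true)
    if not first then
      -- No more separators: the rest of the string is the last field.
      fields[#fields + 1] = string.sub(s, pos)
      break
    end
    fields[#fields + 1] = string.sub(s, pos, first - 1)
    pos = last + 1
  end
  return fields
end

-- Round trip with empty strings at the ends and in the middle.
local t = {"", "one", "", "two", ""}
local back = split(table.concat(t, ","), ",")
assert(#back == #t)
for i = 1, #t do assert(back[i] == t[i]) end

-- {""} and {} both concatenate to "", and this sketch maps "" back
-- to {""}, never to {}.
assert(#split("", ",") == 1)
```

With this behaviour, split(",one,two", ",") yields {"", "one", "two"}
rather than ignoring the leading delimiter.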
Having sep = "" can provide an idiomatic way of converting a character
array to and from a string (as in Perl). However, again,
table.concat(t, "") is not uniquely invertible. There is an argument,
for example, that it is most uniform for split("a", "") to return {"",
"a", ""}. That's what Rici's implementation does if you make sep a
pattern that evaluates to "" (such as ".-"). Perhaps it also goes
with the Lua philosophy noted by Fabien. It may be that the split
function is not the best way in Lua to split a string into individual
characters. s:gmatch'.' may serve that purpose fine.
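
For instance, a small sketch of the gmatch approach (the helper name
chars is made up for the example; note that "." matches single bytes,
so multi-byte encodings such as UTF-8 are not split per character):

```lua
-- Hypothetical helper: collect each character (byte) of `s` into an array.
local function chars(s)
  local t = {}
  for c in s:gmatch(".") do
    t[#t + 1] = c
  end
  return t
end

local t = chars("abc")
assert(#t == 3 and t[1] == "a" and t[2] == "b" and t[3] == "c")

-- table.concat with an empty separator converts the array back to a string.
assert(table.concat(t, "") == "abc")

-- The empty string yields an empty array, with no "" elements at the ends.
assert(#chars("") == 0)
```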