[This topic is not Lua-specific, but it has been discussed on this list several times, probably because Lua has no such built-in method.]

Hello,

I recently had to write string-splitting methods for personal use and ran into some common design issues, especially about empty sections: should we keep them or filter them out? It seems to me that there are 2 distinct use cases.
There could be 2 methods: s:fields(seps) & s:items(seps). They respectively match the use cases of splitting a string holding record fields (1) vs. a string holding a sequence of items:

    id,name,phone,email = s:fields('--')
    s = "321--Bob--123456789--bob@bob.org"  --> "321","Bob","123456789","bob@bob.org"
    s = "--Bob----bob@bob.org"              --> "","Bob","","bob@bob.org"

    names = s:items({' ','\t'})
    s = "foo bar\tbaz"                      --> {"foo","bar","baz"}                     
    s = " foo   bar \t baz  "               --> {"foo","bar","baz"}

* fields keeps all sections, items filters empty ones out. (2)
* fields unpacks its return values, items returns an array. (3)
* Both should allow as argument a set of separators. (4)
* Neither trims sections by default; there may be an optional "trim" flag (5).
* Both may have ' ' as single default separator.
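
To make the points above concrete, here is a minimal sketch in plain Lua of how the two methods might behave. It is only one possible reading of the list, not a reference implementation. The helper names find_sep and split are hypothetical, and so are these choices: the trim flag is passed as a second argument, a single separator may be given as a plain string, the longest separator wins when two match at the same position, and items trims before filtering out empty sections.

```lua
-- Works on Lua 5.1 (global unpack) and 5.2+ (table.unpack).
local unpack = table.unpack or unpack

-- Hypothetical helper: earliest match of any separator at or after init.
-- If two separators match at the same position, the longer one wins.
local function find_sep(s, seps, init)
  local best_i, best_j
  for _, sep in ipairs(seps) do
    local i, j = s:find(sep, init, true)  -- true = plain text, no patterns
    if i and (not best_i or i < best_i or (i == best_i and j > best_j)) then
      best_i, best_j = i, j
    end
  end
  return best_i, best_j
end

-- Hypothetical shared splitter behind both methods.
local function split(s, seps, keep_empty, trim)
  if seps == nil then seps = {' '} end              -- default separator
  if type(seps) == 'string' then seps = {seps} end  -- allow a single separator
  for _, sep in ipairs(seps) do
    assert(sep ~= '', 'separators must not be empty')
  end
  local parts, pos = {}, 1
  while true do
    local i, j = find_sep(s, seps, pos)
    local section = s:sub(pos, (i or #s + 1) - 1)
    if trim then section = section:match('^%s*(.-)%s*$') end
    if keep_empty or section ~= '' then
      parts[#parts + 1] = section
    end
    if not i then break end
    pos = j + 1
  end
  return parts
end

-- fields keeps every section and unpacks its results.
function string.fields(s, seps, trim)
  return unpack(split(s, seps, true, trim))
end

-- items drops empty sections and returns an array.
function string.items(s, seps, trim)
  return split(s, seps, false, trim)
end

local id, name, phone, email = ("--Bob----bob@bob.org"):fields('--')
-- id == "", name == "Bob", phone == "", email == "bob@bob.org"
local names = (" foo   bar \t baz  "):items({' ', '\t'})
-- names is {"foo", "bar", "baz"}
```

Both methods share one scanning loop; the only behavioural differences are whether empty sections are kept and whether the result is unpacked.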

Does this distinction make sense to you?
Is the pair of methods correctly designed?
Is it worth adding to Lua's builtins?

Example code available (plain Lua).

(1) More precisely: data analogous to a record of unnamed fields, i.e. rather a kind of tuple.
(2) I was surprised to realise that empty sections at either end (the string starts or ends with a separator) and inside the string (the string holds consecutive separators) should be addressed the same way.
(3) "items" may instead be called "array", or "sequence", or "list", to highlight this difference.
Side question: Is there a reason why Lua does not unpack "a,b,c = expr" when expr evaluates to an array? This would leave the user the choice of unpacking or not (record = s:fields(seps)).
(4) Provided as an array (otherwise we cannot have any multichar separator). An alternative would be to use '...' as seps, but this prevents having additional parameters such as "trim".
(5) "trim" would not be really needed if Lua had builtin "map" for arrays and "trim" for strings: things = table.map(things, string.trim).
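
For notes (3) and (5), a small sketch of what such helpers could look like in plain Lua. The names map and trim are hypothetical; standard Lua provides neither. Explicit unpacking, on the other hand, is already available as unpack (Lua 5.1) or table.unpack (5.2 and later), so a method that returns an array still leaves the caller free to unpack it.

```lua
-- Hypothetical helper: strip leading and trailing whitespace.
local function trim(s)
  return (s:match('^%s*(.-)%s*$'))
end

-- Hypothetical helper: apply f to each array element, returning a new array.
local function map(t, f)
  local out = {}
  for i, v in ipairs(t) do
    out[i] = f(v)
  end
  return out
end

local things = map({' foo ', 'bar\t', '  baz'}, trim)
-- things is {'foo', 'bar', 'baz'}

-- The caller decides whether to keep the array or unpack it.
local unpack = table.unpack or unpack
local a, b, c = unpack(things)
-- a == 'foo', b == 'bar', c == 'baz'
```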

Denis
________________________________

la vita e estrany

http://spir.wikidot.com/