Nothing in the description of `gsub` prepares the user for this:

> print((string.gsub(";a;", "a*", "ITEM")))
ITEM;ITEMITEM;ITEM

A similar thing happens with `gmatch`:

> I={}; for s in (";a;"):gmatch"a*" do I[#I+1]=s end
> print(table.concat(I,'|'))
|a||

This behaviour has been reported on the Wiki[1]. Unfortunately the fix
given there, namely "[^,]+", is not a fix. Extensive experimentation
has failed to reveal any way of fiddling with the pattern that produces
`ITEM;ITEM;ITEM` and `|a|` respectively.
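One reading of why a `+` pattern such as "[^,]+" is not a fix: it avoids
empty matches altogether, so empty fields silently disappear instead of
being reported once. A minimal sketch, in which the input "a,,b" and the
variable name `fields` are made up for illustration:

```lua
-- Illustrative input: three comma-separated fields, the middle one empty.
local fields = {}
for f in ("a,,b"):gmatch("[^,]+") do
  fields[#fields + 1] = f
end
-- "+" needs at least one character, so the empty field is never matched.
print(table.concat(fields, "|"))  --> a|b
```

Three fields went in and only two came out, so the pattern changes what
is being asked rather than fixing the adjacency problem.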

Is it conceivable that this behaviour is actually a bug in `gmatch`
and `gsub`?

Let's compare it to the well-known Unix utility `sed`.

…/src$ echo ";a;" | sed -e "s/a*/ITEM/g"
ITEM;ITEM;ITEM

Well, well!

`sed` seems to follow the rule "An empty match is rejected when it is
adjacent to the previous match", whereas `lstrlib.c` follows the rule
"After an empty match, the following match must start at a later
position", which is equivalent to "An empty match is rejected when it
is adjacent to the previous match and that match was also empty."
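To make the two rules concrete, here is a rough sketch in plain Lua. It
is not the code in `lstrlib.c`, only a simulation built on `string.find`.
The helper name `gsub_with_rule` and its `rule` argument are invented for
this example, the replacement is treated as a plain string (no `%1`
captures), and the pattern is assumed not to start with "^".

```lua
-- Hypothetical helper, not part of Lua: a simplified gsub that applies one
-- of the two rules. rule == "sed" selects the sed-like rule; any other
-- value selects the rule attributed to lstrlib.c above.
local function gsub_with_rule(s, pat, repl, rule)
  local out = {}
  local src = 1         -- where the next match attempt starts
  local last_end = nil  -- position just past the previous accepted match
  while true do
    -- "^" anchors string.find at `src`, so only a match starting there counts.
    local first, last = s:find("^" .. pat, src)
    local stop = first and last + 1
    if stop and rule == "sed" and stop == last_end then
      stop = nil  -- sed-like rule: reject a match ending where the last one ended
    end
    if stop then
      out[#out + 1] = repl
      last_end = stop
    end
    if stop and stop > src then
      src = stop                       -- non-empty match: skip past it
    elseif src <= #s then
      out[#out + 1] = s:sub(src, src)  -- copy one character and move on
      src = src + 1
    else
      break                            -- end of subject
    end
  end
  return table.concat(out)
end

print(gsub_with_rule(";a;", "a*", "ITEM", "lstrlib"))  --> ITEM;ITEMITEM;ITEM
print(gsub_with_rule(";a;", "a*", "ITEM", "sed"))      --> ITEM;ITEM;ITEM
```

The only difference between the two modes is the single test against
`last_end`, which suggests how small the change of rule is.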

Let's say there is no bug in `gmatch` and `gsub`; the choice of rule
is an implementation detail.

Still, if it is an implementation detail, it can be changed without
changing the language. That possibility is demonstrated in the attached
version of `lstrlib.c`.

[1] <http://lua-users.org/wiki/SplitJoin> Method: Using only string.gsub