- Subject: [MANUAL] Lua string splitting by Python semantics
- From: Dirk Laurie <dirk.laurie@...>
- Date: Mon, 28 Aug 2017 07:45:31 +0200
An old issue on this list [1] was settled in Lua 5.3.3.
Up to Lua 5.3.2, string.gmatch and string.gsub had Perl semantics when
the pattern could match an empty string; from Lua 5.3.3 onwards, they
have Python semantics.
That is to say, every match of a non-empty string used to be followed
by an extra match of the empty string at the position where it ended;
this no longer happens.
Lua 5.3.2 Copyright (C) 1994-2015 Lua.org, PUC-Rio
> for j in string.gmatch(";a;", "a*()") do print(j) end
1
3
3
4
Lua 5.3.3 Copyright (C) 1994-2016 Lua.org, PUC-Rio
> for j in string.gmatch(";a;", "a*()") do print(j) end
1
3
4
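The same difference should show up with string.gsub. A minimal sketch,
assuming Lua 5.3.3 or later (the variable names are mine, and the
replacement string "-" is chosen only to make the matches visible):

```lua
-- Assumes Lua 5.3.3 or later. "a*" matches the empty string before the
-- first ';', then "a", then the empty string at the end of the subject.
-- No extra empty match is made right after "a".
local result, count = string.gsub(";a;", "a*", "-")
print(result, count)   -- three replacements: -;-;-   3
```

On Lua 5.3.2 and earlier the count would be 4, matching the four
positions printed by the gmatch loop above.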
Although mentioned in a list of changes [2], the manual itself does
not document the change in behaviour (the old behaviour was not
considered to be a bug).
It can be argued that no change to the manual is needed, since it already says:
| a single character class followed by '*', which matches zero or more
| repetitions of characters in the class. These repetition items will always
| match the longest possible sequence;
Nevertheless, the interpretation that the longest possible sequence
extends up to the position marker, leaving no empty match behind it,
was not previously deemed applicable.
Since one would like to rely on the new behaviour, some documentation
of it, such as an example demonstrating the use of gmatch to iterate
over the fields of a comma-separated list by the pattern "[^,]*",
would be reassuring.
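As a sketch of the kind of example meant here, assuming Lua 5.3.3 or
later (split_fields is a hypothetical helper name, not something from
the manual):

```lua
-- Split a comma-separated list into its fields, keeping empty fields.
-- Relies on the Lua 5.3.3+ behaviour: no extra empty match is produced
-- immediately after a non-empty match of "[^,]*".
local function split_fields(s)
  local fields = {}
  for field in string.gmatch(s, "[^,]*") do
    fields[#fields + 1] = field
  end
  return fields
end

for i, field in ipairs(split_fields("a,,b")) do
  print(i, "[" .. field .. "]")
end
```

With the new semantics "a,,b" yields three fields, the middle one
empty. Under Lua 5.3.2 and earlier the same loop would also return a
spurious empty field after each non-empty one, which is why the
behaviour needs to be documented before one can rely on it.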
[1] http://lua-users.org/lists/lua-l/2013-04/msg00825.html
[2] http://lua-users.org/lists/lua-l/2016-05/msg00068.html