- Subject: Re: gsub bug? 2 results from anchored gsub
- From: "Soni L." <fakedme@...>
- Date: Mon, 27 Jul 2015 13:46:21 -0300
On 27/07/15 12:05 AM, Soni L. wrote:
On 26/07/15 11:40 PM, Daurnimator wrote:
I want to ensure that a string always ends in a single "/".
If it has more than one, the extras should be removed.
If it has none, a "/" should be appended.
"/*$" should match all the '/' at the end of the string, and replace
them with a single "/".
I got an unexpected result:
> ("d//"):gsub("/*$", "/")
d// 2
This result suggests that there is an empty string being matched
between the last "/" and the end of the string.
It's matching the // and replacing that with "/"; but then it gets
confused and matches the empty string at the end, and ends up
inserting an extra /
Using 'print' as the match confirms:
> ("d//"):gsub("/*$", print)
//
d// 2
Is this a bug in string.gsub?
It seems odd to me that you could get 2 replacements for an anchored
match.
Though as far as I can see, a strict reading of the manual doesn't
disallow it.
Daurn.
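One way to get the intended "exactly one trailing slash" result is to cap the number of substitutions at one, using the optional fourth argument of string.gsub. This is only a minimal sketch: the function name ensure_trailing_slash is hypothetical and not from the thread. It does not depend on how gsub treats the empty match at the end of the string, because the scan stops after the first replacement.

```lua
-- Hypothetical helper (name not from the thread): make sure `s` ends
-- in exactly one "/".
local function ensure_trailing_slash(s)
  -- The fourth argument limits gsub to a single substitution, so the
  -- empty match at the very end of the string is never reached once a
  -- run of slashes has been replaced.
  -- The parentheses drop gsub's second return value (the match count).
  return (s:gsub("/*$", "/", 1))
end

print(ensure_trailing_slash("d//"))
print(ensure_trailing_slash("d"))
```

An alternative with the same effect is to strip the trailing run with "/+$", which cannot match the empty string, and then append a single "/".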
$ doesn't consume the end of the string?
You'll probably find this issue in most pattern matchers.
I've been writing a pattern matcher lately, so let's look at what it'd
do (a bit simplified to be easier to read):
Pattern: /*$ ->
Root[GreedyZeroOrMore["/"], EndOfString]
Matcher:
Cursor position: 0
d//
^
Matched /*, cursor position: 0
d//
^
Doesn't match $. Put char on buffer, increment cursor position and repeat.
Cursor position: 1
d//
 ^
Matched /*, cursor position: 3
d//
   ^
Matched $, cursor position: 3
End of pattern, put replacement on buffer (in this case "/"). Repeat.
Cursor position: 3
d//
   ^
Matched /*, cursor position: 3
d//
   ^
Matched $, cursor position: 3
End of pattern, put replacement on buffer (in this case "/"). Start
cursor position == end cursor position, so advance cursor.
Cursor position: 4
d//
    ^
End of string, return buffer.
So you end up with 2 matches and "d//".
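The walk-through above can be reproduced with a small stand-alone loop. This is a rough sketch, not the actual matcher being written and not Lua's own gsub: the names match_slashes_at_end and trace_gsub are made up for illustration, the pattern /*$ is hard-coded, and the cursor is 0-based to line up with the trace.

```lua
-- Hypothetical helper: try to match "/*$" starting at 0-based cursor `pos`.
-- Returns the 0-based cursor after the match, or nil if there is no match.
local function match_slashes_at_end(s, pos)
  local e = pos
  -- Greedy "/*": consume as many slashes as possible. No backtracking is
  -- needed for this pattern, since "$" can only succeed at the end.
  while s:sub(e + 1, e + 1) == "/" do
    e = e + 1
  end
  -- "$": succeeds only when the cursor sits at the end of the string.
  if e == #s then return e end
  return nil
end

-- Hypothetical driver: mimics the scanning loop described in the trace.
local function trace_gsub(s, repl)
  local buffer, count, pos = {}, 0, 0
  while pos <= #s do
    local e = match_slashes_at_end(s, pos)
    if e then
      count = count + 1
      buffer[#buffer + 1] = repl
      if e == pos then
        -- Empty match: copy the current char (if any) and advance the
        -- cursor so the loop cannot get stuck.
        buffer[#buffer + 1] = s:sub(pos + 1, pos + 1)
        pos = pos + 1
      else
        pos = e
      end
    else
      -- No match here: put the char on the buffer and move on.
      buffer[#buffer + 1] = s:sub(pos + 1, pos + 1)
      pos = pos + 1
    end
  end
  return table.concat(buffer), count
end

print(trace_gsub("d//", "/"))
```

In this sketch the second replacement comes from the empty match found at cursor position 3, right where the previous non-empty match ended.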
--
Disclaimer: these emails are public and can be accessed from <TODO: get a non-DHCP IP and put it here>. If you do not agree with this, DO NOT REPLY.