On Tue, Oct 10, 2017 at 10:06 PM Soni L. <fakedme@gmail.com
<mailto:fakedme@gmail.com>> wrote:
On 2017-10-10 10:54 PM, Martin wrote:
> On 10/10/2017 10:43 PM, Soni L. wrote:
>> However, I noticed something kinda weird:
>>
>> string.match("''", "%f[']'", 2) --> nil
>> string.match("''", "^'", 2) --> '
> string.match("''", "%f[']'", 2). From position 2 of string [['']]
> locate to index <i> such that s[i] ~= [[']] and s[i + 1] ==
[[']] and
> s[i + 1] == [[']].
>
> There is no such index so nil is returned. In case of string
[['' ']]
> there is match.
>
> string.match("''", "^'", 2). From position 2 of string [['']] find
> string [[']]. Do not skip characters (due "^" anchor). This matches
> [[']], second apostrophe.
>
> -- Martin
>
But from position 1, the first pattern matches. From reading the
manual,
both operate on the start of the subject string, so either the first
should match or the second should fail.
Regarding the frontier pattern: it does not match ON a character. It
matches the boundary BETWEEN two characters, specifically the boundary
immediately before the next character to be matched. So when matching
from position 1, the frontier pattern attempts to match the boundary
immediately before the first character in the string. Since there is
no previous character at this boundary, the frontier pattern matches
against "\0" in lieu of the previous character.
"\0" ~= [[']] and [[']] (the first character of the string) == [[']],
so the frontier pattern successfully matches. The literal [[']] then
matches the first character of the string. If you passed this pattern
to string.find instead, it would return a start and end point both
equal to 1, since only the first character was matched. The frontier
pattern does not consume the source string or add anything to the
overall match; it's essentially just an assertion.
Now when you match from position 2, Lua first tries to match the
frontier pattern at the boundary immediately before the character next
to be matched. In this case, that is the second apostrophe, so the
boundary to be checked is between the first and second characters of
the source string. The character previous to that boundary is the
first apostrophe, which is equal to an apostrophe, so the frontier
assertion fails and the function returns nil.
The takeaways here are that frontier patterns check a boundary, not a
character, and specifying an explicit start does not prevent Lua from
checking the character previous to that if the pattern starts with a
frontier assertion.
Now to the second pattern. The "^" character can be a bit surprising
when an explicit start position is specified, because the description
in the manual does not match the actual behavior. The manual states
that the caret "anchors the match at the beginning of the subject
string", but in fact, the caret actually anchors the match at the
starting point (i.e. it prevents the matching engine from skipping
characters).
Thus your second pattern attempts to match [[']] beginning at position
2, without skipping characters, Position 2 is in fact [[']], so the
match succeeds. As a further example, consider string.match("'!'",
"^'", 2); this will fail to match and return nil because the [[']]
cannot match at position 2 and the caret prevents it from skipping
characters. The fact that the first character is [[']] is irrelevant,
since you've overridden that by telling Lua to start at position 2
instead.
The bug in this second issue appears to be in the documentation, not
the code, since there's no good reason why Lua should ignore an
explicit start argument when the caret is given, and valid reasons why
it shouldn't ignore the start argument. Consider a simple tokenizer
that repeatedly matches consecutive tokens without skipping anything
else; the pattern could be prefixed with a ^ and suffixed with a
position capture, and on each iteration, the result from the previous
position capture is fed to the next call as the start argument. In
this case, you would rely on the ^ meaning "anchor to start point",
and a nil result would mean a syntax error in whatever you're tokenizing.