- Subject: Re: Patterns: Why are anchors not character classes?
- From: Dirk Laurie <dirk.laurie@...>
- Date: Sat, 18 Jul 2015 21:15:04 +0200
2015-07-18 20:45 GMT+02:00 Rena <hyperhacker@gmail.com>:
> On Jul 18, 2015 1:32 PM, "Dirk Laurie" <dirk.laurie@gmail.com> wrote:
>> Let's look again at what you need these for. It's to force
>> a pattern item that can't match an empty string to match it
>> if necessary when at the beginning or end of the subject.
>> So how about introducing another couple of suffixes?
>> [%b,]< could mean: [%b,]* at the beginning and [%b,]+ elsewhere.
>> [%b,]> could mean: [%b,]* at the end and [%b,]+ elsewhere.
>> This can be implemented in C function `match` with just a few
>> lines.
>>
>
> Well, regardless of whether it's classified as a character class or a set or
> a special case or a fruit or a vegetable, I've many times wished I could
> write a pattern such as "[ ^]%w+[ $]" (match one or more word characters
> bounded by either a space or the start/end of a string). Though ^ already
> has a meaning there...
>
> Of course it's important to decide on an implementation (assuming it's going
> to be implemented at all), but I'm starting to feel like the forest is being
> lost in the trees. The real feature request here is "be able to include
> 'beginning/end of string' in a character set"; exactly how to implement it
> is another question.
The implementation described in my post is indeed trivial.
> line="the quick brown fox jumps over the lazy dog"
> for word in line:gmatch"(%w+)%s+" do print(word) end
the
quick
brown
fox
jumps
over
the
lazy
> for word in line:gmatch"(%w+)%s>" do print(word) end
the
quick
brown
fox
jumps
over
the
lazy
dog
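For readers without a patched interpreter, the difference between the two loops above can be approximated in standalone C. This is only a sketch under stated assumptions: `count_words` and its `accept_end` flag are hypothetical names that do not exist in lstrlib, and the scan emulates just the two patterns shown, not the general matcher.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Lua: counts the words that gmatch
   would yield for "(%w+)%s+" (accept_end == 0) or for the proposed
   "(%w+)%s>" (accept_end != 0). */
int count_words(const char *src, int accept_end) {
  size_t len, i = 0;
  int count = 0;
  assert(src != NULL);
  len = strlen(src);
  while (i < len) {
    size_t j = i;
    if (!isalnum((unsigned char)src[i])) { i++; continue; }
    /* %w+ : take the longest run of alphanumerics */
    while (j < len && isalnum((unsigned char)src[j])) j++;
    if (j < len && isspace((unsigned char)src[j])) {
      count++;
      /* one or more spaces follow, as with '+' */
      while (j < len && isspace((unsigned char)src[j])) j++;
    }
    else if (j == len && accept_end) {
      count++;  /* '>' lets %s match the empty string at the end */
    }
    i = j;  /* resume scanning after the attempted match */
  }
  return count;
}
```

Called on the sentence used in the session, `count_words(line, 0)` skips the final word while `count_words(line, 1)` includes it, which is the behaviour the `>` suffix is meant to provide.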
Patch to lstrlib.c (also attached as a file):
5a6,9
> /* Modified by Dirk Laurie 2015-07-18 to include extra suffixes:
> < or > or = means the same as +, except respectively at the start or
> the finish or both ends of the source, where it means the same as *.
> */
484c488,492
< if (*ep == '*' || *ep == '?' || *ep == '-') { /* accept empty? */
---
> if (*ep == '*' || *ep == '?' || *ep == '-' ||
> (*ep == '<' && s == ms->src_init) ||
> (*ep == '>' && s == ms->src_end) ||
> (*ep == '=' && (s == ms->src_init || s == ms->src_end) ) ) {
> /* accept empty? */
500a509
> case '<': case '>': case '=':
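The condition added by the patch can also be read in isolation. The following is a minimal sketch, assuming a cut-down `Subject` struct that stands in for the two `MatchState` fields the patch reads; `Subject` and `accepts_empty` are illustrative names, not part of lstrlib.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the MatchState fields used by the patch. */
typedef struct {
  const char *src_init;  /* start of the subject string */
  const char *src_end;   /* one past its last character */
} Subject;

/* Returns nonzero when a pattern item carrying `suffix` may match the
   empty string at position s, mirroring the patched "accept empty?" test. */
int accepts_empty(const Subject *ms, const char *s, char suffix) {
  assert(ms != NULL && s != NULL);
  switch (suffix) {
    case '*': case '?': case '-':
      return 1;                       /* stock behaviour */
    case '<':
      return s == ms->src_init;       /* empty only at the start */
    case '>':
      return s == ms->src_end;        /* empty only at the end */
    case '=':
      return s == ms->src_init || s == ms->src_end;
    default:
      return 0;                       /* '+' or no suffix */
  }
}
```

Away from the ends of the subject the three new suffixes fall into the same branch as `+`, so an item that fails to match at least once there simply fails.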
5a6,9
> /* Modified by Dirk Laurie 2015-07-18 to include extra suffixes:
> < or > or = means the same as +, except respectively at the start or
> the finish or both ends of the source, where it means the same as *.
> */
484c488,492
< if (*ep == '*' || *ep == '?' || *ep == '-') { /* accept empty? */
---
> if (*ep == '*' || *ep == '?' || *ep == '-' ||
> (*ep == '<' && s == ms->src_init) ||
> (*ep == '>' && s == ms->src_end) ||
> (*ep == '=' && (s == ms->src_init || s == ms->src_end) ) ) {
> /* accept empty? */
500a509
> case '<': case '>': case '=':