2015-07-18 20:45 GMT+02:00 Rena <hyperhacker@gmail.com>:
> On Jul 18, 2015 1:32 PM, "Dirk Laurie" <dirk.laurie@gmail.com> wrote:
>> Let's look again at what you need these for. It's to force
>> a pattern item that can't match an empty string to match it
>> if necessary when at the beginning or end of the subject.
>> So how about introducing another couple of suffixes?
>> [%b,]< could mean: [%b,]* at the beginning and [%b,]+ elsewhere.
>> [%b,]> could mean: [%b,]* at the end and [%b,]+ elsewhere.
>> This can be implemented in C function `match` with just a few
>> lines.
>>
>
> Well, regardless of whether it's classified as a character class or a set or
> a special case or a fruit or a vegetable, I've many times wished I could
> write a pattern such as "[ ^]%w+[ $]" (match one or more word characters
> bounded by either a space or the start/end of a string). Though ^ already
> has a meaning there...
>
> Of course it's important to decide on an implementation (assuming it's going
> to be implemented at all), but I'm starting to feel like the forest is being
> lost in the trees. The real feature request here is "be able to include
> 'beginning/end of string' in a character set"; exactly how to implement it
> is another question.

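The implementation described in my post is indeed trivial. In the session
below, the stock pattern "(%w+)%s+" misses the last word because no
whitespace follows it, while "(%w+)%s>" lets %s match empty at the end of
the subject and so picks it up.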

> line="the quick brown fox jumps over the lazy dog"
> for word in line:gmatch"(%w+)%s+" do print(word) end
the
quick
brown
fox
jumps
over
the
lazy
> for word in line:gmatch"(%w+)%s>" do print(word) end
the
quick
brown
fox
jumps
over
the
lazy
dog

Patch to lstrlib.c (also attached as a file):

5a6,9
> /* Modified by Dirk Laurie 2015-07-18 to include extra suffixes:
>    < or > or = means the same as +, except respectively at the start or
>    the finish or both ends of the source, where it means the same as *.
> */
484c488,492
<           if (*ep == '*' || *ep == '?' || *ep == '-') {  /* accept empty? */
---
>           if (*ep == '*' || *ep == '?' || *ep == '-' ||
>              (*ep == '<' && s == ms->src_init) ||
>              (*ep == '>' && s == ms->src_end) ||
>              (*ep == '=' && (s == ms->src_init || s == ms->src_end) ) ) {
>               /* accept empty? */
500a509
>             case '<': case '>': case '=':
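
The rule the patch adds can be read in isolation as a small predicate. What follows is only a sketch, not the code in lstrlib.c: the names `Subject` and `accepts_empty` are hypothetical, and only the fields `src_init` and `src_end` are taken from the patch. It models the one decision the patched condition makes: when a single-character pattern item has failed to match at position `s`, may the item be treated as an empty match because of its suffix?

```c
#include <stddef.h>

/* Hypothetical stand-in for the two MatchState fields the patch reads. */
typedef struct {
  const char *src_init;  /* first character of the subject */
  const char *src_end;   /* one past the last character of the subject */
} Subject;

/* Returns 1 if a pattern item that failed to match at `s` may still be
   accepted as an empty match because of its suffix, 0 otherwise.
   Mirrors the patched "accept empty?" condition. */
int accepts_empty(char suffix, const char *s, const Subject *ms) {
  switch (suffix) {
    case '*': case '?': case '-':   /* stock suffixes: always optional */
      return 1;
    case '<':                       /* optional only at the start */
      return s == ms->src_init;
    case '>':                       /* optional only at the end */
      return s == ms->src_end;
    case '=':                       /* optional at either end */
      return s == ms->src_init || s == ms->src_end;
    default:                        /* '+' or no suffix: must match */
      return 0;
  }
}
```

With a subject of "lazy dog", `accepts_empty('>', ...)` is true only when `s` equals `src_end`, which is why "(%w+)%s>" can still report "dog". Everywhere else the new suffixes behave like `+`, which is what the added `case '<': case '>': case '=':` line arranges for the branch where the item did match once.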
5a6,9
> /* Modified by Dirk Laurie 2015-07-18 to include extra suffixes:
>    < or > or = means the same as +, except respectively at the start or
>    the finish or both ends of the source, where it means the same as *.
> */
484c488,492
<           if (*ep == '*' || *ep == '?' || *ep == '-') {  /* accept empty? */
---
>           if (*ep == '*' || *ep == '?' || *ep == '-' || 
>              (*ep == '<' && s == ms->src_init) ||
>              (*ep == '>' && s == ms->src_end) ||
>              (*ep == '=' && (s == ms->src_init || s == ms->src_end) ) ) {  
>               /* accept empty? */
500a509
>             case '<': case '>': case '=':