- Subject: Re: Patterns: Why are anchors not character classes?
- From: "John Hind" <john.hind@...>
- Date: Thu, 16 Jul 2015 11:04:00 +0100
Wed, 15 Jul 2015 14:14:03 +0200 Dirk Laurie <dirk.laurie@gmail.com>:
>> Having explicit and distinct character classes for "beginning of subject"
>> and "end of subject" would regularise and formalise the conceptual
>> framework as well as adding practical expressiveness.
>Something like this: "%F[bos][set][eos]"?
>At some stage, LPEG becomes the proper tool to use, rather than duplicating
>its advanced features in the string library.
I suggest adding '%^' and '%$' as character classes. Whether to deprecate the
existing '^' and '$' anchors is a policy decision: they would become
redundant, but might be retained for backward compatibility. New character
classes are unlikely to break existing code.
Normally the frontier pattern will work as before since we just want
"beginning of subject" and (more usually) "end of subject" to be treated as
'characters' which are not in the set. This way we can still use '%z' in a
frontier set to represent an actual character '\0' without the current risk
of confusing it with the "as if" character '\0' at the end of subject. But
we can also use '%^' and '%$' explicitly in any set definition (including
but not limited to the frontier pattern). So, for example, each item of a
parameter list could be matched with:
"%s*(%w+)%s*[,%$]"
I think this is simpler and cleaner than the current definition, both
conceptually and in implementation, and not an "advanced feature" at all.
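
To make the two points above concrete, here is a minimal sketch in current Lua. The first part shows the existing ambiguity: a frontier set containing '%z' fires both at a real '\0' and at the "as if" '\0' at the end of subject. The second part only emulates what the proposed "%s*(%w+)%s*[,%$]" is meant to do, by appending a sentinel comma so that the end of subject acts as a delimiter. The helper name `params` is hypothetical, and '%$' as a class is the proposal, not something any released Lua implements.

```lua
-- Current behaviour: the end of subject is handled "as if" it were '\0',
-- so a frontier set containing %z cannot tell the two cases apart.
local at_nul = ("ab\0cd"):find("%f[%z]")  -- real '\0' at position 3
local at_end = ("abcd"):find("%f[%z]")    -- "as if" '\0' at position 5, past the end

-- Emulation of the proposed "%s*(%w+)%s*[,%$]": append a sentinel comma so
-- the last parameter is terminated like all the others.
-- `params` is a hypothetical helper, not part of any library.
local function params(list)
  local out = {}
  for name in (list .. ","):gmatch("%s*(%w+)%s*,") do
    out[#out + 1] = name
  end
  return out
end

print(at_nul, at_end, table.concat(params("alpha, beta ,gamma"), "|"))
```

The sentinel costs a string concatenation on every call and only works when the delimiter is a single known character, which is the kind of workaround an explicit end-of-subject class would make unnecessary.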