- Subject: Re: Patterns: Why are anchors not character classes?
- From: Dirk Laurie <dirk.laurie@...>
- Date: Wed, 15 Jul 2015 14:14:03 +0200
2015-07-15 11:26 GMT+02:00 John Hind <john.hind@zen.co.uk>:
>
> "%f[set], a frontier pattern; such item matches an empty string at any
> position such that the next character belongs to set and the previous
> character does not belong to set. The set set is interpreted as previously
> described. The beginning and the end of the subject are handled as if they
> were the character '\0'."
>
> Here the beginning and end are not just character classes but are
> (unnecessarily) given an explicit byte encoding breaking the "8-bit clean"
> rule for strings. There would be no need for them to have byte encodings if
> beginning and end were separate character classes.
>
> Having explicit and distinct character classes for "beginning of subject"
> and "end of subject" would regularise and formalise the conceptual framework
> as well as adding practical expressiveness.
Something like this: "%F[bos][set][eos]"?
At some stage, LPeg becomes the proper tool to use, rather than
duplicating its advanced features in the string library.
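
For reference, a small sketch of how the existing frontier behaves at the
subject boundaries. It uses only the standard string library and assumes
Lua 5.2 or later, so that a literal '\0' may appear inside a pattern. The
subject strings are made up for illustration.

```lua
-- %f[%w] matches the empty string where a non-%w character is followed
-- by a %w character. The beginning of the subject counts as '\0', which
-- is not in %w, so the first word is found as well.
local words = {}
for w in ("hello world"):gmatch("%f[%w]%w+") do
  words[#words + 1] = w
end
print(table.concat(words, ","))     --> hello,world

-- The end of the subject also counts as '\0', so %f[\0] matches there.
-- find returns the start and end of the empty match (end = start - 1).
print(("abc"):find("%f[\0]"))       --> 4 3

-- A real '\0' byte inside the subject produces the same kind of match.
print(("ab\0cd"):find("%f[\0]"))    --> 3 2

-- Same story at the beginning: %f[^\0] matches at position 1 of "abc",
-- but a subject that starts with a real '\0' byte shifts the match.
print(("abc"):find("%f[^\0]"))      --> 1 0
print(("\0abc"):find("%f[^\0]"))    --> 2 1
```

So %f on its own cannot tell a subject boundary from a genuine zero byte,
which I take to be the point John raises about 8-bit cleanliness.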