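You need to combine the inverse (with an "and" clause, i.e. a sequence) with the match of at least one character, **before** repeating that combination. In LPeg, -patt is a predicate that consumes no input, so on its own it matches the empty string whenever it succeeds, and that is exactly what the "loop body may accept empty string" check rejects. The usual way to write "one character not in the set" is 1 - lpeg.S";,", and that can be repeated with ^ 1.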
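A minimal sketch of that fix, assuming LPeg is installed and loadable as "lpeg". The names not_sep and field are only illustrative:

```lua
local lpeg = require("lpeg")

-- Exactly one character that is neither ';' nor ','.
-- Equivalent to -lpeg.S(";,") * lpeg.P(1): check first, then consume.
local not_sep = 1 - lpeg.S(";,")

-- One or more such characters, captured as a string.
local field = lpeg.C(not_sep^1)

print(field:match("abc;def"))  --> abc
print(field:match(";abc"))     --> nil
```

Because not_sep always consumes one character when it succeeds, the loop body can no longer accept the empty string and ^ 1 is allowed.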

Note that the inverse operation can be applied to an LPeg pattern matching strings of arbitrary length (including zero length). When the inverse succeeds, it consumes nothing itself, and if the non-inverted pattern can match zero-length strings, its inverse can never succeed at all.
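This behaviour can be checked directly. The following is a small sketch, assuming LPeg is loadable as "lpeg"; the variable names are illustrative. Without captures, match returns the position just after the match, so a result of 1 means that nothing was consumed:

```lua
local lpeg = require("lpeg")

-- Negative lookahead: succeeds only if the next character is not ';' or ','.
local no_sep = -lpeg.S(";,")

print(no_sep:match("abc"))   --> 1   (succeeded, consumed nothing)
print(no_sep:match(";abc"))  --> nil

-- lpeg.P("") matches the empty string everywhere, so its inverse never succeeds.
local never = -lpeg.P("")
print(never:match("abc"))    --> nil
print(never:match(""))       --> nil
```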

The real question is then: what is the minimum string length that your "inverse" pattern must match? This is not so evident if the non-inverted pattern must match "character clusters", notably canonically equivalent sequences. Consider "é": it can be encoded as a single Unicode code point in NFC form, or as two in NFD form. The encoding plays a role as well (one byte in ISO 8859-1, two or three bytes in UTF-8, four or eight bytes in UTF-32), so the answer also depends on which datatype you use to represent a single "character", i.e. the effective lengths that can be matched by the "." pattern.
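To make the byte counts concrete, here is a small sketch for the UTF-8 case, assuming LPeg is loadable as "lpeg". The utf8_char pattern is a hypothetical helper written for this example; it is deliberately loose and does not validate UTF-8 strictly:

```lua
local lpeg = require("lpeg")

-- "é" in UTF-8, written with decimal byte escapes.
local nfc = "\195\169"    -- U+00E9 (precomposed): 2 bytes
local nfd = "e\204\129"   -- U+0065 U+0301 (e + combining acute): 3 bytes

print(#nfc, #nfd)              --> 2   3

-- lpeg.P(1) consumes exactly one byte, so it stops in the middle of "é".
print(lpeg.P(1):match(nfc))    --> 2

-- Hypothetical helper: one UTF-8 encoded code point (loose, non-validating).
local cont = lpeg.R("\128\191")
local utf8_char = lpeg.R("\0\127")
                + lpeg.R("\194\223") * cont
                + lpeg.R("\224\239") * cont * cont
                + lpeg.R("\240\244") * cont * cont * cont

print(utf8_char:match(nfc))    --> 3   (the whole precomposed character)
print(utf8_char:match(nfd))    --> 2   (only the "e", not the combining accent)
```

Even a code-point-aware pattern does not see the NFD form as one "character"; matching whole clusters would need extra rules for combining marks.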

Lua strings are normally opaque to the encoding, and are just vectors of arbitrary bytes. But this is no longer the case once matching is configured to handle character types (e.g. digits, letters...) and character ranges. These depend on the encoding as well, even for simple ranges like "[a-z]" if you think about EBCDIC-like encodings. That is why there are preconfigured common sets (letters, digits, lowercase, uppercase, symbols, punctuation, whitespace), which should be defined first (along with what the "." pattern will match, word boundary conditions, newlines...), together with encoding rules for ranges. All this influences how patterns will effectively work, because internally Lua matches byte values (from 0 to 255), not "characters", which have variable metrics (in terms of encodings, equivalences, byte length, and relative ordering for ranges).
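In LPeg terms, ranges are intervals of byte values and the predefined classes come from lpeg.locale(). A short sketch, assuming LPeg is loadable as "lpeg" and an ASCII-compatible encoding; the variable names are illustrative:

```lua
local lpeg = require("lpeg")

-- A range is an interval of byte values: "az" covers bytes 97..122 in ASCII.
local lower = lpeg.R("az")
print(lower:match("q"))       --> 2
print(lower:match("\233"))    --> nil  ("é" in ISO 8859-1 is byte 233)

-- lpeg.locale() returns a table of character-class patterns
-- (alpha, digit, lower, upper, space, punct, ...) for the current locale.
local loc = lpeg.locale()
print((loc.digit^1):match("2022-03-09"))  --> 5
```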


On Wed, 9 Mar 2022 at 16:17, Scott Morgan <blumf@blueyonder.co.uk> wrote:
I'm starting out with LPeg but struggling with inverse matching on a set.

As an example, trying to match a substring without ';' or ',' chars

  patt = (-lpeg.S";,") ^ 1

Fails with "loop body may accept empty string"

Plain -lpeg.S";," does the right thing for the first char, so why
doesn't ^ 1 extend that for longer strings?

Scott