I've written the attached. Any comments that will lead to the last sentence in it becoming less painful will be greatly appreciated.
LPEG is a library for operating on patterns. It is not a replacement for the Lua string library, but it can be used to implement libraries similar to the string library without having to write them directly in C.
Some words mean very specific things to LPEG.
string: A Lua string, consisting of an arbitrary sequence of bytes. On many systems, one could say "characters" instead.
pattern: A userdata value containing enough information to characterize a certain property that a string might have. The properties that can be described are rather like the questions that crossword solvers tend to ask, for example: "seven letters, starts with 'p', has a double letter in it somewhere".
subject: A string that is presented for examination to an LPEG function. Its bytes are referred to as
match: The portion of the subject matched by a pattern. Not the same as lpeg.match, the central function of LPEG.
succeed: A pattern succeeds when it matches a substring of the subject at the point where it is applied.
fail: A pattern fails when it does not match any substring at that point.
captures: Individually accessible portions of a match.
consume: A pattern consumes its match if that portion of the subject is not available to any follow-up pattern except a backspace (look-behind) pattern, made with lpeg.B.
The most spectacular feature of LPEG is the way complicated patterns are built up from simpler ones: patterns can be used instead of numbers as the values that the variables in an arithmetic expression may take. For example, "x^2+3*x-13" is a perfectly valid LPEG expression when x is a pattern.
Roberto Ierusalimschy put a lot of thought into assigning useful meanings to Lua's arithmetic operators and constants. In particular, the operator precedence is such that parentheses are often unnecessary. Some words of warning, though:
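Although Lua values such as numbers, strings, false and true can appear in a pattern expression, they cannot be combined with each other, only with existing patterns. To be safe, convert them to patterns first by applying lpeg.P (in what follows, the prefix lpeg. is usually omitted, so this is written simply as P).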
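The warning about combining plain Lua values can be checked in a few lines. This is a sketch that assumes LPEG is installed and loadable with require "lpeg" (a current 1.x release). The variable names are illustrative only.

```lua
-- Sketch: assumes the LPEG module is installed; names are illustrative.
local lpeg = require "lpeg"
local P = lpeg.P

-- Two plain strings: Lua evaluates "a" + "b" itself and raises an error.
local ok = pcall(function() return "a" + "b" end)
print(ok)                            --> false

-- Two plain numbers are worse: 1 + 2 is silently the number 3,
-- which LPEG then reads as "exactly three bytes".
print(lpeg.match(1 + 2, "ab"))       --> nil

-- Once one operand is a pattern, LPEG converts the other one.
local p = P(1) + 2                   -- one byte, or else two bytes
print(lpeg.match(p, "ab"))           --> 2
print(lpeg.match(P"a" + "b", "b"))   --> 2
```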
This goes without saying, but I am nevertheless saying it.
true stands for a pattern that always succeeds, and false for a pattern that always fails.
A string stands for a pattern that matches only that exact string.
0 stands for a pattern that matches the empty string, a positive number n for a pattern that matches exactly n bytes, and -n for the negation of n (see below), which succeeds only if fewer than n bytes remain.
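Each constant can be tried out with lpeg.match, which returns the position just after the match or nil. This sketch assumes LPEG is available as require "lpeg". The subject "xyz" is an arbitrary choice.

```lua
-- Sketch: assumes the LPEG module is installed; "xyz" is an arbitrary subject.
local lpeg = require "lpeg"
local P, match = lpeg.P, lpeg.match

print(match(P(true), "xyz"))    --> 1    succeeds without consuming anything
print(match(P(false), "xyz"))   --> nil  always fails
print(match(P"xy", "xyz"))      --> 3    exactly the string "xy"
print(match(P"xz", "xyz"))      --> nil
print(match(P(0), "xyz"))       --> 1    the empty string
print(match(P(2), "xyz"))       --> 3    exactly two bytes
print(match(P(4), "xyz"))       --> nil  only three bytes are available
print(match(P(-4), "xyz"))      --> 1    fewer than 4 bytes remain
print(match(P(-3), "xyz"))      --> nil  3 bytes remain
```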
-p succeeds when p fails, and fails when p succeeds. It consumes no input. The idiom -1, which stands for P(-1), matches only at the end of the subject.
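Negation and the end-of-subject idiom, as a minimal sketch. It assumes LPEG is loadable with require "lpeg". The names notA and whole are illustrative.

```lua
-- Sketch: assumes the LPEG module is installed; names are illustrative.
local lpeg = require "lpeg"
local P = lpeg.P

local notA = -P"a"                  -- succeeds only where "a" does not follow
print(lpeg.match(notA, "ba"))       --> 1    no input consumed
print(lpeg.match(notA, "ab"))       --> nil

-- P"abc" * -1 is P"abc" * P(-1): "abc" and then the end of the subject.
local whole = P"abc" * -1
print(lpeg.match(whole, "abc"))     --> 4
print(lpeg.match(whole, "abcd"))    --> nil
```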
If p and q respectively match a and b, then p*q matches a..b, i.e. p followed by q. Note that multiplication is not commutative.
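The lack of commutativity is easy to see in a small sketch. It assumes LPEG is available as require "lpeg", and p and q are arbitrary example patterns.

```lua
-- Sketch: assumes the LPEG module is installed; p and q are examples.
local lpeg = require "lpeg"
local P = lpeg.P

local p, q = P"ab", P"c"
print(lpeg.match(p * q, "abc"))     --> 4
print(lpeg.match(q * p, "abc"))     --> nil  order matters
print(lpeg.match(q * p, "cab"))     --> 4
```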
p+q matches what p matches, except when p fails; then it matches what q matches. Note that p+q succeeds if and only if q+p succeeds, but if both p and q would succeed, the match is that of the first pattern. So addition is not quite commutative.
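The sketch below shows why the order of the alternatives matters. It assumes LPEG is loadable with require "lpeg", and p and q are illustrative. The last two lines also show that once an alternative has succeeded, LPEG does not go back to try the other one.

```lua
-- Sketch: assumes the LPEG module is installed; p and q are illustrative.
local lpeg = require "lpeg"
local P = lpeg.P

local p, q = P"a", P"ab"
print(lpeg.match(p + q, "ab"))          --> 2    p succeeds, so q is never tried
print(lpeg.match(q + p, "ab"))          --> 3
print(lpeg.match(p + q, "b"))           --> nil  both fail

-- Once p has succeeded, the choice is final: LPEG does not backtrack
-- into q when something later in the sequence fails.
print(lpeg.match((p + q) * -1, "ab"))   --> nil
print(lpeg.match((q + p) * -1, "ab"))   --> 3
```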
p-q fails if q succeeds; otherwise it matches what p matches. Note that 0-p does the same as -p. However, p-p does not do the same as 0; it does the same as false, since it always fails.
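Subtraction is the usual way to say "anything except". This is a minimal sketch that assumes LPEG is available as require "lpeg". The names notX and p are illustrative.

```lua
-- Sketch: assumes the LPEG module is installed; names are illustrative.
local lpeg = require "lpeg"
local P = lpeg.P

local notX = P(1) - "x"             -- any single byte except "x"
print(lpeg.match(notX, "a"))        --> 2
print(lpeg.match(notX, "x"))        --> nil

local p = P"a"
print(lpeg.match(0 - p, "b"))       --> 1    same as -p: succeeds, consumes nothing
print(lpeg.match(0 - p, "a"))       --> nil
print(lpeg.match(p - p, "a"))       --> nil  p - p fails everywhere
```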
p/s matches what p does, but processes the captures of p as specified by s. There are many variations; for example, if p itself contains no captures, p/1 creates a capture consisting of the substring matched by p.
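A few variations of the division operator, sketched under the assumption that LPEG is loadable with require "lpeg". Besides the number form, it uses a string (where %0 stands for the whole match) and a function, both of which LPEG accepts after /. The name digits is illustrative.

```lua
-- Sketch: assumes the LPEG module is installed; "digits" is illustrative.
local lpeg = require "lpeg"
local R = lpeg.R

local digits = R"09"^1                       -- one or more decimal digits
print(lpeg.match(digits, "123abc"))          --> 4      no captures: the position after the match
print(lpeg.match(digits / 1, "123abc"))      --> 123    the matched substring "123"
print(lpeg.match(digits / "<%0>", "123abc")) --> <123>
print(lpeg.match(digits / tonumber, "123abc") + 1)  --> 124  tonumber received "123"
```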
p^n is not quite exponentiation in the usual sense: p*p means exactly two repetitions of p, which is not the same as p^2, meaning two or more repetitions of p.
p^0 matches any number of repetitions of p, including none (the empty string).
p^-n matches not more than n repetitions of p.
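The difference between p*p, p^2 and p^-2 shows up in the returned positions. This sketch assumes LPEG is available as require "lpeg". The last line illustrates a property of LPEG repetition worth knowing: it takes as much as it can and never gives any of it back.

```lua
-- Sketch: assumes the LPEG module is installed; p is an example pattern.
local lpeg = require "lpeg"
local P = lpeg.P

local p = P"ab"
print(lpeg.match(p * p, "ababab"))   --> 5    exactly two repetitions
print(lpeg.match(p^2, "ababab"))     --> 7    two or more: here three
print(lpeg.match(p^2, "ab"))         --> nil  only one repetition available
print(lpeg.match(p^0, "xyz"))        --> 1    zero repetitions are fine
print(lpeg.match(p^-2, "ababab"))    --> 5    at most two

-- P"a"^0 swallows every "a", leaving none for the final "a".
print(lpeg.match(P"a"^0 * "a", "aaa"))  --> nil
```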
#p matches what p matches, but consumes no input. A common idiom: #p*q matches what q matches, provided that p also matches at the same position.
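The #p*q idiom in a minimal sketch. It assumes LPEG is loadable with require "lpeg". The choice of "a digit, then any two bytes" is just an example.

```lua
-- Sketch: assumes the LPEG module is installed; the pattern is an example.
local lpeg = require "lpeg"
local P, R = lpeg.P, lpeg.R

-- Any two bytes, provided that the first of them is a digit.
local p = #R"09" * P(2)
print(lpeg.match(p, "1x"))            --> 3
print(lpeg.match(p, "x1"))            --> nil
print(lpeg.match(#P"abc", "abcdef"))  --> 1    succeeds, but consumes nothing
```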
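For example, suppose x = P"abc". Then x^2+3*x-13 means "two or more copies of abc, or any three bytes followed by abc, but only if fewer than 13 bytes remain in the subject". Note that the last condition concerns the rest of the subject, not the length of the match: -13 rejects any position from which 13 or more bytes are still available.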
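The example expression can be run as it stands. This sketch assumes LPEG is available as require "lpeg". The subjects are chosen to exercise each branch, and the last one shows that the 13-byte limit applies to the remaining subject rather than to the match.

```lua
-- Sketch: assumes the LPEG module is installed; subjects chosen for illustration.
local lpeg = require "lpeg"
local P = lpeg.P

local x = P"abc"
local p = x^2 + 3*x - 13
print(lpeg.match(p, "abcabc"))          --> 7    two copies of abc
print(lpeg.match(p, "xyzabc"))          --> 7    three bytes, then abc
print(lpeg.match(p, "abc"))             --> nil
-- The match itself would be 6 bytes long, but 13 bytes remain, so -13 rejects it.
print(lpeg.match(p, "abcabc1234567"))   --> nil
```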
In the following descriptions of LPEG functions, the prefix lpeg. has been omitted.
P(v) converts v to a pattern. Apart from booleans, strings and numbers, discussed above, functions and tables can also be converted to patterns, but these are too advanced to discuss here. Existing patterns are returned unchanged.
R(r), where r is a two-byte string, matches any byte whose internal numerical code lies in the range from r:byte(1) to r:byte(2), inclusive. You could use characters in r, e.g. R"az" on most systems matches the range of lowercase letters, but see locale for a more portable alternative.
S(s) matches any single byte that occurs in the string s; e.g. S"()" matches either of the parentheses ( and ).
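Ranges and sets in practice, as a minimal sketch. It assumes LPEG is loadable with require "lpeg". It also uses the fact that R accepts several ranges in one call. The names lower, hex and paren are illustrative.

```lua
-- Sketch: assumes the LPEG module is installed; names are illustrative.
local lpeg = require "lpeg"
local R, S = lpeg.R, lpeg.S

local lower = R"az"                   -- byte codes from ("a"):byte() to ("z"):byte()
print(lpeg.match(lower, "q"))         --> 2
print(lpeg.match(lower, "Q"))         --> nil
local hex = R("09", "af", "AF")       -- several ranges at once
print(lpeg.match(hex^1, "1aF!"))      --> 4

local paren = S"()"                   -- either "(" or ")"
print(lpeg.match(paren, ")"))         --> 2
print(lpeg.match(paren, "x"))         --> nil
```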
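lpeg.match is the name of a function that, for a given pattern, subject and starting position init, determines whether there is a substring subject:sub(init,stop-1) matched by the pattern. It returns the value stop (or the captures of the match, if the pattern makes any), or nil if the pattern does not match. Unlike string.find, it does not search: the match must begin exactly at init.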
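The role of init shows up in a two-line sketch. It assumes LPEG is available as require "lpeg". The subject is an arbitrary example.

```lua
-- Sketch: assumes the LPEG module is installed; the subject is arbitrary.
local lpeg = require "lpeg"
local P = lpeg.P

local p = P"abc"
print(lpeg.match(p, "xxabcxx"))       --> nil  the match must start at init = 1
print(lpeg.match(p, "xxabcxx", 3))    --> 6    subject:sub(3, 5) is "abc"
```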
locale() returns a table of patterns that match character classes. The recommended way to find out which classes are available is to examine the keys of the returned table; for example, locale().lower matches any single lower-case letter.
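Examining the keys takes a short loop. This sketch assumes LPEG is loadable with require "lpeg" and that the program runs in the default "C" locale, so lower means a to z. The order in which pairs lists the names is not fixed.

```lua
-- Sketch: assumes the LPEG module is installed and the default "C" locale.
local lpeg = require "lpeg"

local L = lpeg.locale()
-- List the available class names; pairs() visits them in no fixed order.
for name in pairs(L) do io.write(name, " ") end
print()

print(lpeg.match(L.lower^1, "abcD"))  --> 4
print(lpeg.match(L.digit, "7"))       --> 2
```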
When called with p as first argument, these functions will replace p by P(p) before proceeding. When p is already a pattern, they can also be called the object-oriented way shown below.
p:match(subject, init): Tries to match p against subject, starting at position init; init defaults to 1. Returns the captures of the match or, if none are specified, the index of the first byte in the subject after the match.
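The method form and its one pitfall, as a sketch. It assumes LPEG is available as require "lpeg". The pitfall is that a plain Lua string also has a match method, but it belongs to the string library, not to LPEG.

```lua
-- Sketch: assumes the LPEG module is installed; p is an example pattern.
local lpeg = require "lpeg"
local P = lpeg.P

local p = P"ab"
print(p:match("abc"))             --> 3    same as lpeg.match(p, "abc")
print(p:match("xabc", 2))         --> 4    starting at init = 2
print(lpeg.match("ab", "abc"))    --> 3    the string "ab" is first converted by P
-- A plain string has no LPEG methods: this calls string.match instead,
-- looking for the Lua pattern "abc" inside "ab".
print(("ab"):match("abc"))        --> nil
```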
B(p) matches what p does, but just before the current position in the subject instead of at it, and consumes no input. p is restricted to patterns of fixed length that make no captures.
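A look-behind sketch, assuming LPEG is loadable with require "lpeg" (lpeg.B is present in current releases). The pattern p is illustrative. The final line checks that a pattern without a fixed length is refused.

```lua
-- Sketch: assumes the LPEG module is installed; p is illustrative.
local lpeg = require "lpeg"
local P, B = lpeg.P, lpeg.B

-- "b", but only when the byte just before it is "a".
local p = B"a" * "b"
print(lpeg.match(p, "ab", 2))     --> 3
print(lpeg.match(p, "cb", 2))     --> nil
print(lpeg.match(p, "b"))         --> nil  nothing precedes position 1

-- P"a"^1 has no fixed length, so B raises an error.
print((pcall(B, P"a"^1)))         --> false
```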
C(p) matches what p does, and produces a capture consisting of the substring matched.
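A minimal sketch of simple captures, assuming LPEG is available as require "lpeg". The name word is illustrative. Without a capture, lpeg.match returns a position. With captures, it returns the captured strings, one value per capture.

```lua
-- Sketch: assumes the LPEG module is installed; "word" is illustrative.
local lpeg = require "lpeg"
local C, R = lpeg.C, lpeg.R

local word = R"az"^1
print(lpeg.match(word, "hello world"))      --> 6      no capture: just a position
print(lpeg.match(C(word), "hello world"))   --> hello

-- Several captures give several return values.
local a, b = lpeg.match(C(word) * " " * C(word), "hello world")
print(a .. "|" .. b)                        --> hello|world
```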
There are several specialized constructors and methods dealing with captures, many of which involve variations of the division operator.
The same goes for P acting on tables and functions. The LPEG distribution also contains re.lua, an application demonstrating the feasibility of writing a regular expression handler using LPEG.
The present author is not yet qualified to write about any of these topics.
About this document: Dirk Laurie wrote it in order to teach himself LPEG. All errors in it can be blamed on his inexperience.