- Subject: LPeg Question
- From: Jim Whitehead II <jnwhiteh@...>
- Date: Thu, 2 Apr 2009 13:01:43 +0100
I'm working on a grammar in LPeg and need to match whichever of two
patterns is the longest. I suspect there is a problem in my grammar
that I'm not seeing, and I'm not sure how to fix it. I would like to
match lines of text in the following format:
mydomain.com: some message text here
jnwhiteh@mydomain.com: some message text here
The : is just a leading character and the text following can be either
a hostname or a user@hostname. I'm using the following definitions:
LETTER = lpeg.R("az", "AZ")
DIGIT = lpeg.R"09"
SPECIAL = lpeg.S";[]\\`_^{|}!"
PERIOD = lpeg.P"."
triple = DIGIT * DIGIT^-2
hostaddr = triple * PERIOD * triple * PERIOD * triple * PERIOD * triple
shortname = (LETTER + DIGIT) * (LETTER + DIGIT + P"-")^0 * (LETTER + DIGIT)^0
hostname = shortname * (PERIOD * shortname)^0
host = hostname + hostaddr
user = LETTER * (LETTER + DIGIT + SPECIAL)^-8
userathost = user + P"@" + host
source = host + userathost
params = P" :" * (LETTER + DIGIT + SPECIAL + P" ")^-1
line = P":" * source * params
The problems I see are the following:
* The string "mydomain.com" is matched by userathost as well as host
* The string "jnwhiteh@mydomain.com" is partially matched by host and
fully matched by userathost
The second problem causes me the most trouble: the partial match is
selected by the alternation and then the pattern fails as a whole (if
I read it correctly). Adding an end pattern P(-1) doesn't seem to
help. The pattern as it stands will match the following:
mydomain.com: Hello World
but not this:
jnwhiteh@mydomain.com: Hello World
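To make the second problem concrete, here is a minimal sketch of what I
think is happening. It uses simplified stand-in patterns rather than my
real grammar: word, short_first and long_first are hypothetical names
introduced only for this illustration, and host/userathost are cut down
to lowercase letters. It assumes that LPeg's + is an ordered choice
that is not retried once its first alternative has succeeded.

```lua
local lpeg = require "lpeg"
local P, R = lpeg.P, lpeg.R

-- Simplified stand-ins (assumptions, not the grammar above).
local word = R("az")^1
local host = word * (P"." * word)^0
local userathost = word * P"@" * host

-- Short alternative first: host matches just "jnwhiteh", the choice
-- commits to it, and the following ":" then fails against "@".
local short_first = (host + userathost) * P":"

-- Longer alternative first: userathost is tried before host, and host
-- is only used when there is no "@".
local long_first = (userathost + host) * P":"

print(short_first:match("jnwhiteh@mydomain.com:"))  --> nil
print(long_first:match("jnwhiteh@mydomain.com:"))   --> 23
print(long_first:match("mydomain.com:"))            --> 14
```

If that reading is right, the order of the alternatives matters more
than their length, but I'm not sure how well this carries over to the
full grammar.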
It's entirely possible that I'm just looking at the problem in the
wrong way due to my lack of familiarity with LPeg. If anyone can shed
some light, I'd appreciate it!
- Jim