lua-users home
lua-l archive

On Feb 10, 2018, at 3:56 AM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:

>>>>>> "Coda" == Coda Highland <chighland@gmail.com> writes:
> 
> Coda> Your discovery that it can't be done without loops is also fairly
> Coda> accurate. CSV parsing is one of the classic examples of "you
> Coda> really shouldn't try to do that with a regexp". If it's possible
> Coda> for values to CONTAIN quotes (i.e. by escaping) instead of just
> Coda> being DELIMITED by them, it's actually impossible (unless you use
> Coda> some Perlisms that go beyond the technical formalism of regular
> Coda> expressions).
> 
> Nonsense; CSV is clearly a regular language even when allowing quotes
> inside the values.
> 
> Here is the definition from RFC4180 (excluding the obvious terminals):
> 
>  file = [header CRLF] record *(CRLF record) [CRLF]
>  header = name *(COMMA name)
>  record = field *(COMMA field)
>  name = field
>  field = (escaped / non-escaped)
>  escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
>  non-escaped = *TEXTDATA
> 
> which corresponds to this regexp (assuming newlines match [^] except
> where explicitly excluded):
> 
> ^(("([^"]|"")*"|[^",\r\n]*)(,"([^"]|"")*"|,[^",\r\n]*)*(\r\n|$))*$
> 
> Coda> Meanwhile, gsub is LESS expressive than regexps.
> 
> Indeed.
> 
> -- 
> Andrew.

Just curious: what would the equivalent LPeg re pattern look like?
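
In the meantime, the quoted regexp can be sanity-checked by running it against a few inputs. Below is a minimal sketch in Python rather than Lua, since plain Lua patterns have no alternation. The regexp is copied verbatim from the message above; the helper name `is_csv` and the sample strings are my own, purely for illustration. I use `re.fullmatch` so that the whole input has to be consumed, which (as far as I can tell) is the intent of the `^...$` anchors.

```python
import re

# The regexp from the quoted message, copied verbatim.
# Inside a character class such as [^"], Python also matches CR and LF,
# which is the "newlines match [^]" assumption stated in the message.
CSV_RE = re.compile(
    r'^(("([^"]|"")*"|[^",\r\n]*)(,"([^"]|"")*"|,[^",\r\n]*)*(\r\n|$))*$'
)


def is_csv(text: str) -> bool:
    """Return True if the whole of `text` matches the quoted regexp.

    `is_csv` is a hypothetical helper name, not part of any library.
    """
    return CSV_RE.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        'a,b,c\r\n1,2,3\r\n',               # two plain records
        'a,"b ""quoted"", with comma",c',   # doubled quotes inside a field
        '"line1\r\nline2",x\r\n',           # CRLF inside a quoted field
        'a,"unterminated',                  # missing closing quote
        'a,b"c',                            # bare quote in an unquoted field
    ]
    for s in samples:
        print(repr(s), is_csv(s))
```

The first three samples should be accepted and the last two rejected. Note that this only validates the input; it does not split it into fields, which is where an LPeg grammar with captures would presumably be the more natural tool.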