>>>>> "Coda" == Coda Highland <chighland@gmail.com> writes:

 Coda> Your discovery that it can't be done without loops is also fairly
 Coda> accurate. CSV parsing is one of the classic examples of "you
 Coda> really shouldn't try to do that with a regexp". If it's possible
 Coda> for values to CONTAIN quotes (i.e. by escaping) instead of just
 Coda> being DELIMITED by them, it's actually impossible (unless you use
 Coda> some Perlisms that go beyond the technical formalism of regular
 Coda> expressions).

Nonsense; CSV is clearly a regular language even when allowing quotes
inside the values.

Here is the definition from RFC4180 (excluding the obvious terminals):

  file = [header CRLF] record *(CRLF record) [CRLF]
  header = name *(COMMA name)
  record = field *(COMMA field)
  name = field
  field = (escaped / non-escaped)
  escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
  non-escaped = *TEXTDATA

which corresponds to this regexp (assuming that negated character classes
such as [^"] also match newlines unless \r and \n are explicitly excluded):

^(("([^"]|"")*"|[^",\r\n]*)(,"([^"]|"")*"|,[^",\r\n]*)*(\r\n|$))*$
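One way to check the correspondence is to assemble the regexp from pieces that mirror the grammar rules, then try it on a few inputs. The sketch below does this in Python, chosen only for illustration. The names ESCAPED, NON_ESCAPED, FIELD, LINE and is_csv are hypothetical. Python's re module satisfies the newline assumption above, because a negated class like [^"] matches \n. The sketch uses fullmatch rather than match because Python's $ can also match just before a trailing "\n".

```python
import re

# Each piece mirrors one rule of the RFC 4180 grammar quoted above.
ESCAPED = r'"([^"]|"")*"'      # DQUOTE *(non-quote char / 2DQUOTE) DQUOTE
NON_ESCAPED = r'[^",\r\n]*'    # *TEXTDATA, loosened to "anything but quote, comma, CR, LF"
FIELD = f'({ESCAPED}|{NON_ESCAPED})'
# One record (or the header, since name = field) ended by CRLF or end of input.
LINE = f'({FIELD}(,{ESCAPED}|,{NON_ESCAPED})*(\\r\\n|$))'
CSV_RE = re.compile(f'^{LINE}*$')


def is_csv(text: str) -> bool:
    """Return True if text matches the CSV regexp as a whole."""
    # fullmatch forces the whole string to be consumed, so a stray trailing
    # "\n" (which Python's $ would otherwise tolerate) is rejected.
    return CSV_RE.fullmatch(text) is not None


if __name__ == "__main__":
    print(CSV_RE.pattern)
    print(is_csv('"he said ""hi""",2\r\n3,4'))  # quotes inside a value
    print(is_csv('a"b,c'))                      # bare quote in unquoted field
```

The assembled pattern is character-for-character the regexp above, and it is built only from concatenation, alternation and star, with no backreferences. It only recognizes valid CSV. Extracting the individual field values is a separate job.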

 Coda> Meanwhile, gsub is LESS expressive than regexps.

Indeed.

-- 
Andrew.