- Subject: Re: Replace specific comma's in a string.
- From: albertmcchan <albertmcchan@...>
- Date: Sat, 10 Feb 2018 06:17:06 -0500
On Feb 10, 2018, at 3:56 AM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>>> "Coda" == Coda Highland <chighland@gmail.com> writes:
>
> Coda> Your discovery that it can't be done without loops is also fairly
> Coda> accurate. CSV parsing is one of the classic examples of "you
> Coda> really shouldn't try to do that with a regexp". If it's possible
> Coda> for values to CONTAIN quotes (i.e. by escaping) instead of just
> Coda> being DELIMITED by them, it's actually impossible (unless you use
> Coda> some Perlisms that go beyond the technical formalism of regular
> Coda> expressions).
>
> Nonsense; CSV is clearly a regular language even when allowing quotes
> inside the values.
>
> Here is the definition from RFC4180 (excluding the obvious terminals):
>
> file = [header CRLF] record *(CRLF record) [CRLF]
> header = name *(COMMA name)
> record = field *(COMMA field)
> name = field
> field = (escaped / non-escaped)
> escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
> non-escaped = *TEXTDATA
>
> which corresponds to this regexp (assuming newlines match [^] except
> where explicitly excluded):
>
> ^(("([^"]|"")*"|[^",\r\n]*)(,"([^"]|"")*"|,[^",\r\n]*)*(\r\n|$))*$
>
> Coda> Meanwhile, gsub is LESS expressive than regexps.
>
> Indeed.
>
> --
> Andrew.
Just curious: what would the equivalent LPeg re pattern look like?
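
As a side check of the quoted regexp (not an LPeg answer), here is a minimal sketch in Python's re module. It assumes Python's semantics, where a negated class such as [^"] matches newlines, and it writes `$` as `\Z` because Python's `$` also matches just before a trailing "\n". The names CSV_RE and is_csv are made up for illustration.

```python
import re

# The regexp quoted above, with `$` written as `\Z` so that only the true
# end of the string counts as end-of-input (an adaptation for Python).
CSV_RE = re.compile(
    r'^(("([^"]|"")*"|[^",\r\n]*)(,"([^"]|"")*"|,[^",\r\n]*)*(\r\n|\Z))*\Z'
)


def is_csv(text: str) -> bool:
    """Return True if the whole text fits the RFC 4180 grammar quoted above."""
    return CSV_RE.match(text) is not None


if __name__ == "__main__":
    samples = [
        'a,b,c\r\n1,2,3',              # plain fields, no trailing CRLF
        '"he said ""hi""",x\r\n',      # doubled quotes inside an escaped field
        '"line1\r\nline2",z',          # CRLF inside an escaped field
        'a"b,c',                       # bare quote in a non-escaped field
        '"unterminated,x',             # escaped field never closed
    ]
    for s in samples:
        print(repr(s), is_csv(s))
```

This only validates the input; it does not split it into fields. It accepts quoted fields containing doubled quotes, commas, and line breaks, and rejects a bare quote inside an unquoted field.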