On Fri, Feb 9, 2018 at 5:08 PM, Russell Haley <russ.haley@gmail.com> wrote:
> Since my match and capture understanding in Lua is somewhat weak I am
> looking for opportunities to improve my understanding. There is a SO
> question about parsing a file here:
>
> https://unix.stackexchange.com/questions/422526/remove-comma-outside-quotes
>
> The crux of the question is to leave the commas within quoted items
> and replace all the outer "separator" commas with tilde (~). So parse
> this:
>
> 123,"ABC, DEV 23",345,534.202,NAME
>
> and return this:
>
> 123~"ABC, DEV 23"~345~534.202~NAME.
>
> I've put together a couple of pieces but can't find any way to make it
> into an answer without resorting to loops.
>
> s = '123,"ABC, DEV 23",345,534.202,NAME'
>
> print(s:match('".*,.*"'))
> "ABC, DEV 23"
>
> print(s:gsub('".*(,).*"','~'))
> 123,~,345,534.202,NAME  1
>
>
> From what I can tell, there is no way to add exclusions to patterns so
> at this point I'm stumped. Instead of asking for the solution and
> studying it, I'd like to first ask for a hint from the mailing list.
> My questions are:
>
> - Is it possible to do this in a single call to gsub (I'm hoping yes)?
> If not, I will look first at one or two calls (i.e. match and then
> gsub) and using a loop.
> - Is this something that would be better done with LPEG?
>
> I'll likely ask if people can share possible answers later to see if
> my solution is at all reasonable.
>
> Thanks
>
> Russ

LPEG could do it pretty trivially, yes.
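
For illustration, a minimal LPeg sketch under stated assumptions: the
third-party LPeg library is installed, quoted fields never contain an
escaped quote, and the names quoted, separator and line are made up for
this example. A substitution capture copies quoted fields through
untouched and rewrites only the commas found outside them.

```lua
-- Assumes the third-party LPeg library is available (e.g. via LuaRocks).
local lpeg = require("lpeg")
local P, Cs = lpeg.P, lpeg.Cs

-- A quoted field: opening quote, any run of non-quote characters, closing quote.
-- Assumption: quotes are never escaped inside a field.
local quoted = P'"' * (1 - P'"')^0 * P'"'

-- A separator comma, replaced by a tilde inside the substitution capture.
local separator = P"," / "~"

-- Try a quoted field first, then a separator, then any other single character.
local line = Cs((quoted + separator + 1)^0)

local s = '123,"ABC, DEV 23",345,534.202,NAME'
print(line:match(s))  --> 123~"ABC, DEV 23"~345~534.202~NAME
```

Because quoted is tried before separator, a comma is only ever rewritten
when the match position is outside a quoted field.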

Your discovery that it can't be done without loops is also fairly
accurate. CSV parsing is one of the classic examples of "you really
shouldn't try to do that with a regexp". If it's possible for values
to CONTAIN quotes (i.e. by escaping) instead of just being DELIMITED
by them, it's actually impossible (unless you use some Perlisms that
go beyond the technical formalism of regular expressions). Meanwhile,
the patterns that gsub accepts are LESS expressive than regexps: they
have no alternation and no quantifiers on groups.
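
One way to stay within a single gsub call is to pass a function as the
replacement and let it track whether the scan is currently inside quotes.
This is only a sketch: the callback is a loop with state in all but name,
the helper name replace_separators is invented for the example, and it
assumes quotes are never escaped inside a field.

```lua
-- Replace commas outside double-quoted fields with "~".
-- Assumption: a quote always toggles quoted state (no escaped quotes).
local function replace_separators(text)
  local in_quotes = false
  -- Visit every quote and comma in order; the callback keeps the state.
  -- The outer parentheses drop gsub's second result (the match count).
  return (text:gsub('[",]', function(c)
    if c == '"' then
      in_quotes = not in_quotes
      return nil            -- nil means "keep the original character"
    elseif not in_quotes then
      return "~"            -- a separator comma outside quotes
    end
    -- A comma inside quotes falls through and is kept as-is.
  end))
end

local s = '123,"ABC, DEV 23",345,534.202,NAME'
print(replace_separators(s))  --> 123~"ABC, DEV 23"~345~534.202~NAME
```

The pattern itself stays trivial; all of the "exclusion" logic lives in
the callback, which is why this does not contradict the point about what
patterns alone can express.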

/s/ Adam