> Indeed, the trivial little state machine has ALWAYS been my preferred
> way to solve this class of problem.

One case it doesn't handle is a mix of ' and " in the string (a string
should only be closed with the same quote it was opened with,
assuming both " and ' are allowed).

The following code should handle this case (I also added examples):

local list = {
  '123,"ABC, DEV 23",345,,202,NAME',
  ',"ABC, DEV 23",345,534,202,NAME',
  '123,"ABC, DEV 23",345,534.202,',
  '123 , "ABC, DEV 23" , 345 , 534 , 202 , NAME',
  [[123,"ABC, \"he,llo\", DEV 23",,,,NAME]],
  [[123,"ABC, \\",llo\", DEV 23,,,,NAME]],
  [[123,'ABC, "hel,lo", DEV 23',342,534,202,NAME]],
  [[123,"ABC, 'hel,lo, DEV 23",342,534,202,NAME]],
}
for _,s in ipairs(list) do
  -- reset the state for every input line so nothing leaks between lines
  local escaped, instring, pos, start = false, false, -1, ""
  s = s:gsub([[()(["',\\])]], function(p, c)
      -- a match that doesn't directly follow a backslash can't be escaped
      if p > pos+1 then escaped = false end
      if c == '\\' then escaped, pos = not escaped, p end
      -- only the quote that opened the string can close it
      if (c == '"' or c == "'") and not escaped
      and (not instring or c == start) then instring, start = not instring, c end
      if c == ',' and not instring then c = '~' end
      return c
    end)
  print(s)
end
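
For comparison, the same state machine can also be written as a plain character loop, which may be easier to follow than the gsub callback. This is only a sketch: the function name replace_separators and its optional sep parameter are made up for illustration, and it assumes the same rules as above (a backslash escapes the next quote, and a string is closed only by the quote that opened it).

```lua
-- Hypothetical helper (not part of the code above): replace separator
-- commas with `sep`, leaving commas inside '...' or "..." untouched.
local function replace_separators(line, sep)
  sep = sep or "~"
  local out = {}
  local quote = nil      -- quote character that opened the current string, if any
  local escaped = false  -- true right after an unescaped backslash
  for i = 1, #line do
    local c = line:sub(i, i)
    if c == "\\" then
      escaped = not escaped      -- a doubled backslash cancels itself out
    else
      if (c == '"' or c == "'") and not escaped then
        if quote == nil then
          quote = c              -- open a string
        elseif quote == c then
          quote = nil            -- close only with the matching quote
        end
      elseif c == "," and quote == nil then
        c = sep                  -- separator comma outside any string
      end
      escaped = false            -- an escape only applies to the next character
    end
    out[#out + 1] = c
  end
  return table.concat(out)
end

print(replace_separators([[123,'ABC, "hel,lo", DEV 23',342,534,202,NAME]]))
-- 123~'ABC, "hel,lo", DEV 23'~342~534~202~NAME
print(replace_separators([[123,"ABC, 'hel,lo, DEV 23",342,534,202,NAME]]))
-- 123~"ABC, 'hel,lo, DEV 23"~342~534~202~NAME
```

Keeping the opening quote in a single variable (nil when outside a string) makes the "close with the same quote" rule a one-line comparison.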

Paul.

On Fri, Feb 9, 2018 at 5:02 PM, Coda Highland <chighland@gmail.com> wrote:
> On Fri, Feb 9, 2018 at 5:40 PM, Paul K <paul@zerobrane.com> wrote:
>> Hi Russ,
>>
>>> The crux of the question is to leave the commas within a quoted items
>>> and replace all the outer "separator" commas with tilde (~). So parse this:
>>> 123,"ABC, DEV 23",345,534.202,NAME
>>> and return this:
>>> 123~"ABC, DEV 23"~345~534.202~NAME.
>>
>> Something like the following may work:
>>
>> local list = {
>>   '123,"ABC, DEV 23",345,,202,NAME',
>>   ',"ABC, DEV 23",345,534,202,NAME',
>>   '123,"ABC, DEV 23",345,534.202,',
>>   '123 , "ABC, DEV 23" , 345 , 534 , 202 , NAME',
>>   [[123,"ABC, \"he,llo\", DEV 23",,,,NAME]],
>>   [[123,"ABC, \\",llo\", DEV 23,,,,NAME]],
>>   [[123,'ABC, "hello", DEV 23',342,534,202,NAME]],
>> }
>> local escaped, instring, pos = false, false, -1
>> for _,s in ipairs(list) do
>>   s = s:gsub([[()(["',\\])]], function(p, s)
>>       if p > pos+1 then escaped = false end
>>       if s == '\\' then escaped, pos = not escaped, p end
>>       if (s == '"' or s == "'") and not escaped then instring = not instring end
>>       if s == ',' and not instring then s = '~' end
>>       return s
>>     end)
>>   print(s)
>> end
>>
>> This prints:
>>
>> 123~"ABC, DEV 23"~345~~202~NAME
>> ~"ABC, DEV 23"~345~534~202~NAME
>> 123~"ABC, DEV 23"~345~534.202~
>> 123 ~ "ABC, DEV 23" ~ 345 ~ 534 ~ 202 ~ NAME
>> 123~"ABC, \"he,llo\", DEV 23"~~~~NAME
>> 123~"ABC, \\"~llo\"~ DEV 23~~~~NAME
>> 123~'ABC, "hello", DEV 23'~342~534~202~NAME
>>
>> Paul.
>
> Indeed, the trivial little state machine has ALWAYS been my preferred
> way to solve this class of problem.
>
> /s/ Adam
>