lua-l archive

On Sun, Apr 16, 2017 at 11:47 PM, Dirk Laurie <dirk.laurie@gmail.com> wrote:
> My primitive JSON decoder, which operates by lexically translating
> a JSON code to a Lua table literal, now does three things:
>
> 1. Unicode codepoints translated, e.g. from \u00e9 to \u{00e9}.
> 2. List delimiters translated from […] to {…}.
> 3. Keys translated e.g. from "item": to ["item"]=.
>
> local function json_decode (s)
>   s = s:gsub("\\u(%d%d%d%d)","\\u{%1}")
 
In JSON, \u is followed by four HEX digits, not decimal ones, so the pattern needs %x rather than %d.
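
A quick check makes the %d problem concrete: with the quoted pattern, even the example from item 1 (\u00e9) is left untranslated, because "e" is not a decimal digit. This is only an illustration; the variable names are made up.

```lua
-- The quoted pattern uses %d (decimal digits only); %x accepts hex digits.
local json = [[caf\u00e9]]
local with_d = json:gsub("\\u(%d%d%d%d)", "\\u{%1}")  -- first return value only
local with_x = json:gsub("\\u(%x%x%x%x)", "\\u{%1}")
print(with_d)  --> caf\u00e9    (unchanged: "e" is not a decimal digit)
print(with_x)  --> caf\u{00e9}
```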

Even with %x, using gsub to replace \u(%x%x%x%x) with \u{%1} does not account for an escaped backslash. In JSON, \\u1234 does not represent the Unicode character U+1234 but the literal string 'backslash, lower case u, digit 1, etc'.
Note that \\\u1234 DOES denote a backslash followed by a Unicode character. Lua's patterns, which are much simpler than full regular expressions, are too restricted to express "an odd number of backslashes followed by 'u'" in a single pattern.
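
A pattern cannot count, but gsub accepts a replacement function, so one possible workaround is to match a whole run of backslashes and test its parity in Lua code. The helper below is a sketch: translate_unicode_escapes is a hypothetical name, not part of the quoted decoder, and it assumes its input is the raw JSON text passed to the translation step. It is also exactly the kind of special-case patch that the next paragraph argues against.

```lua
-- Hypothetical helper: rewrite JSON \uXXXX escapes as Lua \u{XXXX} escapes,
-- leaving alone any "u" preceded by an even number of backslashes.
local function translate_unicode_escapes(s)
  return (s:gsub("(\\+)u(%x%x%x%x)", function(backslashes, hex)
    if #backslashes % 2 == 1 then
      -- Odd run: the last backslash starts a \u escape.
      return backslashes .. "u{" .. hex .. "}"
    end
    -- Even run: every backslash is itself escaped, so "u" is a plain letter.
    -- Returning nil tells gsub to keep the original match unchanged.
    return nil
  end))
end

print(translate_unicode_escapes([[caf\u00e9]]))  --> caf\u{00e9}
print(translate_unicode_escapes([[\\u1234]]))    --> \\u1234
print(translate_unicode_escapes([[\\\u1234]]))   --> \\\u{1234}
```

Because gsub scans left to right and \\+ is greedy, the function always sees the complete run of backslashes, so the parity test is reliable.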

In my experience it's easier in the long run to build a state machine for the lexical analysis and a recursive descent parser for the recursive structure than to handle all the special cases with increasingly clever hacks.
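
To illustrate the state-machine approach for the piece of lexing that caused trouble here, the sketch below decodes a single JSON string literal character by character using two states, normal and escape. Because each backslash is consumed together with the character after it, \\u1234 and \\\u1234 come out right without any parity counting. It is a hypothetical helper (lex_json_string is not an existing library function), it assumes Lua 5.3 or later for utf8.char, and it omits things a real lexer needs, such as rejecting raw control characters and combining UTF-16 surrogate pairs.

```lua
-- Single-character escapes defined by the JSON spec.
local simple_escapes = {
  ['"'] = '"', ["\\"] = "\\", ["/"] = "/",
  b = "\b", f = "\f", n = "\n", r = "\r", t = "\t",
}

-- Hypothetical helper: decode the JSON string literal whose opening quote is
-- at position i of s. Returns the decoded text and the position just after
-- the closing quote. Requires Lua 5.3+ for utf8.char.
local function lex_json_string(s, i)
  assert(s:sub(i, i) == '"', "expected opening quote")
  local out = {}
  local state = "normal"
  i = i + 1
  while i <= #s do
    local c = s:sub(i, i)
    if state == "normal" then
      if c == '"' then
        return table.concat(out), i + 1
      elseif c == "\\" then
        state = "escape"          -- next character is an escape code
      else
        out[#out + 1] = c
      end
      i = i + 1
    else -- state == "escape": c is the character after a backslash
      if simple_escapes[c] then
        out[#out + 1] = simple_escapes[c]
        i = i + 1
      elseif c == "u" then
        local hex = s:sub(i + 1, i + 4)
        assert(hex:match("^%x%x%x%x$"), "bad \\u escape at position " .. i)
        out[#out + 1] = utf8.char(tonumber(hex, 16))
        i = i + 5
      else
        error("invalid escape \\" .. c .. " at position " .. i)
      end
      state = "normal"
    end
  end
  error("unterminated string")
end

print((lex_json_string([["\\u1234"]], 1)))  --> \u1234
```

A recursive descent parser would call such a function whenever it meets a quote, and handle objects and arrays by calling itself.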