lua-l archive



On Wed, Jun 6, 2012 at 3:41 AM, Paul K <paulclinger@yahoo.com> wrote:
> Yes, yet another serializer and pretty printer [...]
> http://notebook.kulchenko.com/programming/serpent-lua-serializer-pretty-printer.

Dumping is a common requirement, yet I've been uncomfortable using any
of the off-the-shelf dumping routines on the web.  Your module holds
some promise, though (at least it has tests, documentation, design
goals, a license, and versioning, and it works nearly as one would
expect).

> My requirements for a serializer: (1) pure Lua [...], (2) does
> both pretty printing and robust serialization, (3) handles shared and
> self-references, (4) serializes keys of various types, including
> tables as keys, and (5) is short and doesn't have too many
> dependencies to be included with another module.

The output looks reasonable, in general.  Some comments:

(1) Dumping _G would be a good test of this.  The output of
serpent.printmult(_G) is attached.  As it shows, the main downside is
that some packages are expanded deep inside _G.package.loaded rather
than directly under _G.  Dumping tables at the lowest possible nesting
level would be ideal, though it may add some complexity.
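One way to find the lowest nesting level is a breadth-first pass that
records, for each table, the shallowest key path at which it is
reachable; a dumper could then expand each table only at that path and
emit a reference everywhere else.  A minimal sketch, assuming only
string keys are followed; the function name and the "_G" root label are
hypothetical:

```lua
-- Breadth-first walk recording, for each table reachable from root, the
-- shallowest dotted key path at which it first appears.
-- Assumption: only string keys are followed; other key types are skipped.
local function shallowest_paths(root)
  local path = {[root] = "_G"}   -- table -> path ("_G" is a hypothetical root label)
  local queue, head = {root}, 1
  while head <= #queue do
    local t = queue[head]
    head = head + 1
    local keys = {}
    for k, v in pairs(t) do
      if type(k) == "string" and type(v) == "table" then keys[#keys + 1] = k end
    end
    table.sort(keys)  -- deterministic tie-breaking between equally shallow paths
    for _, k in ipairs(keys) do
      local v = t[k]
      if path[v] == nil then       -- first (hence shallowest) time we reach v
        path[v] = path[t] .. "." .. k
        queue[#queue + 1] = v
      end
    end
  end
  return path
end

-- A structure shaped like _G: the same table is reachable at depth 1 and depth 3.
local str = {}
local G = {package = {loaded = {string = str}}, string = str}
print(shallowest_paths(G)[str])  --> _G.string
```

Because the walk is breadth-first, a table reachable directly under the
root (like `string`) is claimed before its alias under `package.loaded`;
sorting only makes ties between equally shallow paths deterministic.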

(2) A "--[[table: 0x9aa7988]]" style comment is appended after each
table, even when the table is non-recursive.  A possible concern is
that these addresses are not deterministic, so dumps of structurally
equivalent tables in different program executions will give different
results.  This makes textual comparisons (diffs) of structures
difficult and causes large diffs when serializations are maintained
under revision control.  For readability, you may wish to replace
addresses (32- or 64-bit) with an integer counter that gives
deterministic dumps on structural equality, though insertions could
still cause massive renumbering.  Some time ago I was working on a
dumper like Perl Data::Dumper that would dump `x = {}; return
{a={b=x},c={d=x}}` as "{a={b={}}, c={d=T.a.b}}", where "T" refers to
the top-level table.  Here, T could even have a metatable such that
T.a.b evaluates to a node that is expanded on a subsequent walk
through the table.
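A minimal sketch of that Data::Dumper-style output, assuming
identifier-like string keys only: a depth-first walk (keys sorted for
determinism) expands each table at its first occurrence and emits later
occurrences as a path from the root T.  The name dump_refs is
hypothetical:

```lua
-- Expand each table at its first occurrence; emit later occurrences as
-- a path from the top-level table "T".
-- Assumption: keys are identifier-like strings (so they sort and print as-is).
local function dump_refs(root)
  local seen = {}  -- table -> path string such as "T.a.b"

  local function dump(v, path)
    if type(v) ~= "table" then
      return type(v) == "string" and string.format("%q", v) or tostring(v)
    end
    if seen[v] then return seen[v] end   -- shared or recursive: emit a reference
    seen[v] = path
    local keys = {}
    for k in pairs(v) do keys[#keys + 1] = k end
    table.sort(keys)
    local parts = {}
    for _, k in ipairs(keys) do
      parts[#parts + 1] = k .. "=" .. dump(v[k], path .. "." .. k)
    end
    return "{" .. table.concat(parts, ", ") .. "}"
  end

  return dump(root, "T")
end

local x = {}
print(dump_refs({a = {b = x}, c = {d = x}}))  --> {a={b={}}, c={d=T.a.b}}
```

The output is meant for reading: loading it back would still need a
fix-up pass (or the T metatable trick), since T does not exist while the
table constructor is being evaluated.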

(3) The keys are not currently being sorted.  Sorting would improve
readability and make the output deterministic, but it may affect your
performance numbers.
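One wrinkle is that `table.sort` cannot compare keys of different types
directly (`1 < "a"` raises an error), so a comparator that orders by type
first is needed.  A minimal sketch; the order chosen here (numbers, then
strings, then everything else by `tostring`) is an arbitrary assumption:

```lua
-- Rank key types so mixed-type keys can be sorted without errors.
local typeorder = {number = 1, string = 2}

local function keycmp(a, b)
  local oa, ob = typeorder[type(a)] or 3, typeorder[type(b)] or 3
  if oa ~= ob then return oa < ob end
  if oa < 3 then return a < b end      -- same primitive type: compare directly
  return tostring(a) < tostring(b)     -- booleans, tables, functions, ...
end

local function sortedkeys(t)
  local keys = {}
  for k in pairs(t) do keys[#keys + 1] = k end
  table.sort(keys, keycmp)
  return keys
end

print(table.concat(sortedkeys({c = 1, a = 2, [2] = 3, [1] = 4}), " "))  --> 1 2 a c
```

Table and function keys still fall back to `tostring`, i.e. to their
addresses, so their order is not deterministic across runs.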

(4) In the attached dump of _G, string.gmatch is curiously set to
"nil --[[ref]]".  It took me some time to realize that this is because
the deprecated string.gfind is an alias of string.gmatch, and for gfind
the dump shows "= string.gmatch --[[function: 0x9b3ab28]]".
Currently, in Lua 5.1, string.gmatch would serialize as string.gfind,
which will be absent when unserializing in a VM compiled with
LUA_COMPAT_GFIND undefined.  Perhaps --[[ref]] should carry more
information.  It would also be ideal if it could prefer non-deprecated
functions, but that requires special cases.
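Preferring non-deprecated names could be handled while building the
function-to-name map: when two names refer to the same function, keep
the one not on a deprecated list.  A minimal sketch; the deprecated list
is a hand-written assumption (string.gfind and math.mod are the Lua 5.1
compatibility aliases visible in the attached dump), and libnames is a
hypothetical name:

```lua
-- Hand-maintained list of deprecated aliases (an assumption, not exhaustive).
local deprecated = {["string.gfind"] = true, ["math.mod"] = true}

-- Map each library function to a qualified name, preferring non-deprecated ones.
local function libnames(libs)
  local names = {}
  for _, lib in ipairs(libs) do
    local t = _G[lib]
    if type(t) == "table" then
      for k, v in pairs(t) do
        if type(v) == "function" then
          local name = lib .. "." .. tostring(k)
          local old = names[v]
          -- take this name if none yet, or if it replaces a deprecated alias
          if old == nil or (deprecated[old] and not deprecated[name]) then
            names[v] = name
          end
        end
      end
    end
  end
  return names
end

print(libnames({"string", "math"})[string.gmatch])  --> string.gmatch
```

The result does not depend on `pairs` order: whichever alias is visited
first, the non-deprecated name wins.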

(5) For pretty printing, "1\n2" would be nicer than "1\0102".  \n is
common, but beginning users might not recognize \010 as a newline.
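A minimal sketch of a pretty-printing quoter that uses the familiar
short escapes and falls back to decimal escapes for other control
characters.  Always writing three digits keeps an escape unambiguous
when a digit follows, as in "1\0102".  The name prettystr is
hypothetical:

```lua
-- Short escapes that beginners recognize.
local short = {["\n"] = "\\n", ["\t"] = "\\t", ["\r"] = "\\r",
               ["\\"] = "\\\\", ['"'] = '\\"'}

-- Quote s as a Lua string literal for display.
local function prettystr(s)
  local body = s:gsub('[%c\\"]', function(c)
    -- known short escape, else a three-digit decimal escape such as \026
    return short[c] or string.format("\\%03d", c:byte())
  end)
  return '"' .. body .. '"'
end

print(prettystr("1\n2"))  --> "1\n2"
```

The result is still valid Lua source, so it loads back to the same
string (control-Z included, which also covers the \026 case).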

(6) Dumping `package.loadlib` shows an `--[[err]]`, which looks
worrisome because it is presented as an error rather than a warning.
This occurs because the function is not in globals.  Other cases
include package.loaders[i].

(7) The naming of "serialize", "printmult", and "printsing" was
somewhat jarring to me (think: "print multiple objects" and "prints
ing"/"print sing (song)").  When I first ran it, I thought the latter
would internally invoke `print`, not that I wanted it to, since that
isn't general.  Also, single- and multi-line forms of serialization
and pretty printing would suggest 2 x 2 = 4 combinations for
orthogonality.  In my recent lua-mbuild, for example (in which I would
consider adding your library), I currently use my own trivial dumper
to serialize a (non-recursive) data structure to disk in a multi-line
format (to allow easier debugging), and I'd be tempted to call
printmult even though this is serialization, not pretty printing as
the name implies.  Finally, "The library provides three functions
-- serialize, printmult, and printsing -- with the last two being
shortcuts for the main serialize function." is not strictly correct:
all three are wrappers around a "local function serialize" that is not
exposed, and, as is, it's not possible to pass through a nil name in
the public version of `serialize`.
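A minimal sketch of a more orthogonal shape: a single exposed function
with two independent options, so all four combinations are reachable
without separate entry points or wrappers that drop arguments.
Everything here (dump, opts.indent, opts.pretty) is hypothetical and not
serpent's API; it handles only non-recursive tables and sorts keys by
`tostring`:

```lua
-- opts.indent: nil for single-line output, or an indent string for multi-line.
-- opts.pretty: if true, favor readability (e.g. \n) over the raw %q form.
local function dump(v, opts, depth)
  opts = opts or {}
  depth = depth or 0
  if type(v) == "string" then
    local q = string.format("%q", v)
    -- %q writes a newline as backslash + line break; pretty mode shows \n
    if opts.pretty then q = q:gsub("\\\n", "\\n") end
    return q
  elseif type(v) ~= "table" then
    return tostring(v)
  end
  local keys = {}
  for k in pairs(v) do keys[#keys + 1] = k end
  if #keys == 0 then return "{}" end
  table.sort(keys, function(a, b) return tostring(a) < tostring(b) end)
  local ind = opts.indent
  local pad = ind and ("\n" .. ind:rep(depth + 1)) or " "
  local close = ind and ("\n" .. ind:rep(depth)) or " "
  local items = {}
  for _, k in ipairs(keys) do
    items[#items + 1] = pad .. "[" .. dump(k, opts, depth + 1) .. "] = "
      .. dump(v[k], opts, depth + 1)
  end
  return "{" .. table.concat(items, ",") .. close .. "}"
end

print(dump({a = {b = 2}}, {indent = "  "}))  -- multi-line serialization
print(dump({a = 1}))                        --> { ["a"] = 1 }
print(dump("1\n2", {pretty = true}))        --> "1\n2"
```

A multi-line serialization for a file on disk is then just
`{indent = "  "}`, with no need to borrow a function named for pretty
printing.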

(8) The coding style is a little too compact for my taste where
readability is concerned.  I usually prefer to define variables on
separate lines, as opposed to

  local n, v, c, d = "serpent", 0.1, -- (C) 2012 Paul Kulchenko; MIT License
    "Paul Kulchenko", "Serialization and pretty printing of Lua data types"
  local keyword, globals, G = {}, {}, (_G or _ENV)
  local ttype, level = type(t), (level or 0)

The single-character top-level variable names above aren't great
either, but you could simply inline those values into the table at the
bottom.  Some may also object to

  local function safestr(s) return type(s) == "number" and (snum[tostring(s)] or s)
    or type(s) ~= "string" and tostring(s) -- escape NEWLINE/010 and EOF/026
    or ("%q"):format(s):gsub("\010","010"):gsub("\026","\\026") end
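For comparison, the safestr excerpt above unrolled into one statement
per line.  The `snum` table is not shown in the excerpt, so the one
below is a hypothetical stand-in that maps the string forms of positive
and negative infinity to loadable expressions:

```lua
-- Hypothetical stand-in for the module's snum table.
local snum = {[tostring(1/0)] = "1/0", [tostring(-1/0)] = "-1/0"}

local function safestr(s)
  if type(s) == "number" then
    return snum[tostring(s)] or s
  end
  if type(s) ~= "string" then
    return tostring(s)
  end
  local q = string.format("%q", s)
  -- %q emits backslash + NEWLINE; turn it into the \010 escape
  q = q:gsub("\010", "010")
  -- Lua 5.1's %q leaves EOF (026) raw; escape it explicitly
  q = q:gsub("\026", "\\026")
  return q
end

print(safestr("a\nb"))  --> "a\010b"
```

The behavior matches the one-liner (each `and`/`or` branch becomes an
early return), but each step can now carry its own comment.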

(9) Another interesting test is
`_ENV = loadstring(require "serpent".serialize(_G))(); <test suite code>`.
It won't work in all cases, but it does still run non-trivial programs
such as life.lua.

(10) Dumps of bytecode are probably not useful for pretty printing
unless perhaps you were to decompile the bytecode.  More useful
information may be found in the debug info (sometimes including the
name of the file containing the source code), in cases where debug.* is
permitted at all.  Some such things, however, are probably outside the
scope of a simple dumping module.  Bytecode may still have limited uses
in serialization, though (transferring code between Lua states?).
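A minimal sketch of describing a function from debug info instead of
dumping its bytecode, falling back to `tostring` when the debug library
is unavailable (as in many sandboxes).  It relies only on
`debug.getinfo(f, "S")` and its `what`, `short_src`, and `linedefined`
fields; the describe_function name and the comment format are
hypothetical:

```lua
-- Return a short comment describing where a function comes from.
local function describe_function(f)
  local info = debug and debug.getinfo and debug.getinfo(f, "S")
  if not info then
    return tostring(f)  -- no debug library: fall back to the address string
  end
  if info.what == "C" then
    return "--[[C function]]"
  end
  return string.format("--[[function defined at %s:%d]]",
                       info.short_src, info.linedefined)
end

print(describe_function(print))              --> --[[C function]]
print(describe_function(describe_function))  -- e.g. --[[function defined at example.lua:2]]
```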

(11) One place I've used dumpers is for dumping ASTs (dump.lua in
LuaInspect).  In cases like {tag=Op, '+', ...}, I'd prefer the named
part *before* the positional part, particularly the 'tag' name at the
very front.  This might be outside the scope of this module, though.
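A minimal sketch of that key order for AST-like nodes: `tag` first, then
the remaining string keys sorted, then the positional part.  The
function name is hypothetical, and keys that are neither strings nor
part of the 1, 2, ... run are simply left out here:

```lua
-- Key order for dumping nodes like {tag="Op", "+", lhs, rhs}.
local function ast_key_order(node)
  -- positional part: 1, 2, ... up to the first nil (avoids # on holes)
  local n = 0
  while rawget(node, n + 1) ~= nil do n = n + 1 end
  local named = {}
  for k in pairs(node) do
    if type(k) == "string" and k ~= "tag" then named[#named + 1] = k end
  end
  table.sort(named)
  local keys = {}
  if rawget(node, "tag") ~= nil then keys[1] = "tag" end
  for _, k in ipairs(named) do keys[#keys + 1] = k end
  for i = 1, n do keys[#keys + 1] = i end
  return keys
end

print(table.concat(ast_key_order({tag = "Op", pos = 7, "+", "a", "b"}), ","))
--> tag,pos,1,2,3
```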

(12) In "nil values are included when expected ({1, nil, 3} instead of
{1, [3]=3})", I'm not sure what "expected" means, given that # is not
well defined on tables with holes.  I'd actually expect the latter
format, deterministically, for sparse arrays.
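A minimal sketch of deterministic handling of sparse arrays: take the
positional part to be the run of keys 1, 2, ... up to the first nil
(found with `rawget`, not `#`), and treat every other key as an explicit
[k]=v entry.  For {1, nil, 3} this gives a positional part of length 1
plus the key 3, i.e. {1, [3]=3}.  The split_keys name is hypothetical:

```lua
-- Split a table's keys into a positional prefix length n and the rest.
local function split_keys(t)
  local n = 0
  while rawget(t, n + 1) ~= nil do n = n + 1 end
  local rest = {}
  for k in pairs(t) do
    local positional = type(k) == "number" and k >= 1 and k <= n and k % 1 == 0
    if not positional then rest[#rest + 1] = k end
  end
  return n, rest
end

local n, rest = split_keys({1, nil, 3})
print(n, rest[1])  --> 1	3
```

A dumper would then write t[1] .. t[n] positionally and the keys in
`rest` as [k]=v, independent of whatever # returns.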

(13) Maybe use 1/0 rather than math.huge (after all, you already rely
on 0/0) to avoid simple serializations of numerical arrays causing
lookups into `math`, which won't exist in sandboxes with an empty _ENV.

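One way to check this is to load the serialized text with an empty
environment.  A minimal sketch that works on both Lua 5.1 (`loadstring`
plus `setfenv`) and 5.2+ (`load` with an env argument); load_sandboxed
is a hypothetical helper name:

```lua
-- Compile src with an empty environment, so globals such as math are nil.
local function load_sandboxed(src)
  if setfenv then
    -- Lua 5.1 / LuaJIT: compile, then replace the chunk's environment
    local f = assert(loadstring(src))
    return setfenv(f, {})
  end
  -- Lua 5.2+: pass the environment to load directly
  return assert(load(src, "=sandboxed", "t", {}))
end

print(pcall(load_sandboxed("return {1/0, -1/0, 0/0}")))  -- true, then a table
print(pcall(load_sandboxed("return {math.huge}")))       -- false, then an error message
```

The arithmetic forms need no globals at all, while math.huge fails as
soon as `math` is looked up.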
{
  string = {
    sub = string.sub --[[function: 0x9b3aca0]],
    upper = string.upper --[[function: 0x9b3acd0]],
    len = string.len --[[function: 0x9b3ab98]],
    gfind = string.gmatch --[[function: 0x9b3ab28]],
    rep = string.rep --[[function: 0x9b3ac38]],
    find = string.find --[[function: 0x9b3aa80]],
    match = string.match --[[function: 0x9b3ac00]],
    char = string.char --[[function: 0x9b38618]],
    dump = string.dump --[[function: 0x9b38650]],
    gmatch = nil --[[ref]],
    reverse = string.reverse --[[function: 0x9b3ac68]],
    byte = string.byte --[[function: 0x9b385e0]],
    format = string.format --[[function: 0x9b3aab8]],
    gsub = string.gsub --[[function: 0x9b3ab60]],
    lower = string.lower --[[function: 0x9b3abc8]]
  } --[[table: 0x9b385b8]],
  xpcall = xpcall --[[function: 0x9b37fb0]],
  package = {
    preload = {} --[[table: 0x9b38ed8]],
    loadlib = "function: 0x9b38b30" --[[err]],
    loaded = {
      string = nil --[[ref]],
      debug = {
        getupvalue = debug.getupvalue --[[function: 0x9b3bb90]],
        debug = debug.debug --[[function: 0x9b39db0]],
        sethook = debug.sethook --[[function: 0x9b3bbe0]],
        getmetatable = debug.getmetatable --[[function: 0x9b3bb78]],
        gethook = debug.gethook --[[function: 0x9b39de0]],
        setmetatable = debug.setmetatable --[[function: 0x9b3bc50]],
        setlocal = debug.setlocal --[[function: 0x9b3bc18]],
        traceback = debug.traceback --[[function: 0x9b3bca0]],
        setfenv = debug.setfenv --[[function: 0x9b3bbc8]],
        getinfo = debug.getinfo --[[function: 0x9b39e18]],
        setupvalue = debug.setupvalue --[[function: 0x9b3bc68]],
        getlocal = debug.getlocal --[[function: 0x9b39e50]],
        getregistry = debug.getregistry --[[function: 0x9b3bb40]],
        getfenv = debug.getfenv --[[function: 0x9b39dc8]]
      } --[[table: 0x9b3b788]],
      package = nil --[[ref]],
      _G = nil --[[ref]],
      serpent = {
        printsing = loadstring("LuaQ\000\000\000\000\000@src/serpent.lua\000K\000\000\000K\000\000\000\000\000\000\000D\000\000\000€\000\000\000]\000\000^\000\000\000\000€\000\000\000\000\000\000\000\000\000\000\000\000K\000\000\000K\000\000\000K\000\000\000K\000\000\000K\000\000\000\000\000\000\000\000\000t\000\000\000\000\000\000\000\000\000\000\000\010\000\000\000serialize\000",'@serialized') --[[function: 0x9b43958]],
        serialize = loadstring("LuaQ\000\000\000\000\000@src/serpent.lua\000I\000\000\000I\000\000\000\000	\010\000\000\000\000\000@\000\000?€\000\000\000€?\000\000?\000\000€€\000\000\000€\000\000\000\000\000\000\000_\000\000\000\000\000\010\000\000\000I\000\000\000I\000\000\000I\000\000\000I\000\000\000I\000\000\000I\000\000\000I\000\000\000I\000\000\000I\000\000\000I\000\000\000\000\000\000\000\000\000t\000\000\000\000\000	\000\000\000\000\000\000n\000\000\000\000\000	\000\000\000\000\000\000i\000\000\000\000\000	\000\000\000\000\000\000f\000\000\000\000\000	\000\000\000\000\000\000\010\000\000\000serialize\000",'@serialized') --[[function: 0x9b438f8]],
        _NAME = "serpent",
        _COPYRIGHT = "Paul Kulchenko",
        printmult = loadstring("LuaQ\000\000\000\000\000@src/serpent.lua\000J\000\000\000J\000\000\000\000	\000\000\000?000\000\000?000\000\000\000[A€\000\000\000€A\000\000?000\000?000\000\000\000€\000\000\000\000\000\000\000  \000\000\000\000\000	\000\000\000J\000\000\000J\000\000\000J\000\000\000J\000\000\000J\000\000\000J\000\000\000J\000\000\000J\000\000\000J\000\000\000\000\000\000\000\000\000t\000\000\000\000\000\000\000\000\000\000\000i\000\000\000\000\000\000\000\000\000\000\000\010\000\000\000serialize\000",'@serialized') --[[function: 0x9b43938]],
        _DESCRIPTION = "Serialization and pretty printing of Lua data types",
        _VERSION = 0.1
      } --[[table: 0x9b437e8]],
      io = {
        lines = io.lines --[[function: 0x9b39e98]],
        write = io.write --[[function: 0x9b39fc0]],
        close = io.close --[[function: 0x9b38ab8]],
        flush = io.flush --[[function: 0x9b38ad0]],
        open = io.open --[[function: 0x9b39eb0]],
        output = io.output --[[function: 0x9b39ee8]],
        type = io.type --[[function: 0x9b39fa8]],
        read = io.read --[[function: 0x9b39f58]],
        stderr = io.stderr --[[file (0x28a460)]],
        stdin = io.stdin --[[file (0x28a5a0)]],
        input = io.input --[[function: 0x9b38ae8]],
        stdout = io.stdout --[[file (0x28a500)]],
        popen = io.popen --[[function: 0x9b39f20]],
        tmpfile = io.tmpfile --[[function: 0x9b39f70]]
      } --[[table: 0x9b399a0]],
      os = {
        exit = os.exit --[[function: 0x9b3a478]],
        setlocale = os.setlocale --[[function: 0x9b384d8]],
        date = os.date --[[function: 0x9b3a3d0]],
        getenv = os.getenv --[[function: 0x9b374e0]],
        difftime = os.difftime --[[function: 0x9b3a408]],
        remove = os.remove --[[function: 0x9b38488]],
        time = os.time --[[function: 0x9b38510]],
        clock = os.clock --[[function: 0x9b3a398]],
        tmpname = os.tmpname --[[function: 0x9b38548]],
        rename = os.rename --[[function: 0x9b384a0]],
        execute = os.execute --[[function: 0x9b3a440]]
      } --[[table: 0x9b3a1a8]],
      table = {
        setn = table.setn --[[function: 0x9b38438]],
        insert = table.insert --[[function: 0x9b383c8]],
        getn = table.getn --[[function: 0x9b38358]],
        foreachi = table.foreachi --[[function: 0x9b38320]],
        maxn = table.maxn --[[function: 0x9b38390]],
        foreach = table.foreach --[[function: 0x9b382e8]],
        concat = table.concat --[[function: 0x9b38850]],
        sort = table.sort --[[function: 0x9b38470]],
        remove = table.remove --[[function: 0x9b38400]]
      } --[[table: 0x9b38828]],
      math = {
        log = math.log --[[function: 0x9b3b460]],
        max = math.max --[[function: 0x9b3b490]],
        acos = math.acos --[[function: 0x9b3b168]],
        huge = math.huge,
        ldexp = math.ldexp --[[function: 0x9b3b3f0]],
        pi = 3.1415926535898,
        cos = math.cos --[[function: 0x9b3b2b8]],
        tanh = math.tanh --[[function: 0x9b3b698]],
        pow = math.pow --[[function: 0x9b3b528]],
        deg = math.deg --[[function: 0x9b3b2e8]],
        tan = math.tan --[[function: 0x9b3b6d0]],
        cosh = math.cosh --[[function: 0x9b3b280]],
        sinh = math.sinh --[[function: 0x9b3b5f8]],
        random = math.random --[[function: 0x9b3b588]],
        randomseed = math.randomseed --[[function: 0x9b3b5c0]],
        frexp = math.frexp --[[function: 0x9b3b3b8]],
        ceil = math.ceil --[[function: 0x9b3b248]],
        floor = math.floor --[[function: 0x9b3b348]],
        rad = math.rad --[[function: 0x9b3b558]],
        abs = math.abs --[[function: 0x9b3b138]],
        sqrt = math.sqrt --[[function: 0x9b3b660]],
        modf = math.modf --[[function: 0x9b3b4f0]],
        asin = math.asin --[[function: 0x9b3b1a0]],
        min = math.min --[[function: 0x9b3b4c0]],
        mod = math.fmod --[[function: 0x9b3b380]],
        fmod = nil --[[ref]],
        log10 = math.log10 --[[function: 0x9b3b428]],
        atan2 = math.atan2 --[[function: 0x9b3b1d8]],
        exp = math.exp --[[function: 0x9b3b318]],
        sin = math.sin --[[function: 0x9b3b630]],
        atan = math.atan --[[function: 0x9b3b210]]
      } --[[table: 0x9b3ad88]],
      coroutine = {
        resume = coroutine.resume --[[function: 0x9b38880]],
        yield = coroutine.yield --[[function: 0x9b38960]],
        status = coroutine.status --[[function: 0x9b388f0]],
        wrap = coroutine.wrap --[[function: 0x9b38928]],
        create = coroutine.create --[[function: 0x9b38868]],
        running = coroutine.running --[[function: 0x9b388b8]]
      } --[[table: 0x9b38718]]
    } --[[table: 0x9b37b50]],
    loaders = {
      "function: 0x9b38c00" --[[err]],
      "function: 0x9b38c18" --[[err]],
      "function: 0x9b38c30" --[[err]],
      "function: 0x9b38c48" --[[err]]
    } --[[table: 0x9b38ba0]],
    cpath = "./?.so;/usr/local/lib/lua/5.1/?.so;/usr/lib/i386-linux-gnu/lua/5.1/?.so;/usr/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so",
    config = "/\010;\010?\010!\010-",
    path = "src/?.lua",
    seeall = "function: 0x9b38b68" --[[err]]
  } --[[table: 0x9b38a50]],
  tostring = tostring --[[function: 0x9b37f08]],
  print = print --[[function: 0x9b38020]],
  os = nil --[[ref]],
  unpack = unpack --[[function: 0x9b37f78]],
  require = require --[[function: 0x9b38f40]],
  getfenv = getfenv --[[function: 0x9b37c48]],
  setmetatable = setmetatable --[[function: 0x9b37e90]],
  next = next --[[function: 0x9b37d68]],
  assert = assert --[[function: 0x9b37b98]],
  tonumber = tonumber --[[function: 0x9b37ed0]],
  io = nil --[[ref]],
  rawequal = rawequal --[[function: 0x9b38058]],
  collectgarbage = collectgarbage --[[function: 0x9b37bd0]],
  getmetatable = getmetatable --[[function: 0x9b37de0]],
  module = module --[[function: 0x9b38ce0]],
  rawset = rawset --[[function: 0x9b380c8]],
  math = nil --[[ref]],
  debug = nil --[[ref]],
  pcall = pcall --[[function: 0x9b37da0]],
  table = nil --[[ref]],
  newproxy = newproxy --[[function: 0x9b386b0]],
  type = type --[[function: 0x9b37f40]],
  coroutine = nil --[[ref]],
  _G = nil --[[ref]],
  select = select --[[function: 0x9b37478]],
  gcinfo = gcinfo --[[function: 0x9b37c10]],
  pairs = pairs --[[function: 0x9b37a00]],
  rawget = rawget --[[function: 0x9b38090]],
  loadstring = loadstring --[[function: 0x9b37d30]],
  ipairs = ipairs --[[function: 0x9b379a0]],
  _VERSION = "Lua 5.1",
  dofile = dofile --[[function: 0x9b37c88]],
  setfenv = setfenv --[[function: 0x9b37e58]],
  load = load --[[function: 0x9b37cf8]],
  error = error --[[function: 0x9b37cc0]],
  loadfile = loadfile --[[function: 0x9b37e20]]
} --[[table: 0x9b37450]]