> -----Original Message-----
> From: lua-l-bounces@lists.lua.org [mailto:lua-l-bounces@lists.lua.org] On
> Behalf Of Rena
> Sent: Tuesday, 6 October 2015 23:09
> To: Lua Mailing List
> Subject: RE: The hypothetical __serialize metamethod (was Re: [ANN] luaproc
> 1.0-4)
> 
> On Oct 6, 2015 4:30 PM, "Thijs Schreijer" <thijs@thijsschreijer.nl> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: lua-l-bounces@lists.lua.org [mailto:lua-l-bounces@lists.lua.org]
> On
> > > Behalf Of Coda Highland
> > > > Sent: Tuesday, 6 October 2015 21:31
> > > To: Lua mailing list
> > > Subject: Re: The hypothetical __serialize metamethod (was Re: [ANN]
> luaproc
> > > 1.0-4)
> > >
> > > On Tue, Oct 6, 2015 at 12:07 PM, Thijs Schreijer
> > > <thijs@thijsschreijer.nl> wrote:
> > > >
> > > >> >
> > > >> >   But wouldn't there have to be a consensus as to *what*
> __serialize
> > > >> > returns?  I mean, obviously, a string of byte values (variation on
> a
> > > >> string)
> > > >> > but the actual contents can vary widely.  Given a simple Lua table:
> > > >> >
> > > >> >         { 1 , "two" , true }
> > > >> >
> > > >> >   One person might want to serialize that as JSON:
> > > >> >
> > > >> >         [ 1 , "two" , true ]
> > > >> >
> > > >> >   Someone else might want BSON [1] (hex dump follows):
> > > >> >
> > > >> >         13 00 00 00 04 00 0D 00
> > > >> >         00 00 30 00 01 00 00 00
> > > >> >         31 00 74 77 6F 32 01
> > > >> >
> > > >> >   Another one (like me) might want to serialize to CBOR [2] (hex
> dump
> > > >> > follows):
> > > >> >
> > > >> >         83 01 63 74 77 6F F5
> > > >> >
> > > >> > and yet another might want straight up Lua:
> > > >> >
> > > >> >         { 1 , "two" , true }
> > > >> >
> > > >> >   Is it better to perhaps just reserve "__serialize" for
> serialization
> > > and
> > > >> > leave it up to modules to flesh it out?  Or do we need to actually
> > > define
> > > >> > the output format?
> > > >> >
> > > >> >   -spc (Not a proposal, just something to talk about ... )
> > > >> >
> > > >> > [1]     http://bsonspec.org/spec.html
> > > >> >
> > > >> > [2]     RFC-7049
> > > >> >
> > > >>
> > > >> Solution: Don't call it __serialize! Call it __json, __bson, __cbor,
> > > >> or whatever's actually appropriate for the format.
> > > >>
> > > >> You could borrow a Pythonism and use __repr for Lua syntax.
> > > >>
> > > >> /s/ Adam
> > > >
> > > > If that were implemented, now how would my code know which of those it
> > > would need to call? Just `__serialize` should do.
> > > >
> > > > It should either return plain Lua values with no recursion (as
> mentioned
> > > earlier), or simply a string. Though the latter option might have
> everybody
> > > reinventing the serialization, whilst some good libraries are available,
> so
> > > I would prefer the former.
> > > >
> > > > A second return value can be added to identify the type. Even if that
> type
> > > might be prone to collisions, a recent remark in the thread about the
> > > usefulness of the registry asked for any cases where there were
> collisions
> > > in the registry. I didn't see any response on that. So I don't think it
> to
> > > be such a big issue.
> > > > Just setting some proper examples with namespacing should set people
> off in
> > > the right direction.
> > > >
> > > > So for the LuaDate library I'm maintaining, something like
> > > "lua:thijsschreijer.nl/luadate/1.0" would be a fine type name I guess.
> > > >
> > > > Thijs
> > >
> > > Your question illustrates exactly why it's relevant: You would only
> > > ask that question if you aren't actually thinking about serialization.
> > > If all you want is just "shove this into a byte array and get it back
> > > out later" then any one of them COULD work, but one thing you
> > > definitely DON'T want to do is to mix metaphors -- and that's exactly
> > > the kind of trouble you'd get in if you aren't specifically asking for
> > > a particular format, because different library maintainers might
> > > integrate with different serialization modules.
> >
> > This is Lua; different library maintainers WILL use different serializers.
> No doubt.
> >
> > So trying to force a single way upon everyone will not work. Hence my two
> step proposal.
> > The __serialize would deliver a non-recursive plain Lua value. And the
> consumer of the module can then apply their own format on top. I like
> `serpent` for the `__repr` format you mentioned, or I might use dkjson if I
> needed your `__json` format.
> >
> > So the `__serialize` method should only simplify the
> application/module/object specific structures. And then the consumer can
> pack it up in any which way the consumer needs it.
> >
> > >
> > > That said: I would imagine you'd probably want to use __repr most
> > > times, since the deserializer would be load(). In this format, types
> > > could be serialized as "setmetatable({}, require('LuaDate').Date)" or
> > > something like that, unless you as the maintainer added a __repr that
> > > would return something more like "require('LuaDate').Date(m,d,y)".
> > > (NB: This is just me throwing something out there as a casual example,
> > > not fully thought out or fleshed out.)
> > >
> > > Meanwhile, someone writing a JSON library would use __json instead,
> > > and anyone wishing to integrate with that library would offer __json.
> > > No conflict, no problem, everyone's happy. (And if you as the LuaDate
> > > maintainer didn't offer __json and someone really wanted to they could
> > > monkey-patch in a __json in their own code, again without stepping on
> > > anyone's toes.)
> >
> > The 2 step approach is generic enough to not require any monkey patching
> at all. Consider using 3 libraries, each using one of `__repr`, `__json` or
> `__bson`. Your way would force me to pick one format, and monkey patch the
> other two.
> > In my proposal, I would simply call `__serialize` on each and throw those
> results at `serpent` or `dkjson` and be done with it. (the performance of
> those two libraries will probably always be better than anything I would
> come up with for a monkey-patch anyway).
> >
> > Thijs
> >
> > >
> > > /s/ Adam
> >
> 
> After reading through this (and related) threads, my personal opinions are:
> 
> 1. __serialize should return a Lua function (with no upvalues) or code
> string to recreate the object. That could include e.g. loops and control flow

I like the idea of a function, but how would that work without upvalues?

my_metatable.__serialize = function(self)

  -- some stuff here I suppose

  return function()
    local my_data
    -- Here I have nothing to fill my data with... unless I have an upvalue... no?
    return my_data
  end
end

(It's past midnight here, so I'm probably missing something.)
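
One possible way around the upvalue problem, sketched here on the assumption that the "code string" variant of the proposal is acceptable: the method writes its data into Lua source as literals, so the chunk produced by load() needs no upvalues. The `Point` type and its `new` constructor are hypothetical names for illustration only, the sketch assumes Lua 5.2 or later (where load() accepts a string), and it passes the constructor to the chunk as an argument instead of having the chunk call require().

```lua
-- Hypothetical example type; not part of any real library.
local Point = {}
Point.__index = Point

function Point.new(x, y, label)
  return setmetatable({ x = x, y = y, label = label }, Point)
end

-- Returns Lua source code. The data is embedded as literals, so the
-- chunk compiled from it needs no upvalues. %q quotes the string safely.
Point.__serialize = function(self)
  return string.format("local new = ...; return new(%d, %d, %q)",
                       self.x, self.y, self.label)
end

local p = Point.new(3, 4, 'a "quoted" label')
local code = getmetatable(p).__serialize(p)

-- load() compiles the string; the constructor is passed in as an argument.
local chunk = assert(load(code))
local copy = chunk(Point.new)
```

This only covers integer coordinates and a string label. Nested or recursive data would need a real serializer, and loading such a string is only reasonable for trusted input.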


Beyond that, it rules out serializing your data to anything other than Lua.

> to deal with recursion. (If you want to unserialize untrusted data, you have
> a different problem that probably is best handled by an application-specific
> system.) You're free to dump() the resulting function to a file and load()
> it later.
> 
> 2. A __serialize method shouldn't necessarily be a "standard" metamethod
> (with a particular signature and operation described in the Lua manual), but
> if someone decides to implement it in their objects, it'd be helpful if it
> followed a simple interface (such as described above).
> 
> 3. A __tojson method might be nice too, and serializers could certainly use
> it.

Can of worms... because every other possible format "would be nice too".
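
For contrast, a minimal sketch of the two-step idea quoted earlier in this thread, in which `__serialize` only reduces the object to a plain, non-recursive Lua value and the consumer chooses the output format. The `Date` type, the type name string, and the `to_lua_source` helper are all hypothetical. The helper is a toy stand-in for a real serializer such as serpent or dkjson and handles only flat tables with string keys and integer values.

```lua
-- Hypothetical date-like type, loosely modelled on the LuaDate example.
local Date = {}
Date.__index = Date

function Date.new(y, m, d)
  return setmetatable({ year = y, month = m, day = d }, Date)
end

-- Step 1: reduce the object to a plain, non-recursive Lua value,
-- plus a namespaced type name as a second return value.
Date.__serialize = function(self)
  return { year = self.year, month = self.month, day = self.day },
         "lua:example.org/date/1.0"  -- hypothetical type name
end

-- Step 2: the consumer picks the format. This toy encoder emits Lua
-- table syntax; a JSON or CBOR encoder could take the same plain value.
local function to_lua_source(t)
  local keys = {}
  for k in pairs(t) do keys[#keys + 1] = k end
  table.sort(keys)  -- sort keys so the output order is deterministic
  local parts = {}
  for _, k in ipairs(keys) do
    parts[#parts + 1] = string.format("%s = %d", k, t[k])
  end
  return "{ " .. table.concat(parts, ", ") .. " }"
end

local d = Date.new(2015, 10, 6)
local plain, typename = getmetatable(d).__serialize(d)
local text = to_lua_source(plain)
```

With this split the object needs only one metamethod, and adding a new output format means swapping the encoder in step 2, not adding another metamethod to every library.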

> 
> 4. My phone keyboard needs to smarten up.
> 
> Notice I'm not using the P-word here :) just throwing ideas at the wall to
> see what sticks.


Thijs