On Oct 6, 2015 4:30 PM, "Thijs Schreijer" <thijs@thijsschreijer.nl> wrote:
>
>
>
> > -----Original Message-----
> > From: lua-l-bounces@lists.lua.org [mailto:lua-l-bounces@lists.lua.org] On
> > Behalf Of Coda Highland
> > Sent: Tuesday, 6 October 2015 21:31
> > To: Lua mailing list
> > Subject: Re: The hypothetical __serialize metamethod (was Re: [ANN] luaproc
> > 1.0-4)
> >
> > On Tue, Oct 6, 2015 at 12:07 PM, Thijs Schreijer
> > <thijs@thijsschreijer.nl> wrote:
> > >
> > >> >
> > >> >   But wouldn't there have to be a consensus as to *what* __serialize
> > >> > returns?  I mean, obviously, a string of byte values (variation on a
> > >> string)
> > >> > but the actual contents can vary widely.  Given a simple Lua table:
> > >> >
> > >> >         { 1 , "two" , true }
> > >> >
> > >> >   One person might want to serialize that as JSON:
> > >> >
> > >> >         [ 1 , "two" , true ]
> > >> >
> > >> >   Someone else might want BSON [1] (hex dump follows):
> > >> >
> > >> >         13 00 00 00 04 00 0D 00
> > >> >         00 00 30 00 01 00 00 00
> > >> >         31 00 74 77 6F 32 01
> > >> >
> > >> >   Another one (like me) might want to serialize to CBOR [2] (hex dump
> > >> > follows):
> > >> >
> > >> >         83 01 63 74 77 6F F5
> > >> >
> > >> > and yet another might want straight up Lua:
> > >> >
> > >> >         { 1 , "two" , true }
> > >> >
> > >> >   Is it better to perhaps just reserve "__serialize" for serialization
> > and
> > >> > leave it up to modules to flesh it out?  Or do we need to actually
> > define
> > >> > the output format?
> > >> >
> > >> >   -spc (Not a proposal, just something to talk about ... )
> > >> >
> > >> > [1]     http://bsonspec.org/spec.html
> > >> >
> > >> > [2]     RFC-7049
> > >> >
> > >>
> > >> Solution: Don't call it __serialize! Call it __json, __bson, __cbor,
> > >> or whatever's actually appropriate for the format.
> > >>
> > >> You could borrow a Pythonism and use __repr for Lua syntax.
> > >>
> > >> /s/ Adam
> > >
> > > If that were implemented, now how would my code know which of those it
> > would need to call? Just `__serialize` should do.
> > >
> > > It should either return plain Lua values with no recursion (as mentioned
> > earlier), or simply a string. Though the latter option might have everybody
> > reinventing the serialization, whilst some good libraries are available, so
> > I would prefer the former.
> > >
> > > A second return value can be added to identify the type. Even if that type
> > might be prone to collisions, a recent remark in the thread about the
> > usefulness of the registry asked for any cases where there were collisions
> > in the registry. I didn't see any response on that. So I don't think it to
> > be such a big issue.
> > > Just setting some proper examples with namespacing should set people off in
> > the right direction.
> > >
> > > So for the LuaDate library I'm maintaining, something like
> > "lua:thijsschreijer.nl/luadate/1.0" would be a fine type name I guess.
> > >
> > > Thijs
> >
> > Your question illustrates exactly why it's relevant: You would only
> > ask that question if you aren't actually thinking about serialization.
> > If all you want is just "shove this into a byte array and get it back
> > out later" then any one of them COULD work, but one thing you
> > definitely DON'T want to do is to mix metaphors -- and that's exactly
> > the kind of trouble you'd get in if you aren't specifically asking for
> > a particular format, because different library maintainers might
> > integrate with different serialization modules.
>
> This is Lua; different library maintainers WILL use different serializers. No doubt.
>
> So trying to force a single way upon everyone will not work. Hence my two step proposal.
> The __serialize would deliver a non-recursive plain Lua value. And the consumer of the module can then apply their own format on top. I like `serpent` for the `__repr` format you mentioned, or I might use dkjson if I needed your `__json` format.
>
> So the `__serialize` method should only simplify the application/module/object specific structures. And then the consumer can pack it up in any which way the consumer needs it.
>
> >
> > That said: I would imagine you'd probably want to use __repr most
> > times, since the deserializer would be load(). In this format, types
> > could be serialized as "setmetatable({}, require('LuaDate').Date)" or
> > something like that, unless you as the maintainer added a __repr that
> > would return something more like "require('LuaDate').Date(m,d,y)".
> > (NB: This is just me throwing something out there as a casual example,
> > not fully thought out or fleshed out.)
> >
> > Meanwhile, someone writing a JSON library would use __json instead,
> > and anyone wishing to integrate with that library would offer __json.
> > No conflict, no problem, everyone's happy. (And if you as the LuaDate
> > maintainer didn't offer __json and someone really wanted to they could
> > monkey-patch in a __json in their own code, again without stepping on
> > anyone's toes.)
>
> The 2 step approach is generic enough to not require any monkey patching at all. Consider using 3 libraries, each using one of `__repr`, `__json` or `__bson`. Your way would force me to pick one format, and monkey patch the other two.
> In my proposal, I would simply call `__serialize` on each and throw those results at `serpent` or `dkjson` and be done with it. (the performance of those two libraries will probably always be better than anything I would come up with for a monkey-patch anyway).
>
> Thijs
>
> >
> > /s/ Adam
>
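
For concreteness, the two-step approach quoted above might be sketched roughly as follows. Everything here is an assumption made for illustration: the Date class, the simplify() helper, the type name "lua:example.org/date/1.0", and the tiny to_json_array() encoder, which only stands in for a real library such as dkjson or serpent.

```lua
-- Hypothetical Date class, standing in for a real library object.
local Date = {}
Date.__index = Date

local function new_date(y, m, d)
  return setmetatable({ year = y, month = m, day = d }, Date)
end

-- Step 1: the object flattens itself into plain, non-recursive Lua values.
-- The second return value is a namespaced type name (made up for this sketch).
function Date:__serialize()
  return { self.year, self.month, self.day }, "lua:example.org/date/1.0"
end

-- Consumer-side helper (hypothetical): use __serialize if the metatable has it.
local function simplify(obj)
  local mt = getmetatable(obj)
  if mt and mt.__serialize then
    return mt.__serialize(obj)
  end
  return obj
end

-- Step 2: the consumer picks the output format. This toy encoder handles
-- only flat arrays of numbers and booleans; a real program would hand the
-- plain table to an existing serialization library instead.
local function to_json_array(t)
  local parts = {}
  for i, v in ipairs(t) do
    parts[i] = tostring(v)
  end
  return "[" .. table.concat(parts, ",") .. "]"
end

local plain, type_name = simplify(new_date(2015, 10, 6))
local encoded = to_json_array(plain)
print(encoded, type_name)
```

The point of the split is that the object only has to know how to simplify itself, while the choice of wire format stays with whoever consumes it.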

After reading through this (and related) threads, my personal opinions are:

1. __serialize should return a Lua function (with no upvalues) or a code string that recreates the object. That could include, e.g., loops and control flow to deal with recursion. (If you want to unserialize untrusted data, you have a different problem, one that is probably best handled by an application-specific system.) You're free to string.dump() the resulting function to a file and load() it later.

2. A __serialize method shouldn't necessarily be a "standard" metamethod (with a particular signature and operation described in the Lua manual), but if someone decides to implement it in their objects, it'd be helpful if it followed a simple interface (such as described above).

3. A __tojson method might be nice too, and serializers could certainly use it.

4. My phone keyboard needs to smarten up.

Notice I'm not using the P-word here :) I'm just throwing ideas at the wall to see what sticks.
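
To make point 1 a bit more concrete, here is a minimal sketch of a __serialize that returns a code string. The Point class and new_point() constructor are made up for the example, and passing the constructor in through `...` is just one possible way to keep the chunk free of upvalues and globals. Only trusted data should ever be loaded this way.

```lua
-- Hypothetical Point class, used only for illustration.
local Point = {}
Point.__index = Point

local function new_point(x, y)
  return setmetatable({ x = x, y = y }, Point)
end

-- Return Lua source that rebuilds the object. The chunk receives the
-- constructor as its first argument, so it refers to no globals.
-- %.17g round-trips ordinary finite numbers; inf and nan are not handled.
function Point:__serialize()
  return string.format("local new = ...; return new(%.17g, %.17g)",
                       self.x, self.y)
end

-- loadstring exists in Lua 5.1; from 5.2 on, load accepts strings.
local load_chunk = loadstring or load

local original = new_point(3, 4)
local source = getmetatable(original).__serialize(original)

-- Deserialize: compile the source, then call it with the constructor.
local chunk = assert(load_chunk(source))
local copy = chunk(new_point)

-- The compiled function can also be dumped to a binary string (which could
-- be written to a file) and loaded back later.
local dumped = string.dump(chunk)
local restored = assert(load_chunk(dumped))(new_point)

print(copy.x, copy.y, restored.x, restored.y)
```

A string is easier to inspect and to move between Lua versions, while a dumped function is tied to the Lua version that produced it.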