On Tue, Oct 6, 2015 at 6:10 PM, Thijs Schreijer <thijs@thijsschreijer.nl> wrote:
>
>
>> -----Original Message-----
>> From: lua-l-bounces@lists.lua.org [mailto:lua-l-bounces@lists.lua.org] On
>> Behalf Of Rena
>> Sent: Tuesday, 6 October 2015 23:09
>> To: Lua Mailing List
>> Subject: RE: The hypothetical __serialize metamethod (was Re: [ANN] luaproc
>> 1.0-4)
>>
>> On Oct 6, 2015 4:30 PM, "Thijs Schreijer" <thijs@thijsschreijer.nl> wrote:
>> >
>> >
>> >
>> > > -----Original Message-----
>> > > From: lua-l-bounces@lists.lua.org [mailto:lua-l-bounces@lists.lua.org] On
>> > > Behalf Of Coda Highland
>> > > Sent: Tuesday, 6 October 2015 21:31
>> > > To: Lua mailing list
>> > > Subject: Re: The hypothetical __serialize metamethod (was Re: [ANN]
>> > > luaproc 1.0-4)
>> > >
>> > > On Tue, Oct 6, 2015 at 12:07 PM, Thijs Schreijer
>> > > <thijs@thijsschreijer.nl> wrote:
>> > > >
>> > > >> >
>> > > >> >   But wouldn't there have to be a consensus as to *what*
>> > > >> > __serialize returns?  I mean, obviously, a string of byte values
>> > > >> > (variation on a string) but the actual contents can vary widely.
>> > > >> > Given a simple Lua table:
>> > > >> >
>> > > >> >         { 1 , "two" , true }
>> > > >> >
>> > > >> >   One person might want to serialize that as JSON:
>> > > >> >
>> > > >> >         [ 1 , "two" , true ]
>> > > >> >
>> > > >> >   Someone else might want BSON [1] (hex dump follows):
>> > > >> >
>> > > >> >         13 00 00 00 04 00 0D 00
>> > > >> >         00 00 30 00 01 00 00 00
>> > > >> >         31 00 74 77 6F 32 01
>> > > >> >
>> > > >> >   Another one (like me) might want to serialize to CBOR [2] (hex
>> > > >> > dump follows):
>> > > >> >
>> > > >> >         83 01 63 74 77 6F F5
>> > > >> >
>> > > >> > and yet another might want straight up Lua:
>> > > >> >
>> > > >> >         { 1 , "two" , true }
>> > > >> >
>> > > >> >   Is it better to perhaps just reserve "__serialize" for
>> > > >> > serialization and leave it up to modules to flesh it out?  Or do we
>> > > >> > need to actually define the output format?
>> > > >> >
>> > > >> >   -spc (Not a proposal, just something to talk about ... )
>> > > >> >
>> > > >> > [1]     http://bsonspec.org/spec.html
>> > > >> >
>> > > >> > [2]     RFC-7049
>> > > >> >
>> > > >>
>> > > >> Solution: Don't call it __serialize! Call it __json, __bson, __cbor,
>> > > >> or whatever's actually appropriate for the format.
>> > > >>
>> > > >> You could borrow a Pythonism and use __repr for Lua syntax.
>> > > >>
>> > > >> /s/ Adam
>> > > >
>> > > > If that were implemented, now how would my code know which of those it
>> > > > would need to call? Just `__serialize` should do.
>> > > >
>> > > > It should either return plain Lua values with no recursion (as
>> > > > mentioned earlier), or simply a string. Though the latter option might
>> > > > have everybody reinventing the serialization, whilst some good
>> > > > libraries are available, so I would prefer the former.
>> > > >
>> > > > A second return value can be added to identify the type. Even if that
>> > > > type might be prone to collisions, a recent remark in the thread about
>> > > > the usefulness of the registry asked for any cases where there were
>> > > > collisions in the registry. I didn't see any response on that. So I
>> > > > don't think it to be such a big issue.
>> > > > Just setting some proper examples with namespacing should set people
>> > > > off in the right direction.
>> > > >
>> > > > So for the LuaDate library I'm maintaining, something like
>> > > > "lua:thijsschreijer.nl/luadate/1.0" would be a fine type name I guess.
>> > > >
>> > > > Thijs
>> > >
>> > > Your question illustrates exactly why it's relevant: You would only
>> > > ask that question if you aren't actually thinking about serialization.
>> > > If all you want is just "shove this into a byte array and get it back
>> > > out later" then any one of them COULD work, but one thing you
>> > > definitely DON'T want to do is to mix metaphors -- and that's exactly
>> > > the kind of trouble you'd get in if you aren't specifically asking for
>> > > a particular format, because different library maintainers might
>> > > integrate with different serialization modules.
>> >
>> > This is Lua; different library maintainers WILL use different serializers.
>> > No doubt.
>> >
>> > So trying to force a single way upon everyone will not work. Hence my two
>> > step proposal.
>> > The __serialize would deliver a non-recursive plain Lua value. And the
>> > consumer of the module can then apply their own format on top. I like
>> > `serpent` for the `__repr` format you mentioned, or I might use dkjson if
>> > I needed your `__json` format.
>> >
>> > So the `__serialize` method should only simplify the
>> > application/module/object specific structures. And then the consumer can
>> > pack it up in any which way the consumer needs it.
>> >
>> > >
>> > > That said: I would imagine you'd probably want to use __repr most
>> > > times, since the deserializer would be load(). In this format, types
>> > > could be serialized as "setmetatable({}, require('LuaDate').Date)" or
>> > > something like that, unless you as the maintainer added a __repr that
>> > > would return something more like "require('LuaDate').Date(m,d,y)".
>> > > (NB: This is just me throwing something out there as a casual example,
>> > > not fully thought out or fleshed out.)
>> > >
>> > > Meanwhile, someone writing a JSON library would use __json instead,
>> > > and anyone wishing to integrate with that library would offer __json.
>> > > No conflict, no problem, everyone's happy. (And if you as the LuaDate
>> > > maintainer didn't offer __json and someone really wanted to they could
>> > > monkey-patch in a __json in their own code, again without stepping on
>> > > anyone's toes.)
>> >
>> > The 2 step approach is generic enough to not require any monkey patching
>> > at all. Consider using 3 libraries, each using one of `__repr`, `__json`
>> > or `__bson`. Your way would force me to pick one format, and monkey patch
>> > the other two.
>> > In my proposal, I would simply call `__serialize` on each and throw those
>> > results at `serpent` or `dkjson` and be done with it. (the performance of
>> > those two libraries will probably always be better than anything I would
>> > come up with for a monkey-patch anyway).
>> >
>> > Thijs
>> >
>> > >
>> > > /s/ Adam
>> >
>>
>> After reading through this (and related) threads, my personal opinions are:
>>
>> 1. __serialize should return a Lua function (with no upvalues) or code
>> string to recreate the object. That could include eg loops and control flow
>
> I like the idea of a function, but how would that work without upvalues?
>
> my_metatable.__serialize = function(self)
>
>   -- some stuff here I suppose
>
>   return function()
>     local my_data
>     -- Here I have nothing to fill my data with... unless I have an upvalue... no?
>     return my_data
>   end
> end
>
> (past midnight here, so probably missing something)
>
>
> Other than that, it closes off the option of serializing your data to anything but Lua.

Good point. I was thinking along the lines of:

Vector3.metatable.__serialize = function(self)
    return function()
        return Vector3.new(self.x, self.y, self.z)
    end
end

but that does still have 'self' (and potentially 'Vector3') as an upvalue. Hmm.
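
One way around the upvalue problem is the code-string variant from
point 1: write the field values into the source text, so the chunk that
comes back captures nothing. A minimal sketch, assuming finite numeric
fields. The Vector3 class and the deserialize helper are made up for
illustration, and Vector3.metatable just mirrors the snippet above:

```lua
-- Hypothetical Vector3 class, defined only to keep the sketch self-contained.
local Vector3 = { metatable = {} }

function Vector3.new(x, y, z)
    return setmetatable({ x = x, y = y, z = z }, Vector3.metatable)
end

-- Return source code instead of a closure. The field values are formatted
-- into the string, so neither 'self' nor 'Vector3' is an upvalue.
-- %.17g keeps enough digits to round-trip a double; inf and NaN are not
-- handled here (assumption: fields are finite numbers).
Vector3.metatable.__serialize = function(self)
    return string.format("return Vector3.new(%.17g, %.17g, %.17g)",
                         self.x, self.y, self.z)
end

-- Compile the string and run it in an environment that exposes only the
-- names the caller chooses to trust.
local function deserialize(code, env)
    local chunk
    if setfenv then                       -- Lua 5.1 / LuaJIT
        chunk = assert(loadstring(code))
        setfenv(chunk, env)
    else                                  -- Lua 5.2 and later
        chunk = assert(load(code, "serialized", "t", env))
    end
    return chunk()
end

local v = Vector3.new(1, 2.5, -3)
local code = getmetatable(v).__serialize(v)
local copy = deserialize(code, { Vector3 = Vector3 })
print(code)
print(copy.x, copy.y, copy.z)
```

The name 'Vector3' is now resolved when the string is loaded rather than
when it is produced, so the receiving side has to supply it. That also
means the output is only useful to something that can run Lua.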

>> 3. A __tojson method might be nice too, and serializers could certainly use
>> it.
>
> Can of worms... because all other possible formats "would be nice too" as well.

Of course, but there's no need to implement them all. It's just that if
you do implement one, and call it something standard like __tojson, a
serializer could use it.
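
A rough sketch of what "a serializer could use it" might look like. The
encode function below is a made-up, deliberately tiny JSON encoder (not
dkjson's or any other library's API) that handles only nil, booleans,
finite numbers, strings and array-like tables, and it assumes __tojson
returns finished JSON text:

```lua
-- Hypothetical minimal encoder, for illustration only.
local function encode(value)
    local mt = getmetatable(value)
    -- getmetatable may return a non-table if __metatable is set, so guard it.
    local hook = type(mt) == "table" and mt.__tojson
    if hook then
        return hook(value)        -- the object supplies its own JSON text
    end
    local t = type(value)
    if t == "nil" then
        return "null"
    elseif t == "boolean" or t == "number" then
        return tostring(value)    -- assumption: numbers are finite
    elseif t == "string" then
        -- Escape quotes, backslashes and control characters as \uXXXX.
        local escaped = value:gsub('[%c"\\]', function(c)
            return string.format("\\u%04x", c:byte())
        end)
        return '"' .. escaped .. '"'
    elseif t == "table" then
        local parts = {}
        for i = 1, #value do      -- array part only
            parts[i] = encode(value[i])
        end
        return "[" .. table.concat(parts, ",") .. "]"
    end
    error("cannot encode a " .. t)
end

-- A class that opts in by providing __tojson (hypothetical Vector3 again).
local Vector3 = {}
Vector3.__index = Vector3
Vector3.__tojson = function(self)
    return string.format('{"x":%s,"y":%s,"z":%s}',
                         encode(self.x), encode(self.y), encode(self.z))
end

local v = setmetatable({ x = 1, y = 2, z = 3 }, Vector3)
local out = encode({ 1, "two", true, v })
print(out)
```

Objects without __tojson fall through to the generic rules, so a class
only needs the metamethod when the default encoding is not what it wants.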

-- 
Sent from my Game Boy.