- Subject: Re: [ANN] lua-ConciseSerialization : another pure Lua implementation of CBOR / RFC7049
- From: Sean Conner <sean@...>
- Date: Sat, 3 Dec 2016 16:56:30 -0500
It was thus said that the Great François Perrad once stated:
> 2016-12-03 12:44 GMT+01:00 Pierre Chapuis <catwell@archlinux.us>:
> > December 3, 2016 12:37 AM, "Sean Conner" <sean@conman.org> wrote:
> >
> >> I found the API itself interesting. When I wrote my CBOR module [1] I
> >> check the strings and if they pass as a UTF-8 string, I encode the string as
> >> TEXT; otherwise it gets encoded as BIN. It never occurred to me to have a
> >> function set the default behavior, and I wonder how your method would work
> >> with a table of mixed strings and binary data (or is that even enough of a
> >> concern to concern yourself with? I don't know ... ).
> >>
>
> I don't like the idea : the encoding format depends on the value,
Okay, but I still fail to see how you handle mixed values. For instance
(using the CBOR diagnostic notation):
    {
      "authkey"  : h'...' ,
      "firmware" : h'...' ,
      "version"  : "1.2.1"
    }
You could argue that you could send everything as a binary string, as it's
not that much of an issue with Lua, but Python3 makes a very strong
distinction between text and binary.
Now, because Lua makes it difficult to distinguish a text string from a
binary string (since the Lua type for both is 'string'), I went down the
path of checking a string to see if it's valid UTF-8 and, if so, encoding it
as text; otherwise as binary.
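A rough sketch of that kind of check, assuming Lua 5.3 or later for the
standard utf8 library. The function name classify is hypothetical, and this
is not necessarily how org.conman.cbor performs its test:

```lua
-- Sketch only: decide whether a Lua string would go out as CBOR TEXT
-- or CBOR BIN. utf8.len() returns a count for valid UTF-8 and a false
-- value (plus the offending position) otherwise.
local function classify(s)
  if utf8.len(s) then
    return 'TEXT'
  else
    return 'BIN'
  end
end
```

Note that a binary blob which happens to be valid UTF-8 (plain ASCII bytes,
say) would come out as TEXT under this rule.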
But this really only affects the very simple usage of:
    cbor = require "org.conman.cbor"
    data =
    {
      authkey  = filecontents('publickey'),
      firmware = filecontents('bin.v1.2.1'),
      version  = '1.2.1',
    }
    blob = cbor.encode(data)
If it really bugs you that a binary string might be encoded as text, then
you can always drop down a level:
    blob = cbor.TYPE.MAP(3)
        .. cbor.TYPE.TEXT('authkey')  .. cbor.TYPE.BIN(data.authkey)
        .. cbor.TYPE.TEXT('firmware') .. cbor.TYPE.BIN(data.firmware)
        .. cbor.TYPE.TEXT('version')  .. cbor.TYPE.TEXT(data.version)
(This also means you can control the order of fields in a MAP, if that's
important.)
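To show what concatenating pieces like that amounts to on the wire, here is
a hand-rolled sketch based on the RFC 7049 initial-byte layout (major type in
the top three bits, length in the rest). The helpers head, MAP, TEXT and BIN
are hypothetical stand-ins, not the cbor.TYPE functions themselves, and the
code assumes Lua 5.3 or later for the bit operators and string.pack:

```lua
-- Sketch only: build a CBOR item header for a major type and a length.
local function head(major, n)
  local m = major << 5
  if n < 24 then
    return string.char(m | n)                -- length fits in the initial byte
  elseif n < 0x100 then
    return string.char(m | 24, n)            -- 1-byte length follows
  elseif n < 0x10000 then
    return string.pack('>BI2', m | 25, n)    -- 2-byte length follows
  elseif n < 0x100000000 then
    return string.pack('>BI4', m | 26, n)    -- 4-byte length follows
  else
    return string.pack('>BI8', m | 27, n)    -- 8-byte length follows
  end
end

local function BIN(s)  return head(2, #s) .. s end  -- major type 2: byte string
local function TEXT(s) return head(3, #s) .. s end  -- major type 3: text string
local function MAP(n)  return head(5, n)       end  -- major type 5: map of n pairs

-- Field order is exactly the order of concatenation.
local blob = MAP(2)
          .. TEXT('authkey') .. BIN('\x00\xff')
          .. TEXT('version') .. TEXT('1.2.1')
```

Since the caller picks TEXT or BIN for each value, no guessing about the
string's contents is involved.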
> for example in RFC 7049, section 3.9. Canonical CBOR
> > If a protocol allows for IEEE floats, then additional
> > canonicalization rules might need to be added. One example rule
> > might be to have all floats start as a 64-bit float, then do a test
> > conversion to a 32-bit float; if the result is the same numeric
> > value, use the shorter value and repeat the process with a test
> > conversion to a 16-bit float. (This rule selects 16-bit float for
> > positive and negative Infinity as well.) Also, there are many
> > representations for NaN. If NaN is an allowed value, it must always
> > be represented as 0xf97e00.
>
> this kind of behaviour is
> - slow (because it tries various ways),
> - not easy to test (because many special cases occur)
Yes, I'll admit it was a bit tricky to write the code to do a minimal
encoding of floating point values, but I don't think it's *that* slow: it
pulls out the mantissa and exponent and checks whether the range (exponent)
and significance (mantissa) are both low enough for a smaller size.
And again, if you don't want that behavior, you can always step down like
the above.
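For comparison, the "test conversion" rule from the quoted RFC text can be
sketched with a string.pack round trip. This is not the mantissa/exponent
inspection described above, only the simpler RFC rule, and it stops at the
64-bit to 32-bit step (a general 16-bit test needs hand-written
half-precision handling, omitted here). The names fits_float32 and
encode_float are hypothetical; the code assumes Lua 5.3 or later and a
native C float that is IEEE 754 binary32:

```lua
local FLT_MAX = 3.4028234663852886e38   -- largest finite 32-bit float

-- True if x survives a double -> float -> double round trip unchanged.
local function fits_float32(x)
  if math.abs(x) > FLT_MAX then return false end  -- avoid out-of-range casts
  return string.unpack('>f', string.pack('>f', x)) == x
end

local function encode_float(x)
  if x ~= x          then return '\xf9\x7e\x00' end  -- NaN, per RFC 7049 3.9
  if x ==  math.huge then return '\xf9\x7c\x00' end  -- +Infinity as 16-bit
  if x == -math.huge then return '\xf9\xfc\x00' end  -- -Infinity as 16-bit
  if fits_float32(x) then
    return string.pack('>Bf', 0xfa, x)               -- 32-bit float
  end
  return string.pack('>Bd', 0xfb, x)                 -- 64-bit float
end
```

A value such as 1.5 comes out in five bytes here, while 0.1 needs the full
nine; a complete minimal encoder would go on to try 16 bits for 1.5.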
> with few functions (set_string, set_number, set_nil, set_array),
> you could config the encoding at your application level.
> For example, with an application which is a sensor,
> you could encode float number always with 16-bits,
> because you know that this precision is enough.
Point taken. In my case, one could always replace the function
cbor.__ENCODE_MAP.number() with one that always encodes using a single CBOR
type.
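The general pattern, sketched with a stand-alone dispatch table. The local
ENCODE_MAP and encode below are hypothetical illustrations; the actual
signature of cbor.__ENCODE_MAP.number() isn't shown in this thread, so none
of this should be read as the module's API. It assumes Lua 5.3 or later for
string.pack:

```lua
-- Hypothetical per-type encoder table, keyed by the result of type().
local ENCODE_MAP = {}

-- Default behaviour in this sketch: always a 64-bit float (0xfb prefix).
ENCODE_MAP.number = function(x)
  return string.pack('>Bd', 0xfb, x)
end

local function encode(v)
  local f = ENCODE_MAP[type(v)]
  if not f then error('no encoder for ' .. type(v)) end
  return f(v)
end

-- Application-level override: this application decides that 32 bits of
-- precision is enough, so every number goes out as a 32-bit float (0xfa).
ENCODE_MAP.number = function(x)
  return string.pack('>Bf', 0xfa, x)
end
```

After the override, every call to encode() with a number yields five bytes,
with no per-value size tests.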
-spc