- Subject: Re: [ANN] lua-ConciseSerialization : another pure Lua implementation of CBOR / RFC7049
- From: François Perrad <francois.perrad@...>
- Date: Mon, 5 Dec 2016 17:47:50 +0100
2016-12-03 22:56 GMT+01:00 Sean Conner <sean@conman.org>:
> It was thus said that the Great François Perrad once stated:
>> 2016-12-03 12:44 GMT+01:00 Pierre Chapuis <catwell@archlinux.us>:
>> > December 3, 2016 12:37 AM, "Sean Conner" <sean@conman.org> wrote:
>> >
>> >> I found the API itself interesting. When I wrote my CBOR module [1] I
>> >> check the strings and if they pass as a UTF-8 string, I encode the string as
>> >> TEXT; otherwise it gets encoded as BIN. It never occurred to me to have a
>> >> function set the default behavior, and I wonder how your method would work
>> >> with a table of mixed strings and binary data (or is that even enough of a
>> >> concern to concern yourself with? I don't know ... ).
>> >>
>>
>> I don't like the idea : the encoding format depends on the value,
>
> Okay, but I still fail to see how you handle mixed values. For instance
> (using the CBOR diagnostic notation):
>
>   {
>     'authkey'  : h'...' ,
>     'firmware' : h'...' ,
>     'version'  : '1.2.1'
>   }
>
> You could argue that you could send everything as a binary string, as it's
> not that much of an issue with Lua, but Python3 makes a very strong
> distinction between text and binary.
>
> Now, because Lua makes it difficult to determine a text string from a
> binary string (since the Lua type for both is 'string') I went down the path
> of checking a string to see if it's valid UTF-8 and if so, encode it as
> text; otherwise binary.
>
> But this really only affects the very simple usage of:
>
>
>   cbor = require "org.conman.cbor"
>
>   data =
>   {
>     authkey  = filecontents('publickey'),
>     firmware = filecontents('bin.v1.2.1'),
>     version  = '1.2.1',
>   }
>
>   blob = cbor.encode(data)
>
> If it really bugs you that a binary string might be encoded as text, then
> you can always drop down a level:
>
>   blob = cbor.TYPE.MAP(3)
>       .. cbor.TYPE.TEXT('authkey')  .. cbor.TYPE.BIN(data.authkey)
>       .. cbor.TYPE.TEXT('firmware') .. cbor.TYPE.BIN(data.firmware)
>       .. cbor.TYPE.TEXT('version')  .. cbor.TYPE.TEXT(data.version)
>
> (this also means you can actually control the order of fields in a MAP if
> it's important).
>
Currently, I could write a metamethod which does the equivalent:

  local c = require 'CBOR'

  -- mt is the metatable set on the data table
  function mt:tocbor (buffer)
    buffer[#buffer+1] = c.MAP(3)
    buffer[#buffer+1] = c.text_string('authkey')
    buffer[#buffer+1] = c.byte_string(self.authkey)
    buffer[#buffer+1] = c.text_string('firmware')
    buffer[#buffer+1] = c.byte_string(self.firmware)
    buffer[#buffer+1] = c.text_string('version')
    buffer[#buffer+1] = c.text_string(self.version)
  end

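For readers who want to see what the text/binary choice changes on the wire,
here is a rough sketch based on RFC 7049 section 2.1: a byte string is major
type 2 and a text string is major type 3, and only the initial byte differs.
The names head, encode_bytes and encode_text are made up for this
illustration; they are not the API of either module.

```lua
-- Illustrative only: builds the CBOR head (major type + length) for a string.
local function head(major, n)
  local mt = major * 32                    -- major type sits in the top 3 bits
  if n < 24 then
    return string.char(mt + n)             -- length stored in the initial byte
  elseif n < 0x100 then
    return string.char(mt + 24, n)         -- 1-byte length follows
  elseif n < 0x10000 then
    return string.char(mt + 25, math.floor(n / 0x100), n % 0x100)
  elseif n < 0x100000000 then
    return string.char(mt + 26,
      math.floor(n / 0x1000000) % 0x100,
      math.floor(n / 0x10000) % 0x100,
      math.floor(n / 0x100) % 0x100,
      n % 0x100)
  end
  error("length too large for this sketch")
end

-- Major type 2: byte string (BIN)
local function encode_bytes(s) return head(2, #s) .. s end

-- Major type 3: text string (TEXT), payload expected to be UTF-8
local function encode_text(s) return head(3, #s) .. s end
```

The payload bytes are identical in both cases, so the distinction only
matters to a decoder (or a language such as Python 3) that treats the two
types differently.
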
With MessagePack, this problem matters less,
because the distinction between STR and BIN only arrived with specification v5,
and STR doesn't imply a UTF-8 encoding.
I have just added a third option, set_string'check_utf8', which detects UTF-8 strings.
With this setting, the behaviour becomes the same as your module's.
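
As an illustration of what such a check involves (a minimal sketch, not the
code of either module; the name is_utf8 is made up here), a string can be
tested for well-formed UTF-8 in plain Lua like this:

```lua
-- Illustrative only: true when s is well-formed UTF-8
-- (no overlong forms, no surrogates, nothing above U+10FFFF).
local function is_utf8(s)
  local i, n = 1, #s
  while i <= n do
    local b = s:byte(i)
    local len, cp, min
    if b < 0x80 then
      len, cp, min = 1, b, 0
    elseif b >= 0xC2 and b <= 0xDF then
      len, cp, min = 2, b - 0xC0, 0x80
    elseif b >= 0xE0 and b <= 0xEF then
      len, cp, min = 3, b - 0xE0, 0x800
    elseif b >= 0xF0 and b <= 0xF4 then
      len, cp, min = 4, b - 0xF0, 0x10000
    else
      return false                         -- stray continuation or invalid lead byte
    end
    if i + len - 1 > n then return false end   -- sequence is truncated
    for j = i + 1, i + len - 1 do
      local c = s:byte(j)
      if c < 0x80 or c > 0xBF then return false end
      cp = cp * 64 + (c - 0x80)            -- append 6 payload bits
    end
    if cp < min or cp > 0x10FFFF or (cp >= 0xD800 and cp <= 0xDFFF) then
      return false
    end
    i = i + len
  end
  return true
end
```

One caveat with any value-based detection: binary data that happens to be
valid UTF-8 (pure ASCII bytes, for instance) will still be classified as
text.
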
François
>> for example in RFC 7049, section 3.9. Canonical CBOR
>> > If a protocol allows for IEEE floats, then additional
>> > canonicalization rules might need to be added. One example rule
>> > might be to have all floats start as a 64-bit float, then do a test
>> > conversion to a 32-bit float; if the result is the same numeric
>> > value, use the shorter value and repeat the process with a test
>> > conversion to a 16-bit float. (This rule selects 16-bit float for
>> > positive and negative Infinity as well.) Also, there are many
>> > representations for NaN. If NaN is an allowed value, it must always
>> > be represented as 0xf97e00.
>>
>> this kind of behaviour is
>> - slow (because it tries various ways),
>> - not easy to test (because many special cases occur)
>
> Yes, I'll admit it was a bit tricky to write the code to do a minimal
> encoding of floating point values but I don't think it's *that* slow (it
> pulls out the mantissa and exponent and checks to see if the range (for
> exponents) or significance (mantissa) is low enough for a smaller size).
> And again, if you don't want that behavior, you can always step down like
> the above.
>
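To make the float-shortening rule from RFC 7049 section 3.9 concrete, here
is a rough sketch in plain Lua. It is not the code of either module, and the
names fits and float_width are made up for this illustration. It checks the
exponent range and the mantissa width, which is one way (among others) to
decide whether a double is exactly representable in a smaller IEEE 754
format.

```lua
-- Illustrative only: can x be stored exactly in a binary float format with
-- frac_bits fraction bits and normal exponents in [min_exp, max_exp]?
local function fits(x, frac_bits, min_exp, max_exp)
  -- NaN, zero and the infinities exist in every format
  if x ~= x or x == 0 or x == 1/0 or x == -1/0 then return true end
  local a = math.abs(x)
  local e = min_exp - frac_bits            -- exponent of the smallest subnormal
  if a < 2 ^ e or a >= 2 ^ (max_exp + 1) then return false end
  while 2 ^ (e + 1) <= a do e = e + 1 end  -- now 2^e <= a < 2^(e+1)
  -- spacing between representable values around a
  local step = 2 ^ (math.max(e, min_exp) - frac_bits)
  return (a / step) % 1 == 0
end

-- Smallest width (16, 32 or 64 bits) that holds x without loss.
local function float_width(x)
  if fits(x, 10, -14, 15) then return 16 end     -- half precision
  if fits(x, 23, -126, 127) then return 32 end   -- single precision
  return 64
end
```

For example, float_width(1.5) gives 16 while float_width(0.1) gives 64,
since 0.1 has no exact representation in the shorter formats.
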
>> with a few functions (set_string, set_number, set_nil, set_array),
>> you could configure the encoding at your application level.
>> For example, with an application which is a sensor,
>> you could always encode floating-point numbers as 16-bit,
>> because you know that this precision is enough.
>
> Point taken. In my case, one could always replace the function
> cbor.__ENCODE_MAP.number() to always encode using a single CBOR type.
>
> -spc
>
>