lua-l archive



2016-12-03 22:56 GMT+01:00 Sean Conner <sean@conman.org>:
> It was thus said that the Great François Perrad once stated:
>> 2016-12-03 12:44 GMT+01:00 Pierre Chapuis <catwell@archlinux.us>:
>> > December 3, 2016 12:37 AM, "Sean Conner" <sean@conman.org> wrote:
>> >
>> >> I found the API itself interesting. When I wrote my CBOR module [1] I
>> >> check the strings and if they pass as a UTF-8 string, I encode the string as
>> >> TEXT; otherwise it gets encoded as BIN. It never occurred to me to have a
>> >> function set the default behavior, and I wonder how your method would work
>> >> with a table of mixed strings and binary data (or is that even enough of a
>> >> concern to concern yourself with? I don't know ...).
>> >>
>>
>> I don't like the idea: the encoding format depends on the value,
>
>   Okay, but I still fail to see how you handle mixed values.  For instance
> (using the CBOR diagnostic notation):
>
>         {
>           'authkey' : h'...' ,
>           'firmware' : h'...' ,
>           'version' : '1.2.1'
>         }
>
>   You could argue that you could send everything as a binary string, as it's
> not that much of an issue with Lua, but Python3 makes a very strong
> distinction between text and binary.
>
>   Now, because Lua makes it difficult to distinguish a text string from a
> binary string (since the Lua type for both is 'string') I went down the path
> of checking a string to see if it's valid UTF-8 and if so, encode it as
> text; otherwise binary.
>
>   But this really only affects the very simple usage of:
>
>
>         cbor = require "org.conman.cbor"
>
>         data =
>         {
>           authkey = filecontents('publickey'),
>           firmware = filecontents('bin.v1.2.1'),
>           version  = '1.2.1',
>         }
>
>         blob = cbor.encode(data)
>
>   If it really bugs you that a binary string might be encoded as text, then
> you can always drop down a level:
>
>         blob = cbor.TYPE.MAP(3)
>               .. cbor.TYPE.TEXT('authkey')  .. cbor.TYPE.BIN(data.authkey)
>               .. cbor.TYPE.TEXT('firmware') .. cbor.TYPE.BIN(data.firmware)
>               .. cbor.TYPE.TEXT('version')  .. cbor.TYPE.TEXT(data.version)
>
> (this also means you can actually control the order of fields in a MAP if
> it's important).
>
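
The lower-level route in the quote above amounts to writing each CBOR data item's head by hand: the high three bits of the initial byte carry the major type (2 = byte string, 3 = text string, 5 = map) and the low five bits, possibly followed by 1, 2, 4 or 8 more bytes, carry the length or pair count (RFC 7049, section 2). Below is a minimal sketch for Lua 5.3+; the helper names `head`, `BIN`, `TEXT` and `MAP` are hypothetical stand-ins, not the actual `org.conman.cbor` implementation.

```lua
-- Minimal sketch of CBOR head encoding (RFC 7049, section 2), Lua 5.3+.
-- head/BIN/TEXT/MAP are hypothetical helpers, not org.conman.cbor's code.

-- Encode major type `major` (0..7) with non-negative integer argument `n`.
local function head(major, n)
  local ib = major << 5                 -- major type in the high 3 bits
  if n < 24 then
    return string.char(ib | n)          -- small values fit in the initial byte
  elseif n < 0x100 then
    return string.char(ib | 24) .. string.pack(">I1", n)
  elseif n < 0x10000 then
    return string.char(ib | 25) .. string.pack(">I2", n)
  elseif n < 0x100000000 then
    return string.char(ib | 26) .. string.pack(">I4", n)
  else
    return string.char(ib | 27) .. string.pack(">I8", n)
  end
end

local function BIN(s)  return head(2, #s) .. s end  -- byte string
local function TEXT(s) return head(3, #s) .. s end  -- text string
local function MAP(n)  return head(5, n) end        -- map header: n key/value pairs follow

-- The caller picks the string type per field and fixes the key order.
local blob = MAP(3)
  .. TEXT("authkey")  .. BIN("\1\2\3")
  .. TEXT("firmware") .. BIN("\255\254")
  .. TEXT("version")  .. TEXT("1.2.1")
```

Because the map is assembled by plain concatenation, the keys come out in exactly the order written.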

Currently, I could write a metamethod that does the equivalent:

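local c = require 'CBOR'

local mt = {}   -- metatable for records with authkey/firmware/version fields

-- Encoding hook: append the CBOR pieces for self to buffer,
-- choosing byte string or text string explicitly for each field.
function mt:tocbor (buffer)
    buffer[#buffer+1] = c.MAP(3)
    buffer[#buffer+1] = c.text_string('authkey')
    buffer[#buffer+1] = c.byte_string(self.authkey)
    buffer[#buffer+1] = c.text_string('firmware')
    buffer[#buffer+1] = c.byte_string(self.firmware)
    buffer[#buffer+1] = c.text_string('version')
    buffer[#buffer+1] = c.text_string(self.version)
end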

With MessagePack, this problem is less important,
because the distinction between STR and BIN came only with specification v5,
and STR doesn't imply a UTF-8 encoding.
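
In the current MessagePack specification the choice between STR and BIN comes down to the header byte. Here is a sketch for Lua 5.3+ (`pack_str` and `pack_bin` are hypothetical helpers, not lua-MessagePack's API). Neither function inspects the payload bytes.

```lua
-- Sketch of MessagePack str/bin headers (current spec), Lua 5.3+.
-- pack_str and pack_bin are hypothetical helpers, not lua-MessagePack's API.

local function pack_str(s)
  local n = #s
  if n < 32 then
    return string.char(0xA0 | n) .. s              -- fixstr
  elseif n < 0x100 then
    return "\xD9" .. string.pack(">I1", n) .. s    -- str 8
  elseif n < 0x10000 then
    return "\xDA" .. string.pack(">I2", n) .. s    -- str 16
  else
    return "\xDB" .. string.pack(">I4", n) .. s    -- str 32
  end
end

local function pack_bin(s)
  local n = #s
  if n < 0x100 then
    return "\xC4" .. string.pack(">I1", n) .. s    -- bin 8
  elseif n < 0x10000 then
    return "\xC5" .. string.pack(">I2", n) .. s    -- bin 16
  else
    return "\xC6" .. string.pack(">I4", n) .. s    -- bin 32
  end
end
```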

I have just added a third option, set_string'check_utf8', which detects UTF-8 strings.
With this setting, the behaviour becomes the same as your module's.
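
Here is a sketch of how such a string policy might dispatch. It assumes Lua 5.3+ and its standard `utf8.len`, which returns nil (plus the position of the first bad byte) for a string that is not well-formed UTF-8. Only 'check_utf8' comes from the message. The other two mode names, and this `set_string` and `string_major_type`, are hypothetical stand-ins, not the module's actual code.

```lua
-- Sketch of a configurable string policy, Lua 5.3+.
-- 'check_utf8' is taken from the message; 'text_string', 'byte_string',
-- set_string and string_major_type are hypothetical stand-ins.
local policy = "text_string"

local function set_string(mode)
  assert(mode == "text_string" or mode == "byte_string" or mode == "check_utf8",
         "unknown string mode: " .. tostring(mode))
  policy = mode
end

-- Return the CBOR major type for string s: 3 (text) or 2 (byte string).
local function string_major_type(s)
  if policy == "text_string" then
    return 3
  elseif policy == "byte_string" then
    return 2
  else  -- "check_utf8": text only if s is well-formed UTF-8
    return utf8.len(s) and 3 or 2
  end
end

set_string("check_utf8")
print(string_major_type("1.2.1"))      --> 3
print(string_major_type("\xFF\xFE"))   --> 2
```

The trade-off discussed in the thread still applies: binary data that happens to be well-formed UTF-8 (an all-ASCII key, say) is classified as text. Also, Lua 5.3's `utf8.len` does not reject encoded surrogates (U+D800 to U+DFFF), while Lua 5.4's does unless `lax` is set.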

François

>> for example, in RFC 7049, section 3.9 ("Canonical CBOR"):
>> > If a protocol allows for IEEE floats, then additional
>> > canonicalization rules might need to be added.  One example rule
>> > might be to have all floats start as a 64-bit float, then do a test
>> > conversion to a 32-bit float; if the result is the same numeric
>> > value, use the shorter value and repeat the process with a test
>> > conversion to a 16-bit float.  (This rule selects 16-bit float for
>> > positive and negative Infinity as well.)  Also, there are many
>> > representations for NaN.  If NaN is an allowed value, it must always
>> > be represented as 0xf97e00.
>>
>> this kind of behaviour is
>> - slow (because it tries various ways),
>> - not easy to test (because many special cases occur)
>
>   Yes, I'll admit it was a bit tricky to write the code to do a minimal
> encoding of floating point values but I don't think it's *that* slow (it
> pulls out the mantissa and exponent and checks to see if the range (for
> exponents) or significance (mantissa) is low enough for a smaller size).
> And again, if you don't want that behavior, you can always step down like
> the above.
>
>> with a few functions (set_string, set_number, set_nil, set_array),
>> you could configure the encoding at the application level.
>> For example, in a sensor application,
>> you could always encode floating-point numbers with 16 bits,
>> because you know that this precision is enough.
>
>   Point taken.  In my case, one could always replace the function
> cbor.__ENCODE_MAP.number() to always encode using a single CBOR type.
>
>   -spc
>
>
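
The float-shrinking rule quoted from RFC 7049 section 3.9 can be sketched for Lua 5.3+ as follows. NaN always becomes 0xf97e00. A value goes into a 16-bit float only if it is exactly representable there. Otherwise a 32-bit `string.pack`/`string.unpack` round trip is tried, and 64 bits is the fallback. This illustrates the rule only. It is not the mantissa/exponent approach used in org.conman.cbor, and `half_bits` and `encode_float` are hypothetical names.

```lua
-- Sketch of RFC 7049 section 3.9 float shrinking, Lua 5.3+.
-- half_bits/encode_float are hypothetical names, not a real module's API.

local FLT_MAX = 3.4028234663852886e38     -- largest finite 32-bit float

-- Return the IEEE 754 binary16 bit pattern of x if x is exactly
-- representable as a half float, otherwise nil. NaN is handled by the caller.
local function half_bits(x)
  local sign = 0
  if x < 0 or (x == 0 and 1 / x < 0) then sign = 0x8000 end  -- includes -0.0
  local a = math.abs(x)
  if a == 0 then return sign end
  if a == math.huge then return sign | 0x7C00 end
  if a < 2.0^-14 then
    -- subnormal range: a must be m * 2^-24 with integer m in 1..1023
    local m = math.tointeger(a * 2.0^24)
    if m then return sign | m end
    return nil
  end
  if a >= 65536 then return nil end        -- beyond the largest half (65504)
  -- normal range: find e with 2^e <= a < 2^(e+1), -14 <= e <= 15
  local e = -14
  while e < 15 and a >= 2.0^(e + 1) do e = e + 1 end
  -- a must be m * 2^(e-10) with integer m in 1024..2047 (11 significant bits)
  local m = math.tointeger(a * 2.0^(10 - e))
  if not m then return nil end
  return sign | ((e + 15) << 10) | (m - 1024)
end

-- Encode x as the shortest CBOR float (major type 7) that preserves its value.
local function encode_float(x)
  if math.type(x) == "integer" then x = x + 0.0 end
  if x ~= x then return "\xF9\x7E\x00" end  -- canonical NaN
  local h = half_bits(x)
  if h then return "\xF9" .. string.pack(">I2", h) end
  if math.abs(x) <= FLT_MAX then            -- avoid out-of-range float casts
    local f = string.pack(">f", x)
    if string.unpack(">f", f) == x then return "\xFA" .. f end
  end
  return "\xFB" .. string.pack(">d", x)
end
```

Each test is exact (scaling by a power of two loses no bits), so a value is only shortened when no information is lost.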