It was thus said that the Great François Perrad once stated:
> 2016-12-03 12:44 GMT+01:00 Pierre Chapuis <catwell@archlinux.us>:
> > December 3, 2016 12:37 AM, "Sean Conner" <sean@conman.org> wrote:
> >
> >> I found the API itself interesting. When I wrote my CBOR module [1] I
> >> check the strings and if they pass as a UTF-8 string, I encode the string as
> >> TEXT; otherwise it gets encoded as BIN. It never occurred to me to have a
> >> function set the default behavior, and I wonder how your method would work
> >> with a table of mixed strings and binary data (or is that even enough of a
> >> concern to concern yourself with? I don't know ... ).
> >>
> 
> I don't like the idea : the encoding format depends on the value,

  Okay, but I still fail to see how you handle mixed values.  For instance
(using the CBOR diagnostic notation):

	{
	  "authkey" : h'...' ,
	  "firmware" : h'...' ,
	  "version" : "1.2.1"
	}

  You could argue that you could send everything as a binary string, as it's
not that much of an issue with Lua, but Python3 makes a very strong
distinction between text and binary.

  Now, because Lua makes it difficult to tell a text string from a binary
string (since the Lua type for both is 'string'), I went down the path of
checking a string to see if it's valid UTF-8 and, if so, encoding it as
text; otherwise as binary.
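
  A minimal sketch of that check, assuming Lua 5.3 or later for the standard
utf8 library.  The name string_kind is made up for illustration; it is not
part of org.conman.cbor, and the module's own check may well differ.

```lua
-- Sketch only: string_kind is a hypothetical helper, not the module's code.
-- Requires Lua 5.3+ for the standard utf8 library.
local function string_kind(s)
  -- utf8.len returns a character count for valid UTF-8, or a false value
  -- (plus the position of the first invalid byte) otherwise.
  if utf8.len(s) then
    return "TEXT"
  else
    return "BIN"
  end
end
```

  Note that binary data which happens to be valid UTF-8 (a key made up
purely of ASCII bytes, say) comes out as TEXT with this approach.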

  But this really only affects the very simple usage of:


	cbor = require "org.conman.cbor"

	data = 
	{
	  authkey = filecontents('publickey'),
	  firmware = filecontents('bin.v1.2.1'),
	  version  = '1.2.1',
	}

	blob = cbor.encode(data)

  If it really bugs you that a binary string might be encoded as text, then
you can always drop down a level:

	blob = cbor.TYPE.MAP(3)
	      .. cbor.TYPE.TEXT('authkey')  .. cbor.TYPE.BIN(data.authkey)
	      .. cbor.TYPE.TEXT('firmware') .. cbor.TYPE.BIN(data.firmware)
	      .. cbor.TYPE.TEXT('version')  .. cbor.TYPE.TEXT(data.version)

(this also means you can actually control the order of fields in a MAP if
it's important).
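
  To show what the TEXT/BIN distinction amounts to on the wire, here is a
self-contained sketch based on the major types in RFC 7049 (2 for byte
strings, 3 for text strings, 5 for maps).  The helpers head, bin, text and
map are hypothetical stand-ins written for this example, not the cbor.TYPE
functions themselves, and the sample values are placeholders.  It assumes
Lua 5.3 or later for the bitwise operators and string.pack.

```lua
-- Sketch only: hand-rolled stand-ins, NOT the org.conman.cbor API.
-- Builds the CBOR initial byte(s) for a major type and a length/count.
local function head(major, n)
  if n < 24 then
    return string.char(major << 5 | n)
  elseif n < 0x100 then
    return string.char(major << 5 | 24, n)
  elseif n < 0x10000 then
    return string.char(major << 5 | 25) .. string.pack(">I2", n)
  elseif n < 0x100000000 then
    return string.char(major << 5 | 26) .. string.pack(">I4", n)
  else
    return string.char(major << 5 | 27) .. string.pack(">I8", n)
  end
end

local function bin(s)  return head(2, #s) .. s end  -- byte string
local function text(s) return head(3, #s) .. s end  -- text string
local function map(n)  return head(5, n)      end  -- map of n pairs

-- Placeholder data; the field order is exactly the order written here.
local blob = map(3)
          .. text('authkey')  .. bin('\x01\x02\x03')
          .. text('firmware') .. bin('\xde\xad\xbe\xef')
          .. text('version')  .. text('1.2.1')
```

  The payload bytes are the same either way; only the top three bits of the
initial byte tell the receiver whether it got text or binary.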

> for example in RFC 7049, section 3.9.  Canonical CBOR
> > If a protocol allows for IEEE floats, then additional
> > canonicalization rules might need to be added.  One example rule
> > might be to have all floats start as a 64-bit float, then do a test
> > conversion to a 32-bit float; if the result is the same numeric
> > value, use the shorter value and repeat the process with a test
> > conversion to a 16-bit float.  (This rule selects 16-bit float for
> > positive and negative Infinity as well.)  Also, there are many
> > representations for NaN.  If NaN is an allowed value, it must always
> > be represented as 0xf97e00.
> 
> this kind of behaviour is
> - slow (because it tries various ways),
> - not easy to test (because many special cases occur)

  Yes, I'll admit it was a bit tricky to write the code to do a minimal
encoding of floating point values, but I don't think it's *that* slow (it
pulls out the mantissa and exponent and checks whether the exponent's range
and the mantissa's significant bits are small enough for a smaller size).
And again, if you don't want that behavior, you can always drop down a level
as above.
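
  For the curious, a rough sketch of that kind of check.  This is not the
code from my module; float_width and fits are names invented for the example,
and it assumes Lua 5.3 or later (64-bit integers, string.pack).  It looks at
the bits of the double and reports the smallest IEEE 754 width that holds the
value exactly.

```lua
-- Sketch only: hypothetical helpers, not the org.conman.cbor implementation.
-- True if a double with unbiased exponent e and 52-bit fraction frac fits a
-- format with normal exponents emin..emax and fbits fraction bits.
local function fits(frac, e, emin, emax, fbits)
  local keep                         -- fraction bits the target can hold
  if e >= emin and e <= emax then
    keep = fbits                     -- normal number in the target format
  elseif e >= emin - fbits and e < emin then
    keep = e - (emin - fbits)        -- subnormal in the target format
  else
    return false                     -- exponent out of range
  end
  -- every fraction bit below the ones we keep must be zero
  return (frac & ((1 << (52 - keep)) - 1)) == 0
end

-- Returns 16, 32 or 64.
local function float_width(x)
  local bits = string.unpack(">I8", string.pack(">d", x))
  local exp  = (bits >> 52) & 0x7FF        -- biased exponent
  local frac = bits & ((1 << 52) - 1)      -- 52-bit fraction

  if exp == 0x7FF then return 16 end       -- Infinity and NaN go in 16 bits
  if exp == 0 then
    if frac == 0 then return 16 end        -- +0 and -0
    return 64                              -- double subnormal
  end

  local e = exp - 1023
  if fits(frac, e, -14, 15, 10) then return 16 end
  if fits(frac, e, -126, 127, 23) then return 32 end
  return 64
end
```

  No trial conversions are needed; it is a handful of shifts, masks and
compares per number.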

> with few functions (set_string, set_number, set_nil, set_array),
> you could config the encoding at your application level.
> For example, with an application which is a sensor,
> you could encode float number always with 16-bits,
> because you know that this precision is enough.

  Point taken.  In my case, one could always replace the function
cbor.__ENCODE_MAP.number() with one that always encodes using a single CBOR
type.
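
  The idea, sketched against a stand-in dispatch table rather than the real
one (the actual signature of cbor.__ENCODE_MAP.number() isn't shown here, so
ENCODE_MAP and encode below are assumptions made up for the example).  The
replacement always emits a 32-bit float (CBOR initial byte 0xFA), since Lua's
string.pack has no 16-bit float format.  Assumes Lua 5.3 or later.

```lua
-- Sketch only: ENCODE_MAP and encode are hypothetical stand-ins, NOT the
-- real org.conman.cbor table or its calling convention.
local ENCODE_MAP = {
  -- default: always a 64-bit float (initial byte 0xFB)
  number = function(n) return "\xFB" .. string.pack(">d", n) end,
}

-- Dispatch on the Lua type of the value.
local function encode(value)
  local fn = ENCODE_MAP[type(value)]
  if not fn then error("no encoder for type " .. type(value)) end
  return fn(value)
end

local default_blob = encode(1.5)   -- 9 bytes: 0xFB + 8-byte double

-- Application-level override: every number becomes a 32-bit float.
ENCODE_MAP.number = function(n) return "\xFA" .. string.pack(">f", n) end

local single_blob = encode(1.5)    -- 5 bytes: 0xFA + 4-byte float
```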

  -spc