- Subject: Re: [ANN] tnetstrings.lua 1.0.0
- From: Florian Weimer <fw@...>
- Date: Sun, 05 Jun 2011 16:03:14 +0200
* Josh Simmons:
>> Your implementation does not round-trip, either.
>
> It seems to, according to the tests.
A table which stems from decoding an array cannot be serialized again.
(You could encode tables whose length is not zero as arrays.)
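Roughly like this, as a sketch (encode_list() and encode_dict() are
hypothetical helpers, not the actual tnetstrings.lua API):

  -- Heuristic: a table with a non-zero array length is encoded as
  -- an array, everything else as a dictionary.
  local function encode_table(t)
    if #t > 0 then
      return encode_list(t)
    else
      return encode_dict(t)
    end
  end

This loses the distinction for empty arrays (they come back out as
empty dictionaries), but it makes decoded arrays round-trip.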
>>> * Resistant to buffer overflows and other problems.
>>
>> Your implementation results in near-quadratic run time in terms of the
>> input string length.
> As far as I can tell, there's not a huge amount that can be done about
> that. But I'm open to suggestions.
You should avoid string copies, perhaps by changing the signature of
parse() from parse(string) to parse(string, i, j), where parse() only
operates on the substring consisting of the characters [i, j].
Dumping might have a similar issue.
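For illustration, a sketch of what I mean (simplified, with made-up
names and no error handling; the real parser also has to dispatch on
the type tag):

  -- Parse one element within s[i..j] without ever copying the
  -- unparsed tail of the input.
  local function parse(s, i, j)
    local colon = string.find(s, ":", i, true)
    local len = tonumber(string.sub(s, i, colon - 1))
    local data_start, data_end = colon + 1, colon + len
    assert(data_end + 1 <= j, "truncated input")
    local tag = string.sub(s, data_end + 1, data_end + 1)
    -- only the payload itself is extracted; the caller resumes
    -- parsing at the returned index instead of at a substring copy
    return string.sub(s, data_start, data_end), tag, data_end + 2
  end

Each call does work proportional to the element it consumes, so
parsing the whole input stays linear instead of near-quadratic.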
>>> * Fast and low resource intensive.
>>
>> The format requires buffering all data before decoding can start.
>> This means that decoding arbitrary messages requires unbounded
>> storage.
> You can't really get around that while still maintaining "Makes no
> assumptions about string contents" and at any rate it's more a goal to
> be able to predict that than actually handle it.
You could use escaping or some form of chunking for strings; then the
length does not need to be known up-front. It's more important for
tables, though, and there you could work with begin and end markers
instead of encoding the length of the encoded string.
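A purely hypothetical chunked form, just to illustrate (this is not
tnetstrings; it borrows the idea from HTTP chunked transfer encoding,
and write/read_chunk are made-up callbacks):

  -- Emit a string as bounded chunks ending with a zero-length
  -- chunk, so the sender never needs the total length up-front.
  local function emit_chunked(write, read_chunk)
    while true do
      local chunk = read_chunk(4096)   -- up to 4 KiB at a time
      if not chunk or #chunk == 0 then break end
      write(#chunk .. ":" .. chunk)    -- each chunk is a netstring
    end
    write("0:")                        -- terminator
  end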
> It has to be chunked indeed, but very rarely is a true streaming
> system actually implemented. Especially not in the context the
> protocol was designed for, internal communication for a web server
> (mongrel2).
Should request and response bodies be streamed, at least optionally?
Obviously, this is a format issue, and there is nothing you can do
about it if you want to implement the format.
One thing about netstrings that bothers me is that they appear to be
human-readable (mainly because ASCII data results in an ASCII string
after encoding, and there is a totally redundant end separator), but you
cannot easily change them. Any format which contains explicit lengths
suffers from this problem, and that's why I'm wondering if something
like JSON isn't better overall. The downside is that there can be a
significant overhead for encoding non-random data, and that decoding
is generally slower.
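For example, editing the payload of

  5:hello,

to say "hi" forces the length prefix to change as well:

  2:hi,

and in a nested structure the lengths of all enclosing containers
have to be recomputed too, which is what makes editing by hand
impractical.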