- Subject: Re: [ANN] tnetstrings.lua 1.0.0
- From: Josh Simmons <simmons.44@...>
- Date: Mon, 6 Jun 2011 00:12:34 +1000
On Mon, Jun 6, 2011 at 12:03 AM, Florian Weimer <firstname.lastname@example.org> wrote:
> * Josh Simmons:
>>> Your implementation does not round-trip, either.
>> It seems to, according to the tests.
> A table which stems from decoding an array cannot be serialized again.
> (You could encode tables whose length is not zero as arrays.)
I thought about taking a stab at guessing whether a table is an
array but decided against it. I suppose with the keys-must-be-strings
restriction that guess would be valid, though it could still lead to
human error.
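For what it's worth, the heuristic I decided against would look roughly like this (a sketch only; `looks_like_array` is a hypothetical helper, not part of tnetstrings.lua):

```lua
-- Guess whether a decoded table should be re-encoded as a tnetstring
-- list rather than a dict. Hypothetical helper, not part of the library.
local function looks_like_array(t)
  local n = 0
  for k in pairs(t) do
    -- any key that isn't a positive integer forces dict encoding
    if type(k) ~= "number" or k % 1 ~= 0 or k < 1 then
      return false
    end
    n = n + 1
  end
  -- dense 1..n integer keys: treat as an array
  return n == #t
end
```

Note that the empty table is ambiguous under any such heuristic (it satisfies the array check but could equally be an empty dict), which is exactly why round-tripping can't be made fully reliable this way.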
>>>> * Resistant to buffer overflows and other problems.
>>> Your implementation results in near-quadratic run time in terms of the
>>> input string length.
>> As far as I can tell, there's not a huge amount that can be done about
>> that. But I'm open to suggestions.
> You should avoid string copies, perhaps by changing the signature of
> parse() from parse(string) to parse(string, i, j), where parse() only
> operates on the substring consisting of the characters [i, j].
They're not copies, and you can't avoid the work here; Lua's strings
are interned. Might be able to do some FFI magic for LuaJIT, we'll
see.
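To illustrate the index-based approach Florian is suggesting (a sketch under my own naming, not the actual tnetstrings.lua API): scanning with an explicit start position means each payload is extracted exactly once, rather than re-slicing the remaining input at every step.

```lua
-- Sketch of index-based scanning: parse one "LEN:PAYLOAD," item
-- starting at position i, returning the payload and the index just
-- past it. One string.sub per item; the tail is never copied.
local function parse_one(s, i)
  local colon = string.find(s, ":", i, true)  -- plain find, no pattern
  assert(colon, "missing length separator")
  local len = tonumber(string.sub(s, i, colon - 1))
  assert(len, "bad length prefix")
  local payload = string.sub(s, colon + 1, colon + len)
  -- the byte after the payload is the type/terminator tag
  return payload, colon + len + 2
end
```

Whether this actually beats the current implementation in Lua proper (as opposed to LuaJIT) would need measuring, since interning already dedupes the resulting strings.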
> Dumping might have a similar issue.
>>>> * Fast and low resource intensive.
>>> The format requires buffering all data before decoding can start.
>>> This means that decoding arbitrary messages requires unbounded
>>> memory.
>> You can't really get around that while still maintaining "Makes no
>> assumptions about string contents" and at any rate it's more a goal to
>> be able to predict that than actually handle it.
> You could use escaping or some form of chunking for strings, then the
> length does not need to be known up-front. It's more important for
> tables, though, and there you could work with begin and end markers,
> instead of encoding the length of the encoded string.
Knowing the length up front is a design decision. It was considered
better to be able to throw away silly long bits of data than to be in
a position where you could be trickle fed absolute rubbish. This is
from a DoS perspective.
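Concretely, here's what the up-front length buys you (a sketch; `MAX_SIZE` and `check_header` are illustrative names, not library API): a reader can reject an oversized message from the prefix alone, before buffering a single byte of payload.

```lua
-- Illustrative guard: reject a message by its declared length before
-- reading any payload. Names here are hypothetical.
local MAX_SIZE = 64 * 1024  -- arbitrary cap for the example

local function check_header(header)
  local len = tonumber(string.match(header, "^(%d+):"))
  if not len then
    return nil, "malformed length prefix"
  end
  if len > MAX_SIZE then
    return nil, "message too large, dropping"
  end
  return len
end
```

With escaping or chunked strings you only discover the true size after consuming the stream, which is the trickle-feed scenario above.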
>> It has to be chunked indeed, but very rarely is a true streaming
>> system actually implemented. Especially not in the context the
>> protocol was designed for, internal communication for a web server
> Should request and response bodies be streamed, at least optionally?
> Obviously, this is a format issue, and there is nothing you can do
> about if you want to implement the format.
> One thing about netstrings that bothers me is that they appear to be
> human-readable (mainly because ASCII data results in an ASCII string
> after encoding, and there is a totally redundant end separator), but you
> cannot easily change them. Any format which contains explicit lengths
> suffers from this problem, and that's why I'm wondering if something
> like JSON isn't better overall. The downside is that there can be a
> significant overhead for encoding non-random data, and that decoding
> is generally slower.
JSON is really slow to generate (and parse); the problem is you have
to insert things all over the place, wrap things in other things,
search through the data a lot, escape everything, and ensure valid
UTF-8. It was a significant bottleneck in Mongrel2, and that's what
led to the creation of tnetstrings.
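To make the contrast concrete, a minimal string dump looks like this (a sketch only; the real tnetstrings.lua handles numbers, booleans, tables, and null as well): the payload goes out verbatim, with no quoting, escaping, or UTF-8 validation pass.

```lua
-- Minimal tnetstring dump for strings: length prefix, raw bytes,
-- type tag. No per-character escaping or encoding checks needed.
-- Sketch only, not the library's actual code.
local function dump_string(s)
  return string.format("%d:%s,", #s, s)
end
```

Compare that single `string.format` to a JSON encoder, which must walk every byte of the string to escape quotes, backslashes, and control characters.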