On Thu, Jan 16, 2014 at 11:11 AM, Jorge <xxopxe@gmail.com> wrote:
Bencode is a lightweight serialization protocol, with several interesting properties (I found a nice writeup on bencode here: http://jeelabs.org/2012/10/05/whats-the-deal-with-bencode/).

The usual way to work with bencode in Lua is https://bitbucket.org/wilhelmy/lua-bencode/ .

But, suppose you have the following scenario:

* You are reading objects that are rather big relative to the total memory you have.
* You are receiving rather long strings, but they are read in tiny chunks (say, over a slow serial line).
* You can process said chunks as they arrive, say, compute a CRC or feed them to load().

(Bear with me, this could happen.)

In these cases a push-parser could be more appropriate. There is a coroutine-based example at http://jeelabs.org/2012/10/18/push-scannin-with-coroutines/, but it's blocking.

So I wrote my own, available here:

https://github.com/xopxe/lua-bencode-push

Internally, it's a tail-call based state machine. It attempts to fiddle as little as possible with strings (no concatenations). It returns strings in fragments as they arrive, signalling the end of each string with an empty string. It also pushes the total string length before pushing the fragments, if you're interested in such a thing.
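
To make the description above concrete, here is a minimal sketch of a tail-call based push parser. It is not the code or the API of lua-bencode-push: the names new_string_parser, on_length and on_fragment are hypothetical, it handles only bencoded strings ("<length>:<bytes>"), and like the original it does no error checking.

```lua
-- Hypothetical sketch, NOT the lua-bencode-push API.
-- Parses only bencoded strings ("<length>:<bytes>"), with no error checks.
-- on_length(n) is called once per string, before any fragment.
-- on_fragment(s) is called for each piece; s == "" marks the end of a string.
local function new_string_parser(on_length, on_fragment)
  local state                   -- state function to resume on the next chunk
  local read_length, read_body  -- forward declarations for the two states
  local length, remaining = 0, 0

  -- State 1: accumulate decimal digits until the ':' separator.
  function read_length(chunk, pos)
    while pos <= #chunk do
      local c = chunk:sub(pos, pos)
      pos = pos + 1
      if c == ":" then
        on_length(length)
        remaining = length
        state = read_body
        return read_body(chunk, pos)   -- tail call into the next state
      end
      length = length * 10 + tonumber(c)
    end
    -- Chunk exhausted: return and wait for more input in the same state.
  end

  -- State 2: hand out slices of the incoming chunk, never concatenating.
  function read_body(chunk, pos)
    if remaining == 0 then
      on_fragment("")                  -- end-of-string marker
      length = 0
      state = read_length
      return read_length(chunk, pos)   -- tail call back to state 1
    end
    if pos > #chunk then return end    -- need more input
    local piece = chunk:sub(pos, pos + remaining - 1)
    remaining = remaining - #piece
    on_fragment(piece)
    return read_body(chunk, pos + #piece)
  end

  state = read_length
  -- The returned function is the "push" entry point: call it per chunk.
  return function(chunk) return state(chunk, 1) end
end

-- Usage: feed arbitrary slices of the input as they arrive.
local feed = new_string_parser(
  function(n) print("length", n) end,
  function(s) print("fragment", "[" .. s .. "]") end)
feed("4:sp")
feed("am")
```

Because each state ends in a tail call, switching states does not grow the stack, and the parser never blocks: when a chunk runs out it simply returns and resumes from the saved state on the next call.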

It lacks any kind of error checking, and no encoder is provided (yet).

Greetings,

Jorge



Cool! Parsing very large inputs is a problem in a lot of cases, so it's nice to see people thinking about it.

Your implementation sounds a lot like LTN12: you have a source function which produces chunks of bencoded data, and a filter function which decodes them (and maybe caches them until a full, decodable chunk arrives).

The way I implemented LTN12 was that a source or filter can return nil to signal the end of the data, or an empty string if no data is available (e.g. for a non-blocking socket which hasn't yet received another packet). I found that simplified the design of some parts. (e.g. functions that expect a string often don't mind if that string is empty, so they don't need to check for it.)
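
As a rough illustration of that convention (nil means end of data, an empty string means no data yet), here is a small self-contained sketch. It does not use any real LTN12 library; source_from_list, upper_filter and pump are hypothetical names made up for this example.

```lua
-- Hypothetical helpers illustrating the nil / "" convention.
-- They are not taken from any LTN12 implementation.

-- Source: returns one chunk per call, then nil once the list is exhausted.
local function source_from_list(chunks)
  local i = 0
  return function()
    i = i + 1
    return chunks[i]            -- nil here means "end of data"
  end
end

-- Filter: uppercases a chunk. An empty chunk needs no special case,
-- because ("") :upper() is just "".
local function upper_filter(chunk)
  if chunk == nil then return nil end
  return chunk:upper()
end

-- Pump: pulls from the source until it returns nil, pushing each
-- filtered chunk to the sink. A real non-blocking loop would wait or
-- yield when it sees "" instead of immediately asking again.
local function pump(source, filter, sink)
  while true do
    local chunk = source()
    if chunk == nil then break end
    sink(filter(chunk))
  end
end

-- Usage: the "" in the middle stands for "nothing arrived yet".
local parts = {}
pump(source_from_list({ "ab", "", "cd" }), upper_filter,
     function(s) parts[#parts + 1] = s end)
print(table.concat(parts))
```

Only the pump has to distinguish nil from a string; the filter and the sink treat an empty chunk like any other.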

--
Sent from my Game Boy.