Hi Jorge,

On Thu, Jan 16, 2014 at 14:11:11 -0200, Jorge wrote:
> Bencode is a lightweight serialization protocol, with several 
> interesting properties (I found a nice writeup on bencode here: 
> http://jeelabs.org/2012/10/05/whats-the-deal-with-bencode/).
> 
> Usually, what you do to work with bencode is to go with 
> https://bitbucket.org/wilhelmy/lua-bencode/ .

(For anybody else reading this, I'm the maintainer of the latter. I got
the decoder from krka on #lua on Freenode, and a day later kludged
together an encoder and basically shipped that as version 1, which some
people might actually still be using. Thought I should probably mention
that since I'm expecting that nobody here knows me very well.)

> But, suppose you have the following scenario:
> 
> * You are reading rather big objects in the relation to the total memory 
> you have.
> * You are receiving rather long strings, but the reading is made in tiny 
> chunks (say, over a slow serial line).
> * You can process said chunks as they arrive, say, compute a CRC or feed 
> it to load()
> 
> (Bear with me, this could happen)

Indeed.

I had actually been meaning to write an event-based parser for bencode
(like SAX, but then again, not exactly like SAX; if you've ever had to
deal with it, you'll know what I mean :). I rejected the idea, though,
knowing that it probably wouldn't be worth the effort: I didn't have any
plans for it, and no users ever contacted me. That might be a good sign
and an indicator that something just works™, but I think it's more
likely that nobody actually uses it for anything serious. Then again,
there have been quite a few downloads, especially for the latest
release. :)

> In these cases a push-parser could be more adequate. There is a 
> coroutine based example at 
> http://jeelabs.org/2012/10/18/push-scannin-with-coroutines/, but it's 
> blocking.

Alright, I see yours isn't actually using coroutines at all. (And
neither is mine, it's just a dumb parser that keeps the entire decoded
table in memory. The encoder, if anything, is probably even dumber, so
no need for coroutines in general.)

> So I wrote mine own, available here:
> 
> https://github.com/xopxe/lua-bencode-push
> 
> Internally, it's a tail-call based state machine. It attempts to fiddle 
> as little as possible with strings (no concatenations). It returns 
> string in fragments as they arrive, signalling the end with an empty 
> string. It also pushes the total string length prior to pushing the 
> fragments, if you're interested in such a thing.
> 
> It misses any kind of error checks, and no encoder is provided (yet).

If you want, you can take over maintainership of lua-bencode, and/or we
can even merge both projects, since I don't really do anything with it
anyway and you already fixed all of the issues I had with the actual
encoding. (And besides, having just one library instead of two, with
functions for both "DOM" and "SAX" parsing, would make sense.)

As far as I'm concerned, two nice-to-haves for your module would be:

* Not having the callbacks at module scope, but instead using a
  constructor, so that you get a separate parser for every set of
  callbacks you need.

* Passing a trace table (actually a stack) to the module so that
  callbacks don't necessarily have to track where in the bencode
  document they currently are. The position would be pushed onto the
  stack before each call to a callback and popped again afterwards. In
  that case, my guess is that you can also omit the "level" parameter
  and just use #stack if you really need the level (which I think is
  rather unlikely). IMHO this avoids the problem of having to track the
  state yourself, which SAX parsers have (at least as far as I know).
  It would obviously consume more memory, because a table holding the
  path has to be kept around, but that would still be nowhere near as
  big as keeping the entire document or decoded table around, the way
  lua-bencode does it.
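
To make those two points a bit more concrete, here is roughly what I
have in mind. It is only a sketch under my own assumptions: new_parser,
feed and the callback names (open, close, integer, string_begin, string,
string_end) are made up for illustration and are not the API of
lua-bencode or lua-bencode-push. I also assume the stack is handed to
every callback as its first argument, that dictionary keys are short
enough to buffer, and, like your module, it does no error checking.

```lua
-- Sketch only: new_parser, feed and the callback names are hypothetical,
-- not the API of lua-bencode or lua-bencode-push. No error checking.
local function new_parser(callbacks)
  local path = {}    -- stack of positions: list index or dict key
  local frames = {}  -- open containers: { kind = "list"|"dict", n = 0, key = nil }
  local state, num, remaining, keybuf = "value", "", 0, nil

  -- Every callback gets the live path table first; copy it to keep it.
  local function emit(name, ...)
    local cb = callbacks[name]
    if cb then cb(path, ...) end
  end

  -- Push the position of the value that is about to start.
  local function begin_value()
    local top = frames[#frames]
    if not top then return end  -- top-level value: path stays empty
    if top.kind == "list" then top.n = top.n + 1 end
    path[#path + 1] = top.kind == "list" and top.n or top.key
  end

  -- Pop it again once the value is complete.
  local function end_value()
    local top = frames[#frames]
    if not top then return end
    path[#path] = nil
    if top.kind == "dict" then top.key = nil end  -- next string is a key
  end

  local function finish_string()
    if keybuf then  -- dict keys are buffered (assumed to be short)
      frames[#frames].key = table.concat(keybuf)
      keybuf = nil
    else
      emit("string_end")
      end_value()
    end
    state = "value"
  end

  local parser = {}
  function parser.feed(chunk)
    local i = 1
    while i <= #chunk do
      if state == "str" then  -- hand over whatever this chunk contains
        local piece = chunk:sub(i, i + remaining - 1)
        remaining = remaining - #piece
        i = i + #piece
        if keybuf then keybuf[#keybuf + 1] = piece else emit("string", piece) end
        if remaining == 0 then finish_string() end
      else
        local c = chunk:sub(i, i)
        i = i + 1
        if state == "int" then
          if c == "e" then
            emit("integer", tonumber(num)); end_value(); state = "value"
          else num = num .. c end  -- only a few digits, so concatenating is fine
        elseif state == "len" then
          if c == ":" then
            remaining = tonumber(num)
            if not keybuf then emit("string_begin", remaining) end
            if remaining == 0 then finish_string() else state = "str" end
          else num = num .. c end
        elseif c == "e" then  -- end of the innermost container
          local frame = table.remove(frames)
          emit("close", frame.kind)
          end_value()
        elseif c == "l" or c == "d" then
          local kind = c == "l" and "list" or "dict"
          begin_value()
          emit("open", kind)
          frames[#frames + 1] = { kind = kind, n = 0 }
        elseif c == "i" then
          begin_value(); state, num = "int", ""
        else  -- a digit: start of a string length
          local top = frames[#frames]
          if top and top.kind == "dict" and top.key == nil then keybuf = {} else begin_value() end
          state, num = "len", c
        end
      end
    end
  end
  return parser
end

-- Usage: each parser has its own callbacks; feed it in tiny chunks.
local events = {}
local function log(name)
  return function(path, value)
    events[#events + 1] = name .. " @" .. table.concat(path, "/") .. " " .. tostring(value)
  end
end
local p = new_parser {
  open = log("open"), close = log("close"), integer = log("integer"),
  string_begin = log("string_begin"), string = log("string"), string_end = log("string_end"),
}
for _, chunk in ipairs { "d3", ":f", "oo", "li", "1e", "4:", "sp", "am", "ee" } do
  p.feed(chunk)
end
```

With that input, the integer is reported at path foo/1 and the two
fragments of "spam" at foo/2, without the callbacks tracking anything
themselves; #path gives the nesting level if you ever need it.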

Then again, I think I might be missing something, as I haven't read the
entire code either, just quickly skimmed through it.


Best regards, and congratulations on the release,

Moritz