lua-l archive



[another message in the long-running series of "Jay looks in on lua-l
and finds some tenuous connection with work he did years ago"]

On Tue, Apr 12, 2011 at 12:36 AM, Josh Haberman <jhaberman@gmail.com> wrote:

> If I have the whole file mmap()'d this works great since I can
> reconstruct any string at any time.  But if I'm operating over a
> stream of data like a network socket, this is a problem since I don't
> want to have to keep all the data I've ever read buffered in memory.
> But in most cases this isn't a problem, because the application will
> only need to look at one record at a time:
>
> for entry in ParseLogEntriesAtTheSpeedOfLight(io.stdin) do
>  -- Pull anything you need from "entry" out now, because it
>  -- will no longer be valid once we advance to the next one!

I agree with the other posts that the garbage collector encapsulates a
notion of unreachability you may be able to harness.

LazyKit builds LOM-style XML parse trees on demand; that is, if you
have only asked for node[1], node[2] may not have materialized
yet. By the time node[2] is requested, node[1] may no longer be
referenced in the node[] table, and hence will be picked up by the
garbage collector.

Your destructive iteration sounds similar.
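The on-demand part can be sketched in a few lines of plain Lua. This
is a hypothetical illustration, not LazyKit's actual code: a table
whose children are built lazily through an __index metamethod, so a
child costs nothing until it is first indexed, and can be dropped for
the collector once you nil out its slot.

```lua
-- Hypothetical sketch of on-demand child construction (not LazyKit's
-- actual implementation): node[i] is not built until someone asks.
local function lazy_children(build)   -- build(i) constructs child i
  return setmetatable({}, {
    __index = function(t, i)
      local child = build(i)          -- materialize on first access
      rawset(t, i, child)             -- cache it; caller may nil it later
      return child
    end,
  })
end

local parsed = 0
local node = lazy_children(function(i)
  parsed = parsed + 1
  return { index = i }
end)

local a = node[1]   -- materializes child 1
node[1] = nil       -- drop the reference; child 1 is now collectable
local b = node[2]   -- only now is child 2 built
```

After this runs, only two children have ever been constructed, and the
first one is already unreachable through node[].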

Excerpts from http://lua-users.org/wiki/ConsumingXml :

----
xpairs_c(tree), xnpairs_c(tree), and xmliter.switch_c(parent, ftable,
[opts]) also iterate over the children of tree, but they _consume_ the
children of tree as they process it. The following two fragments have
similar semantics:

    for i,x in xpairs(parent) do
      parent[i] = nil
      do_something_with(x)
    end

    for i,x in xpairs_c(parent) do
      do_something_with(x)
    end

Using a consuming iterator means that you do not care about accessing
previously processed trees through parent. However, you can still save
the current tree for later use:

    for i,x,name in xnpairs(parent) do
      if x.name == "xref" then
        table.insert(references, x)
      end
    end

[...]

With a conventional XML tree, processing this [log] requires space
linearly proportional to the size of all the <entry> elements. With
normal iterators and a lazy tree, this requires space linearly
proportional to all previously processed <entry> elements (as future
elements are only read on demand). With consuming iterators and a lazy
tree, processing only requires space proportional to the size of a
single <entry> element, as previously processed <entry>s have been
forgotten.

[...]

What is really going on here is that iterators provide an event-based
interface to tables. Consuming iterators provide many of the same
benefits as pure event-based XML parsers, while allowing you to
fluidly switch back and forth to a tree-based API when that makes
sense.
----

If I knew a cheap way to be notified that node[1] and all of its
incomplete descendants no longer had references, I would also know
there would be no way that any continued parsing of it could affect
the future of the computation. Tree building could then automatically
stop and have the XML parser fast-forward to node[1]'s close tag. All
the layers of control inversion in the tree-building code were
mind-bending enough that I don't think I ever implemented even the
manual full fast-forward.
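The cheapest approximation I know of in stock Lua is a weak table:
plain tables can't carry __gc (at least in Lua 5.1), so you can't be
notified, but you can poll after a collection cycle and observe that
the node has gone away. A hedged sketch, with names of my own
invention:

```lua
-- Hedged sketch: observing that a node became unreachable via a
-- weak-valued table. This is polling, not notification -- Lua 5.1
-- tables have no __gc, so checking after a GC cycle is the best
-- cheap approximation available.
local live = setmetatable({}, { __mode = "v" })  -- weak values

local node = { name = "entry" }
live[1] = node

node = nil                 -- drop the last strong reference
collectgarbage("collect")  -- run a full cycle

if live[1] == nil then
  -- the node was collected; parsing of its subtree could be
  -- abandoned and the parser fast-forwarded to its close tag
end
```

Polling like this on every parser step is exactly the kind of cost the
"cheap way to be notified" would avoid, which is part of why I never
built it.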

Jay Carlson
nop@nop.com