Wim Couwenberg wrote:
> >This is a perfect example for a coroutine iterator (see chapter 9.3):
> The version below looks more contrived but is >10% faster on my system 
> (in lua 5.0.2)

I tried Lua 5.1w5 and got similar results. Ok, let's see ...
I can shave off another 10% by caching the global lookups:

local function allmatches(pattern)
  return coroutine.wrap(function()
    local gfind, yield = string.gfind, coroutine.yield -- cached once per iterator
    for line in io.lines() do
      for word in gfind(line, pattern) do
        yield(word)
      end
    end
  end)
end

But doing the same in your example helps, too. And then you could
move the initial io.lines() call to the factory function and avoid
a conditional. Now it's faster than my coroutine iterator, again.

Ok, this is sacrificing clarity for speed. Still, it should not
be necessary. Lua coroutines are very fast, but maybe they need
to get even faster (see below).

> Maybe the linestate/wordstate variables are redundant (didn't check) but 
> it would be cheating to leave them out (and make an assumption on the 
> implementation of io.lines and string.gfind).

Both are nil because io.lines and string.gfind return only a closure.
But yes, it would be cheating (compared to the coroutine approach).

Ok, even though the reason why the coroutine approach is slower
is pretty obvious, I profiled it: it spends about 8 percentage points
less time in luaV_execute (12% vs. 20%) but wastes more time doing the
resumes and yields, especially because each yield needs an additional
call frame (which could be improved by making yield a keyword/VM op).

Still, both solutions spend most of their time in the regex code (22%)
and in luaS_newlstr + malloc/free (13%). Interestingly, at least
an additional 10% is wasted in __ctype_b_loc and __ctype_tolower_loc
(glibc NLS support for character classes).

The latter seems strange at first, but lstrlib.c:match_class() is
the culprit. The tolower() can be avoided easily (just add the
uppercase labels, too). The overhead of the other ctype macros is
difficult to avoid (and gcc inserts 10 calls to __ctype_b_loc *sigh*).
Compiling and linking with an NLS-ignorant libc may be a good idea
for those using lots of (ASCII) regexes.
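
To illustrate the tolower() point, here is a minimal sketch of what
"add the uppercase labels, too" could look like. It assumes that
match_class() gets the subject character and the class letter and
switches on the lowercased class letter. The name match_class_sketch
is made up for this example, and the class letters are the documented
Lua pattern classes (%a %c %d %l %p %s %u %w %x %z), not a copy of the
code in lstrlib.c.

```c
#include <ctype.h>

/* Hypothetical sketch, NOT the code from lstrlib.c.
   c  = character from the subject string (as an unsigned char value)
   cl = class letter following '%' in the pattern
   Uppercase labels set the negate flag and fall through to the
   lowercase label, so no tolower()/islower() call on cl is needed. */
int match_class_sketch(int c, int cl) {
  int res;
  int negate = 0;
  switch (cl) {
    case 'A': negate = 1; /* fall through */
    case 'a': res = isalpha(c); break;
    case 'C': negate = 1; /* fall through */
    case 'c': res = iscntrl(c); break;
    case 'D': negate = 1; /* fall through */
    case 'd': res = isdigit(c); break;
    case 'L': negate = 1; /* fall through */
    case 'l': res = islower(c); break;
    case 'P': negate = 1; /* fall through */
    case 'p': res = ispunct(c); break;
    case 'S': negate = 1; /* fall through */
    case 's': res = isspace(c); break;
    case 'U': negate = 1; /* fall through */
    case 'u': res = isupper(c); break;
    case 'W': negate = 1; /* fall through */
    case 'w': res = isalnum(c); break;
    case 'X': negate = 1; /* fall through */
    case 'x': res = isxdigit(c); break;
    case 'Z': negate = 1; /* fall through */
    case 'z': res = (c == 0); break;
    default: return cl == c;  /* not a class letter: compare literally */
  }
  res = (res != 0);  /* ctype functions may return any nonzero value */
  return negate ? !res : res;
}
```

This only removes the lookup on the class letter. The ctype calls on
the subject character stay, so the __ctype_b_loc overhead mentioned
above is not addressed by it.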