- Subject: Re: Ideas about implementing a string type?
- From: David Kastrup <dak@...>
- Date: Sun, 01 Apr 2007 22:23:38 +0200
Hans Hagen <pragma@wxs.nl> writes:
> David Kastrup wrote:
>> Of course, LuaTeX is an extreme case, but I would not go as far as to
>> call it pathological.
>>
> luatex (and its developers) is quite happy with the current lua
> implementation; the cases where huge strings are read from file as a
> whole are rare, and even then, on my 5 year old laptop, reading a
> 30 meg file in 2 seconds is ok. On more reasonable files it becomes
> negligible, esp. given the things that need to be done with such
> content.
Oh, actually I was less concerned about large strings than about many
smaller ones that repeatedly get interned into Lua.
For some callbacks that get information from TeX, Taco switched the
data passed into Lua from strings to integers and reportedly got a
speedup of about a factor of 10 from that.
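To illustrate where that cost goes, here is a toy C model of string interning. It is a simplification, not Lua's actual implementation: every string crossing into the interpreter is hashed, looked up in a table and copied on a miss, whereas an integer needs none of that. The names `intern` and `hash_bytes` and the choice of FNV-1a as hash are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy interning table (illustrative only, not Lua's real code). */
#define NBUCKETS 256

typedef struct Interned {
    struct Interned *next;   /* chain within one bucket */
    size_t len;
    uint32_t hash;
    char data[];             /* private copy of the bytes */
} Interned;

static Interned *buckets[NBUCKETS];
static unsigned long bytes_hashed = 0;   /* counts hashing work done */

/* FNV-1a over all bytes: the cost grows with the string length. */
static uint32_t hash_bytes(const char *s, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++)
        h = (h ^ (unsigned char)s[i]) * 16777619u;
    bytes_hashed += len;
    return h;
}

/* Return the unique copy of s: hashes every time, copies only on a miss. */
static Interned *intern(const char *s, size_t len) {
    uint32_t h = hash_bytes(s, len);
    Interned **slot = &buckets[h % NBUCKETS];
    for (Interned *p = *slot; p != NULL; p = p->next)
        if (p->hash == h && p->len == len && memcmp(p->data, s, len) == 0)
            return p;                     /* hit: hashed, but not copied */
    Interned *n = malloc(sizeof *n + len + 1);
    assert(n != NULL);
    n->next = *slot;
    n->len = len;
    n->hash = h;
    memcpy(n->data, s, len);
    n->data[len] = '\0';
    *slot = n;
    return n;
}
```

Interning "relax" twice hashes all ten bytes even though the second call merely finds the existing copy; handing the callback an integer index instead skips the hash, the lookup and the copy entirely.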
> we use some fast iterators for (multibyte) strings (handy for
> utf16/32 as well as utf collapsing; we've done quite some timings in
> that area)
>
> in luatex the majority of data processing involves tables (huge ones
> for fonts and such) and these can be cached (bytecode); when dealing
> with strings there are quite some string comparisons (when
> manipulating fontdata) and hashed strings work fine there; caching
> hash keys is one of the potential optimizations in that area
> (actually, tex itself also hashes a lot of its input, esp control
> sequences in macro code)
Not really "especially": control sequence names are essentially what
TeX hashes. There is another hash table for hyphenation exceptions,
and the hyphenation trie also uses some sort of hashing, but as far as
I can see those hashes are not really interesting to pass into Lua.
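The hash-key caching Hans mentions could look roughly like this minimal C sketch. The `CachedKey` type and its helpers are hypothetical, not taken from Lua or LuaTeX: the hash is computed once per key and reused for every later comparison, so repeated lookups with the same font-data key only pay for hashing the first time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key type that remembers its hash once computed. */
typedef struct {
    const char *bytes;
    size_t len;
    uint32_t hash;
    int hash_valid;   /* 0 until key_hash() has run once */
} CachedKey;

static unsigned long hash_computations = 0;

/* Compute the FNV-1a hash on first use, then return the cached value. */
static uint32_t key_hash(CachedKey *k) {
    if (!k->hash_valid) {
        uint32_t h = 2166136261u;
        for (size_t i = 0; i < k->len; i++)
            h = (h ^ (unsigned char)k->bytes[i]) * 16777619u;
        k->hash = h;
        k->hash_valid = 1;
        hash_computations++;
    }
    return k->hash;
}

/* Compare cached hashes and lengths first; touch the bytes only if needed. */
static int key_equal(CachedKey *a, CachedKey *b) {
    if (key_hash(a) != key_hash(b) || a->len != b->len)
        return 0;
    return memcmp(a->bytes, b->bytes, a->len) == 0;
}
```

Comparing the same two keys a second time computes no new hash, and unequal keys are usually rejected by the hash comparison before any bytes are examined.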
I asked Taco whether he had thought of trying to replace the control
sequence hash with a Lua hash at some point (that would have made
moving token name strings back and forth in LuaTeX much less critical,
since they would then no longer need to get interned each time). He
said that he had already tried something like it (I don't know the
details) and that it slowed down TeX by approximately a factor of 100.
I don't know which factors accounted for how much of this, though, but
it seems that replacing data structures and the algorithms fitted to
them, hand-crafted and hand-tuned by Donald Knuth, with Lua structures
is, well, "changing a winning team".
It would likely require a lot of surgery in TeX to avoid losing
performance at the TeX/Lua border.
My guess is that lightweight strings, for which Lua avoided copying,
hashing and unification when not needed, might help make the seams
less noticeable.
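A minimal sketch of what such a lightweight string might be, assuming a C host; the `StrView` type and its functions are hypothetical. It is just a pointer and a length into the host's own buffer: creating one copies, hashes and interns nothing, and work is only done when a script actually inspects the contents.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lightweight string: a borrowed view into host memory. */
typedef struct {
    const char *ptr;   /* points into a buffer owned by the host (e.g. TeX) */
    size_t len;
} StrView;

static unsigned long bytes_touched = 0;   /* counts bytes actually examined */

/* O(1): no copy, no hash, no interning. */
static StrView view_make(const char *ptr, size_t len) {
    StrView v = { ptr, len };
    return v;
}

/* The contents are only read when a comparison is really requested. */
static int view_equals_cstr(StrView v, const char *s) {
    size_t n = strlen(s);
    if (n != v.len)
        return 0;                 /* length mismatch: no bytes examined */
    bytes_touched += n;
    return memcmp(v.ptr, s, n) == 0;
}
```

The catch is lifetime: a view is only valid while the host keeps its buffer alive, whereas `lua_pushlstring` makes (or reuses) an internal copy precisely so that the caller may free or reuse its memory right after the call.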
In the case of Lua callbacks, it may even be quite possible that
strings get passed into Lua that are never actually looked at. One
could pass them in as opaque data and use an explicit call to convert
them into Lua strings whenever Lua needs to process them, but this
would make programming more cumbersome.
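The opaque-data alternative can be sketched in C as follows. All names are hypothetical, including `OpaqueStr`, `opaque_to_string` and the `host_pool` array that stands in for TeX's string pool: the callback receives cheap handles, and only the explicit conversion call makes a copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical opaque handle: an index; the bytes stay in the host. */
typedef struct {
    size_t index;
} OpaqueStr;

/* Stand-in for the host's string storage (e.g. TeX's string pool). */
static const char *host_pool[] = { "relax", "par", "hbox" };
static unsigned long conversions = 0;

/* The explicit conversion step: only here is a private copy made
   (in a real binding this is where a Lua string would be created). */
static char *opaque_to_string(OpaqueStr h) {
    const char *src = host_pool[h.index];
    size_t len = strlen(src);
    char *copy = malloc(len + 1);
    assert(copy != NULL);
    memcpy(copy, src, len + 1);   /* includes the terminating NUL */
    conversions++;
    return copy;
}

/* A callback that is handed several strings but only needs the first. */
static size_t callback_first_len(const OpaqueStr *args, size_t n) {
    if (n == 0)
        return 0;
    char *s = opaque_to_string(args[0]);   /* pay for one conversion only */
    size_t len = strlen(s);
    free(s);
    return len;
}
```

Of the three handles passed in, only one is ever converted. The extra `opaque_to_string`/`free` pair inside the callback is the added programming burden mentioned above.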
[...]
> the lua-tex-lua interface will have optimizations but we stick
> within the existing lua concepts (if one reads up on the history of
> lua we're pretty sure that the authors know what they're doing -)
I have absolutely no doubts about that. I just want to give them a
hint of what I am doing...
Make no mistake: Lua is well designed and carefully implemented (I
quite like its coroutines, for example), and what it does, it does
rather efficiently. But when it comes to string handling, I think it
would sometimes be nice if it could avoid doing some things altogether
when it is passed strings to process.
Whether the deliberate simplicity of Lua's data structure design
leaves room for a computationally cheaper string type, or for some
changes to the existing string type, is of course the developers'
decision.
But I think that for basic string processing tasks there may still be
leeway to make calling Lua for them (or just stuffing a Lua callback
with information it might not even need) more of a no-brainer. The
cheaper it gets to pass data into Lua, the more of the main
application's data structures one can afford to make transparent to
Lua.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum