

On Feb 08, 2005, at 07:49, Steve Donovan wrote:

That's what a pure Lua implementation would do; the index
could be plain text, containing the words as indices (or
hashes). That would certainly be fast enough for
most things - it's a question of scaling and whether
one can afford the memory etc.

Just as a follow-up, here is my first draft implementation of a diminutive text search in Lua:

http://dev.alt.textdrive.com/file/LUPad/LUPIndex.lua

In a nutshell, the indices are stored in gdbm, the key being a document id and the value a sample of the text.

When a document is indexed, its text is broken down along non-alphanumeric boundaries and each token is added to a counted set. Finally, a string representation of the set is stored, with the most frequent tokens first.
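
To make the indexing step concrete, here is a minimal sketch of how such a value might be built. It is not taken from LUPIndex.lua: the summarize name, the lowercasing, the space separator and the alphabetical tie-break are all assumptions of mine, and it expects Lua 5.1 or later.

```lua
-- Hypothetical helper, not the code from LUPIndex.lua.
-- Splits text on non-alphanumeric boundaries, counts each token, and
-- returns the distinct tokens joined by spaces, most frequent first.
local function summarize(text)
  local counts, tokens = {}, {}
  -- Lowercasing is an assumption; "%w+" matches runs of alphanumerics.
  for token in string.gmatch(string.lower(text), "%w+") do
    if not counts[token] then
      counts[token] = 0
      tokens[#tokens + 1] = token
    end
    counts[token] = counts[token] + 1
  end
  -- Descending count; ties broken alphabetically so the output is stable.
  table.sort(tokens, function(a, b)
    if counts[a] ~= counts[b] then return counts[a] > counts[b] end
    return a < b
  end)
  return table.concat(tokens, " ")
end

print(summarize("Lua is small. Lua is fast, and Lua is embeddable."))
-- prints: is lua and embeddable fast small
```

The returned string is what would be stored in gdbm under the document id.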

During search, each value is evaluated with a simple find(). The resulting ids are ranked according to the index of the value.
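
A rough sketch of the search side, under some loud assumptions: a plain Lua table stands in for the gdbm file, the search name is made up, and "index of the value" is read as the position string.find() returns within the stored value (an earlier position means a more frequent token, hence a better rank). The real LUPIndex.lua may differ on all three points.

```lua
-- In-memory stand-in for the gdbm file: document id -> token summary.
-- The ids and values below are made-up sample data.
local index = {
  doc1 = "lua is fast small",
  doc2 = "search text lua index",
  doc3 = "gdbm key value",
}

-- Hypothetical helper: returns the ids whose value contains term,
-- ordered by the position of the match (earliest first).
local function search(index, term)
  local needle = string.lower(term)
  local hits = {}
  for id, value in pairs(index) do
    -- Fourth argument true requests a plain substring find, no patterns.
    local position = string.find(value, needle, 1, true)
    if position then
      hits[#hits + 1] = { id = id, position = position }
    end
  end
  -- Earlier match position ranks higher; ties are broken by id.
  table.sort(hits, function(a, b)
    if a.position ~= b.position then return a.position < b.position end
    return a.id < b.id
  end)
  local ids = {}
  for i, hit in ipairs(hits) do ids[i] = hit.id end
  return ids
end

print(table.concat(search(index, "lua"), ", "))
-- prints: doc1, doc2
```

Note that a plain find() matches substrings, so a query can also hit the inside of a longer token, and every search scans all stored values.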

Very brain dead, but it kind of works :)

Cheers

--
PA, Onnay Equitursay
http://alt.textdrive.com/