2005-02-07T20:56:53 PA:
> On Feb 07, 2005, at 21:45, Jolan Luff wrote:
> >Is "full text search" some new buzzword that I'm not familiar with?
> 
> Perhaps.
> 
> http://en.wikipedia.org/wiki/Information_retrieval

Definitely looks like a nice overview, but I don't see a direct answer
to the question there.

Doing full-text search with grep(1) is boring; it's a solution that
only works well for small amounts of text.

For large amounts of text (the interesting amounts), full-text search
works as a two-pass process. First, a relatively slow, lengthy pass
builds an index, typically 1/3 to 1/2 the size of the corpus being
indexed; after that, searches against the index are very fast. For
years I used glimpse as my full-text search engine, but it wandered
off behind a proprietary license and I lost track of it; lately I've
been enjoying swish++ for full-text searching.
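To make the two passes concrete, here is a toy inverted index in
Python. It is only a sketch under my own assumptions (words are runs of
lowercase letters and digits, and a query matches documents containing
every query word); it is not how glimpse or swish++ are actually
implemented. The slow pass maps each word to the set of documents
containing it; the fast pass just looks words up and intersects the
sets, never rescanning the text.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of lowercase letters/digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return TOKEN_RE.findall(text.lower())

def build_index(docs):
    """Slow pass: map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Fast pass: intersect the document sets for every query word (AND)."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

if __name__ == "__main__":
    # Hypothetical three-document corpus.
    docs = {
        "rfc791": "Internet Protocol specification",
        "rfc793": "Transmission Control Protocol specification",
        "faq": "Frequently asked questions about Lua",
    }
    idx = build_index(docs)
    print(sorted(search(idx, "protocol specification")))  # ['rfc791', 'rfc793']
```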

These tools are terrific when you want to perform multiple keyword
searches across large bodies of text --- e.g. all the documentation
for all the packages in CPAN; all the RFCs; big email archives;
trouble-ticket databases; etc.
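Multiple-keyword searches are where an index really pays off. One common
textbook approach (an assumption here, not a claim about any particular
tool) stores each word's postings as an ascending list of document
numbers and answers an AND query by merge-intersecting those lists,
starting with the shortest so intermediate results stay small. A
minimal Python sketch, with made-up postings:

```python
def intersect_postings(a, b):
    """Merge-intersect two ascending lists of document numbers in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def and_query(postings, terms):
    """Documents containing every term; a missing term yields an empty list."""
    lists = [postings.get(t, []) for t in terms]
    if not lists:
        return []
    # Shortest list first keeps the running intersection small.
    lists.sort(key=len)
    result = list(lists[0])
    for other in lists[1:]:
        if not result:
            break
        result = intersect_postings(result, other)
    return result

if __name__ == "__main__":
    # Hypothetical postings: word -> ascending document numbers.
    postings = {
        "lua": [2, 5, 9, 14, 20],
        "index": [1, 5, 14, 30],
        "search": [5, 7, 14, 20, 31],
    }
    print(and_query(postings, ["lua", "index", "search"]))  # [5, 14]
```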

A freshmeat search for full-text search will turn up a lot of 'em.

-Bennett
