- Subject: Re: Full text search?
- From: Bennett Todd <bet@...>
- Date: Mon, 7 Feb 2005 21:36:48 +0000
2005-02-07T20:56:53 PA:
> On Feb 07, 2005, at 21:45, Jolan Luff wrote:
> >Is "full text search" some new buzzword that I'm not familiar with?
>
> Perhaps.
>
> http://en.wikipedia.org/wiki/Information_retrieval
Definitely looks like a nice overview, but I don't see a direct
answer to the question there.
Doing full-text search with grep(1) is boring: it rereads all the
text on every search, so it only works well for small amounts of text.
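For contrast, here's a minimal Python sketch of the grep-style approach,
assuming a toy corpus held in a dict; `grep_search` is a hypothetical
name, not any real tool's API. Every query walks every document, so the
cost of each search grows with the size of the corpus.

```python
import re

def grep_search(docs, pattern):
    """Hypothetical grep-like search: scan every document on every query."""
    regex = re.compile(pattern, re.IGNORECASE)
    # No index: each call re-reads all of the text.
    return [doc_id for doc_id, text in docs.items() if regex.search(text)]

# Toy corpus (illustrative only): a few RFC titles.
docs = {
    "rfc2616": "Hypertext Transfer Protocol -- HTTP/1.1",
    "rfc2821": "Simple Mail Transfer Protocol",
    "rfc1939": "Post Office Protocol - Version 3",
}
print(grep_search(docs, "transfer"))  # ['rfc2616', 'rfc2821']
```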
For large (i.e. interesting) amounts of text, full-text search is a
two-pass process. A relatively slow, lengthy pass builds an index ---
typically 1/3 to 1/2 the size of the corpus of text being indexed ---
and then searches using that index are very fast. For years I used glimpse as my full-text search
engine, but it wandered off behind a proprietary license and I
lost track of it; lately I've been enjoying swish++ for full-text
searching.
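To make the two passes concrete, here's a minimal Python sketch of an
inverted index, assuming simple lowercase word tokenization and AND
semantics for multi-keyword queries. The names (`tokenize`,
`build_index`, `search`) are hypothetical; this isn't how glimpse or
swish++ actually store their indexes, just the general idea.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Assumption: terms are lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Slow pass: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Fast pass: intersect the posting sets of all query terms (AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the smallest posting set so the intersection shrinks quickly.
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    result = set(postings[0])  # copy, so the index itself isn't modified
    for p in postings[1:]:
        result &= p
    return result

# Toy corpus (illustrative only): a few RFC titles.
docs = {
    "rfc2616": "Hypertext Transfer Protocol -- HTTP/1.1",
    "rfc2821": "Simple Mail Transfer Protocol",
    "rfc1939": "Post Office Protocol - Version 3",
}
index = build_index(docs)
print(sorted(search(index, "transfer protocol")))  # ['rfc2616', 'rfc2821']
```

Each query only touches the posting sets for its own terms, never the
full text, which is why the searches stay fast once the index is built.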
These tools are terrific when you want to perform multiple keyword
searches across large bodies of text --- e.g. all the documentation
for all the packages in CPAN; all the RFCs; big email archives;
trouble-ticket databases; etc.
A freshmeat search for full-text search will turn up a lot of 'em.
-Bennett