>  Hmmm... doesn't Sputnik provide "full-text search through Google API"?
>  How does excluding Google from a Sputnik instance affect this?

I'll assume that this is a serious question and not an invitation to a
food fight. :)

Searching via the Google API certainly won't work for sites that Google
can't access.  This includes demos like this one and (more
importantly) any intranet sites.  It obviously does not create
problems for public sites, which _are_ indexed by Google.  Note that
since the URLs don't change, Google API search would work from day one
if such a site were to switch to Sputnik.  (Again, I am not advocating
a switch.)

Using Google API has a few other cons as well as a serious number of
pros.  The main issues are:

* there is a delay in indexing (partly counter-acted by generating
sitemap.xml that tells Google which pages changed)
* more generally, you have little control over search results
* there may be privacy concerns for some users
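
To make the sitemap point concrete, here is a rough sketch of generating
a sitemap.xml with per-page modification dates.  It is written in Python
only for brevity and is not Sputnik's actual code; the function name and
the (url, last-modified) input format are assumptions made up for
illustration.  The element names and namespace come from the public
Sitemaps protocol.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def build_sitemap(pages):
    """Return sitemap XML for (url, last_modified) pairs.

    `pages` is a hypothetical input format: any iterable of
    (str, datetime) tuples taken from the wiki's storage.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url, modified in pages:
        lines.append("  <url>")
        # URLs must be XML-escaped (e.g. "&" becomes "&amp;").
        lines.append("    <loc>%s</loc>" % escape(url))
        # <lastmod> tells the crawler when the page last changed.
        lines.append("    <lastmod>%s</lastmod>" % modified.strftime("%Y-%m-%d"))
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```

The crawler still decides when to revisit, so this can only shorten the
indexing delay, not remove it.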

Most advantages revolve around the fact that Google does a pretty good
job searching and it would take serious work to implement the same
features locally:

* full-text search (try searching Sputnik's wiki for "MoinMoin")
* ability to search for multiple keywords (try searching for
"permissions _config")
* stemming for a number of languages (searching for "программирование"
finds pages that talk about "механизмы ...  программирования")
* synonym expansion (try "~localization")
* term exclusion with -, or requiring a term with +
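
For a sense of scale, the keyword part of this list is the easy part.
Below is a minimal sketch (Python for brevity, hypothetical function
names, not Sputnik code) of multi-keyword matching with + and -
prefixes.  It assumes every plain keyword is required and matches whole
whitespace-separated words only.  Stemming and synonym expansion, which
are the hard parts, are not attempted.

```python
def parse_query(query):
    """Split a query into (required, excluded) lowercase terms."""
    required, excluded = [], []
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            term = token.lstrip("+")
            if term:
                required.append(term)
    return required, excluded


def matches(text, query):
    """True if text has every required term and no excluded term."""
    words = set(text.lower().split())
    required, excluded = parse_query(query)
    return (all(term in words for term in required)
            and not any(term in words for term in excluded))
```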

It also offers an advantage that would be nearly impossible to
implement locally: the ranking of pages reflects the links to them
from _outside_ the wiki.

We discussed the search issue in July and considered implementing
simple search, but the feeling at the time was that simple search
within page names isn't really so useful at the end of the day.  (But
seriously, if there is demand for it, I'll add it tomorrow.)  Doing
our own full-text search would also be feasible, but it would be hard
to match what Google offers and would raise the question of how to
store the index.  I've been committed from day one to making sure that
Sputnik can be used with CGI to make it more "democratic."  With CGI,
every request starts from scratch, so nothing can be kept in memory
between requests.  One can build an index of headings in under a
second, but one can't expect to build an index of all the content while
the user is waiting for a response.  There needs to be a way to store
the index and also to update it efficiently.  I think the best
long-term solution would be to write a Lua binding to some robust
search system like Xapian, but this is a task I have not yet found time
for.
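
To illustrate the "store the index and update it efficiently"
requirement, here is a small sketch of an on-disk inverted index that
re-indexes a single page when it is saved, so that a CGI request only
has to query it.  This is an assumption-laden illustration, not a
proposal for Sputnik: it uses Python and SQLite for brevity, the table
layout and function names are invented, and it does no ranking or
stemming (which is where something like Xapian would come in).

```python
import re
import sqlite3


def open_index(path):
    """Open (or create) the index; `path` may be ":memory:" for tests."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS postings (term TEXT, page TEXT)")
    db.execute("CREATE INDEX IF NOT EXISTS postings_term ON postings (term)")
    return db


def update_page(db, page, text):
    """Re-index one page on save; other pages are left untouched."""
    terms = set(re.findall(r"\w+", text.lower()))
    with db:  # one transaction: drop old postings, add new ones
        db.execute("DELETE FROM postings WHERE page = ?", (page,))
        db.executemany("INSERT INTO postings VALUES (?, ?)",
                       [(term, page) for term in terms])


def search(db, query):
    """Return sorted names of pages containing every query term."""
    terms = set(re.findall(r"\w+", query.lower()))
    if not terms:
        return []
    pages = None
    for term in terms:
        rows = db.execute("SELECT page FROM postings WHERE term = ?", (term,))
        found = {row[0] for row in rows}
        pages = found if pages is None else pages & found
    return sorted(pages)
```

The cost of a save is proportional to the size of the edited page, and
the cost of a search does not depend on building anything at request
time.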

Again, if anyone else thinks that a simple unranked search just within
page names is a useful addition, I'll do it tomorrow or on Sunday, and
I will make it installable as a rock on top of existing (recent)
Sputnik installations.
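
For reference, "simple unranked search within page names" could be as
small as the sketch below.  It is in Python for brevity, the function
name is hypothetical, and it assumes the list of page names is already
available from storage; results come back in storage order, with no
ranking.

```python
def search_page_names(page_names, query):
    """Return names containing every keyword (case-insensitive substring)."""
    keywords = query.lower().split()
    if not keywords:
        return []
    return [name for name in page_names
            if all(keyword in name.lower() for keyword in keywords)]
```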

  - yuri