Thoughts on testing

When comparing the different search engines, I would recommend looking at the top results only and ignoring loading time and number of hits.

Most of the search engines run as virtual machines on two VMware servers. Scheduling issues may make some search engines appear slower than others from time to time, without it being the search engine's fault.

Result counts are also routinely overstated by some search engines. For example, for the query enron the Google Mini says it finds more documents than there are in the collections. The Google Mini and Microsoft Search Server seem to be especially prone to this, and always return a nice round number when they have many results.


I have created some example queries you can use to get started with the comparison. To get to know the two data sets better, I recommend that you also try browsing them online to find interesting documents. Then try to find those documents by searching for relevant keywords.
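
If you want to keep track of this kind of known-document test, a small script can help. The following is only a rough sketch in Python, not part of the test setup: the engine names, document ids and function names are all made up for illustration, and you would have to copy in the result lists yourself. It reports at which position each search engine returned the document you were looking for, considering the top results only.

```python
from typing import Dict, Optional, Sequence


def rank_of(target: str, results: Sequence[str]) -> Optional[int]:
    """Return the 1-based position of target in results, or None if absent."""
    for position, doc_id in enumerate(results, start=1):
        if doc_id == target:
            return position
    return None


def compare_engines(
    target: str,
    results_by_engine: Dict[str, Sequence[str]],
    top_k: int = 10,
) -> Dict[str, Optional[int]]:
    """Map each engine name to the rank of target within its top_k results."""
    return {
        engine: rank_of(target, results[:top_k])
        for engine, results in results_by_engine.items()
    }


# Hypothetical data: placeholder engine names and document ids,
# listed in the order each engine returned them for one query.
results_by_engine = {
    "engine_a": ["doc_17", "doc_42", "doc_03"],
    "engine_b": ["doc_42", "doc_08", "doc_17"],
    "engine_c": ["doc_99", "doc_08", "doc_03"],
}

ranks = compare_engines("doc_42", results_by_engine)
for engine, rank in ranks.items():
    print(engine, "not in top results" if rank is None else f"rank {rank}")
```

A lower rank means the search engine placed the document nearer the top. Loading time and the reported number of hits are deliberately left out, for the reasons given above.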

Browse the English Wikipedia set here, and the Enron set here.