Google, the popular search
engine, is not alone at the top. Although Google has a slightly
larger number of indexed pages than its closest competitor, AlltheWeb
-- 3.3 billion vs. 3.2 billion -- the number is misleading in terms
of search-result pages. In fact, 42% of keywords lead to more page
results from AlltheWeb than from Google, according to a recently
conducted study by DomainMart. Furthermore, about 2.3% of keywords
retrieve no page results on Google, but retrieve at least one page
on AlltheWeb.
This finding suggests
that you may be better off using AlltheWeb for certain classes of
keywords, or at least you should use both search engines to improve
the results of your searches.
In general, the quality
of search-engine results depends on whether you are interested in
current pages or archived pages. Once a Web page is modified or
deleted, the previous version is lost when the popular search engines
update their databases. If you are interested in archived Web pages,
you should use Alexa’s Wayback
Machine.
For the most recent information,
the quality of results depends on
(1) the number of indexed pages relevant to the keyword searched.
(2) the criteria used to rank the relevancy of pages for a given keyword search. (Google’s initial success was due to assessing the relevance of a Web site based on the number of other sites that link to it.)
(3) whether the criteria used to rank sponsored advertising take into account users as well as advertisers. If ranking is based solely on bids, for example, it might discourage users from clicking on ads, which in turn reduces advertising, cuts the profits of both advertisers and search engines, and can mean the demise of the search engine.
(4) the frequency of updating the ranking index for new content.
(5) the frequency of modifying the ranking criteria, as users start devising techniques to fool the search engines into indexing irrelevant pages. (The first such threat came from Google bombs through blogs.) As with the concept of stock-market efficiency, you cannot keep a winning strategy a secret for a long time. Thus, ranking criteria need to be frequently modified.
(6) a search engine’s ability to organize results according to their position among recognized authoritative sites on a topic (such as Teoma) or to use the sources of information (such as products, news, eBay) to refine results (as with Vivísimo).
(7) a search engine’s ability to learn from the behavior of users (such as the startup Mooter).
Thus,
even if a search engine employs a better ranking technique, it is
not necessarily better when a significant number of pages are omitted
from its search database. Conversely, indexing a large number of
pages does not necessarily improve the quality of top search results
if the ranking methodology is flawed.
DomainMart’s study was
motivated by our continuous effort to improve domain-name pricing
through understanding search-engine results in relation to keyword
composition of domain names. This study was directed at investigating
quality feature (1) above—i.e., the number of indexed pages relevant
to the keyword searched.
The sample used in the
study consisted of 551 keywords representing domain names sold between
January 2003 and January 2004. For each keyword, we recorded the
number of search results from Google, AlltheWeb, and AltaVista search
engines.
Since the numbers of pages indexed by Google and by AlltheWeb
are almost the same, as noted earlier, and since we did not analyze
how many search-result pages the two engines have in common for a
given keyword, we decided that a test for equality of averages
between two samples did not make a lot of sense. In theory, the
page-result data from Google and from AlltheWeb can have no common
pages, which makes such tests irrelevant.
Moreover, such statistical tests (the standard t-test and Wilcoxon
nonparametric signed-rank test) require making specific assumptions
about the structure of data that we did not believe appropriate
for our data. Thus, we decided to look at intuitive descriptors
of the data.
The following table summarizes
some of our findings. It shows the proportion of AlltheWeb and AltaVista
search results that were larger than Google’s, as well as the proportion
of results greater than Google’s by at least 25%, 50%, and 75%.
For example, 42% of the keyword searches on AlltheWeb returned more
page results than Google, and 32% of those surpassed Google’s page
results by at least 25%.
Proportion (%) of page results greater than Google’s by at least 25%, 50%, and 75%:

| Search Engine | 25% | 50% | 75% |
|---|---|---|---|
| AlltheWeb | 32 | 25 | 20 |
| AltaVista | 2 | 1.8 | 1.8 |
Hence, the study indicates
that you will find a significant number of pages using AlltheWeb
that are not available using Google.
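The proportions reported above could be computed along the following lines. This is a minimal sketch, not the code used in the study. The function names `share_exceeding` and `share_only_in_other` are hypothetical, the counts in `sample` are made up for illustration, and the input is assumed to be one pair of result counts per keyword (Google first, the other engine second).

```python
def share_exceeding(counts, margin=0.0):
    """Percentage of keywords where the other engine's result count is
    greater than the baseline count by at least `margin` (0.25 means 25%).

    `counts` is a list of (baseline_count, other_count) pairs.
    """
    hits = sum(
        1
        for baseline, other in counts
        # Strictly more results, and at least `margin` above the baseline.
        if other > baseline and other >= baseline * (1 + margin)
    )
    return 100.0 * hits / len(counts)


def share_only_in_other(counts):
    """Percentage of keywords with no baseline results but at least one
    result from the other engine."""
    hits = sum(1 for baseline, other in counts if baseline == 0 and other > 0)
    return 100.0 * hits / len(counts)


# Made-up (google_count, other_count) pairs for four keywords; not study data.
sample = [(1000, 1300), (500, 400), (0, 12), (200, 210)]

for margin in (0.0, 0.25, 0.5, 0.75):
    print(f"at least {margin:.0%} more: {share_exceeding(sample, margin):.1f}%")
print(f"found only by the other engine: {share_only_in_other(sample):.1f}%")
```

Applied to the 551 keyword counts from the study, the same calculation would yield one row of the table per search engine.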
Further research is needed
to determine whether there are classes of keywords that will yield
greater search results if used with a particular search engine.
For example, how do the results differ for single vs. multiple
keywords?
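One way to approach the single vs. multiple keyword question is to group the keywords by word count and compare the groups. The sketch below is only an illustration of that idea, not a method from the study. The function name `share_exceeding_by_word_count` is hypothetical, the rows are made-up data, and it assumes each keyword is a non-empty string whose words are separated by spaces.

```python
from collections import defaultdict


def share_exceeding_by_word_count(rows):
    """For single-word and multi-word keywords separately, return the
    percentage of keywords where the other engine returned more results
    than the baseline engine.

    `rows` is a list of (keyword, baseline_count, other_count) tuples.
    """
    totals = defaultdict(int)
    wins = defaultdict(int)
    for keyword, baseline, other in rows:
        # Assumption: words within a keyword are separated by whitespace.
        group = "single" if len(keyword.split()) == 1 else "multiple"
        totals[group] += 1
        if other > baseline:
            wins[group] += 1
    return {group: 100.0 * wins[group] / totals[group] for group in totals}


# Made-up rows for illustration only; not study data.
rows = [
    ("cars", 900, 1000),
    ("loans", 50, 40),
    ("used cars", 400, 500),
    ("home loans", 10, 30),
]

print(share_exceeding_by_word_count(rows))
```

With real data, a large gap between the two groups would point to a class of keywords that favors one engine.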
Encouraged by the above
results, we are studying methodologies to improve searches by combining
information from multiple search engines.