Sunday, October 01, 2006

Compare Google and Yahoo! Search Results

Pit Google and Yahoo! against each other and find more search results in the process.

If you've ever searched for the same phrase at both Google and Yahoo!, you've probably noticed that the results can be surprisingly different. That's because Google and Yahoo! have different ways of determining which sites are relevant for a particular phrase. Though both companies keep the exact way they rank results a secret (to thwart people who would take advantage of it), both Yahoo! and Google provide some clues about what goes into their ranking systems.

At the heart of Google's ranking system is a proprietary method it calls PageRank, and Google doesn't give detailed information about it. But Google does say this:

Google's order of results is automatically determined by more than 100 factors, including our PageRank algorithm.

Here's the official word from Yahoo!:

Yahoo! Search ranks results according to their relevance to a particular query by analyzing the web page text, title, and description accuracy as well as its source, associated links, and other unique document characteristics.

Though we might never know exactly why results are different between the two search engines, at least we can have some fun spotting the differences, and end up with more search results than either site would have offered on its own.

One way to compare results is to simply open each site in a separate browser window and manually scan for differences. If you search for your favorite dog breed (say, "australian shepherd"), you'll find that the top few sites are the same across both Yahoo! and Google, but the two search engines quickly diverge into different results. At the time of this writing, both sites estimate exactly 1,030,000 total results for this particular query, but for other queries the estimated result counts can be another way to spot differences between the sites.

Viewing both sets of results in different windows is a bit tedious, so a clever Norwegian developer named Asgeir S. Nilsen has made the task easier with a site called Twingine.

Twingine

The Twingine site (http://twingine.com) contains a blank search form into which you can type any search query. When you click Search, the site brings up the results pages for that query from both Yahoo! and Google, side by side. To be fair, the sides on which Google and Yahoo! appear change at random, so people who prefer one side of the screen to the other won't be biased. Plugging "australian shepherd" into Twingine yields the Google and Yahoo! results pages for that query, side by side.

Clicking Next or Previous in the top frame at Twingine takes you to the next or previous page in the search results at both sites.

Surfing the pages in the search results at Twingine can be a bit tricky. You'll probably want to open linked search results in a new window or tab, so that you can keep your place in the search results at both Yahoo! and Google. You can open links in a new window by right-clicking the link (Ctrl-click on a Mac) and choosing Open Link in New Window from the menu. You can also set your search preference at either search engine to automatically open links in a new window when you click a search result.

Yahoo! Versus Google Diagram

Another site, developed by Christian Langreiter, adds a bit of analysis to the different sets of search results between Yahoo! and Google. If you have Flash installed, you can type a search query into the form at http://www.langreiter.com/exec/yahoo-vs-google.html, and the site fetches the search results from both engines in the background using their open APIs.

Each blue or white dot in the diagram represents a search result URL, and the position of the dot represents the ranking. The dots on the far left are the top search results, and the further right you go, the further down you go in the search results. The blue lines represent the same URL, so you can see exactly where Google and Yahoo! line up.

In the diagram for "australian shepherd", you can see that the top search result is the same URL at both engines, but the lines aren't as evenly matched further down in the results. As you hover over each dot, you see the URL, which you can click to visit that particular search result.

Each white dot in the diagram represents a URL that one search engine has in its results and the other does not. As this diagram demonstrates, neither search engine has a monopoly on matching pages, nor does either engine's index have every page on a particular topic.
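
The comparison the diagram draws can be sketched in a few lines of Python. This is only an illustration of the idea, not the site's actual code: the function name `compare_rankings` and the sample URLs are made up, and the two lists stand in for results you have already collected from each engine.

```python
def compare_rankings(google_urls, yahoo_urls):
    """Pair up URLs that both ranked lists share and collect the ones unique to each."""
    yahoo_rank = {url: i + 1 for i, url in enumerate(yahoo_urls)}
    google_set = set(google_urls)
    # (url, rank at Google, rank at Yahoo!) for every shared URL: the connecting lines
    shared = [(url, i + 1, yahoo_rank[url])
              for i, url in enumerate(google_urls) if url in yahoo_rank]
    # URLs only one engine returned: the white dots
    only_google = [u for u in google_urls if u not in yahoo_rank]
    only_yahoo = [u for u in yahoo_urls if u not in google_set]
    return shared, only_google, only_yahoo


# Made-up sample data, not real search results
google = ["a.example", "b.example", "c.example", "d.example"]
yahoo = ["a.example", "c.example", "e.example", "b.example"]
shared, only_google, only_yahoo = compare_rankings(google, yahoo)
print(shared)
print(only_google, only_yahoo)
```

Merging `shared`, `only_google`, and `only_yahoo` gives the larger combined result set that neither engine offers on its own.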

If you already do serious research with search engines, you're very aware that having several search tools at your disposal is better than relying on one. And with the methods mentioned in this blog, you can compare and contrast the tools, giving you more results to choose from.

Your Own Google Search Form

Build your own personal, task-specific Google search form.

If you want to do a simple search with Google, you need only the standard Simple Search form (the Google home page). But if you want to craft specific Google searches to use on a regular basis or provide for others, you can simply put together your own personalized search form.

Start with a garden-variety Google search form; something like this will do nicely:

This is a very simple search form. It takes your query and sends it directly to Google, adding nothing to it. But you can embed some variables to alter your search as needed. You can do this in two ways: via hidden variables or by adding more input to your form.
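
As a rough sketch of what such a form looks like, the following Python snippet prints a minimal one. The field sizes, the button label, and the helper name `search_form` are assumptions for illustration; the essential parts are the GET method, the Google search address, and a text field named `q` that carries the query.

```python
from html import escape


def search_form(action="http://www.google.com/search", hidden=None):
    """Render a minimal GET search form; `hidden` maps field names to fixed values."""
    lines = [f'<form method="get" action="{escape(action)}">']
    for name, value in (hidden or {}).items():
        # Hidden fields are sent along with the query but never shown to the user
        lines.append(f'  <input type="hidden" name="{escape(name)}" value="{escape(value)}">')
    lines.append('  <input type="text" name="q" size="31" maxlength="255">')
    lines.append('  <input type="submit" value="Search Google">')
    lines.append("</form>")
    return "\n".join(lines)


print(search_form())
```

Passing a dictionary as `hidden` adds fixed fields to the same form, which is the first of the two approaches described above.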

Hidden Variables

As long as you know how to identify a search option in Google, you can add it to your search form via a hidden variable. The fact that it's hidden just means that form users can't alter it. They can't even see it unless they look at the source code. Let's look at a few examples.

File Type

As the name suggests, File Type specifies that your results are filtered by a particular file type (e.g., Word .doc, Adobe .pdf, PowerPoint .ppt, plain text .txt). Add a PowerPoint file type filter, for example, to your search form, like so:

Site Search

Narrows your search to specific sites. While a suffix such as .com will work just fine, something more fine-grained such as the example.com domain is probably better suited:

URL Component

Specifies a particular path component to look for in URLs. This can include a domain name but doesn't have to. The following tries to tease out documentation in your result set:

Date Range

Narrows your search to pages indexed within the stated number of months. Acceptable values are between 1 and 12. Restricting your results to items indexed only within the last seven months is just a matter of adding:

Number of Results

Indicates the number of results you'd like to appear on each page, specified as a value of num between 1 and 100; the following asks for 50 per page:

What would you use this for? If you regularly look for an easy way to create a search engine that finds certain file types in a certain place, this works really well.
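
The five options above can be sketched as hidden fields like this. The field names (`as_filetype`, `as_sitesearch`, `hq`, `as_qdr`, `num`) are assumptions: they are parameter names Google's search URLs have used historically, they are not given in the text above, and this sketch does not check whether Google still honors each one. The query "sales forecast" is likewise just a placeholder.

```python
from html import escape
from urllib.parse import urlencode

# Assumed field names (historical Google URL parameters, not taken from the text)
HIDDEN = {
    "as_filetype": "ppt",            # File Type: PowerPoint files only
    "as_sitesearch": "example.com",  # Site Search: restrict to one domain
    "hq": "inurl:docs",              # URL Component: extra terms added to the query
    "as_qdr": "m7",                  # Date Range: indexed within the last seven months
    "num": "50",                     # Number of Results: 50 per page
}


def hidden_inputs(fields):
    """Render one hidden <input> per field, ready to paste inside a search form."""
    return "\n".join(
        f'<input type="hidden" name="{escape(k)}" value="{escape(v)}">'
        for k, v in fields.items()
    )


def submitted_url(query, fields):
    """Return the URL the browser would request when the form is submitted."""
    return "http://www.google.com/search?" + urlencode({"q": query, **fields})


print(hidden_inputs(HIDDEN))
print(submitted_url("sales forecast", HIDDEN))
```

The printed URL shows that a hidden field is nothing more than an extra name and value pair appended to the query.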

Creating Your Own Google Form

Some variables work well hidden; however, for other options, you can give your form users visible options to provide more flexibility.

Let's go back to the previous example. You want to let your users search for PDF files, but you also want them to be able to search for Excel and Microsoft Word files. In addition, you want them to be able to search not only oreilly.com, but also the State of California or the Library of Congress web sites. Obviously, there are various ways to design this form; this example uses a couple of simple pull-down menus.
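
One way such a form could be generated is sketched below. The field names `as_filetype` and `as_sitesearch`, the domains `ca.gov` and `loc.gov`, and the helper name `select` are assumptions made for illustration; adjust them to whatever your form actually needs.

```python
from html import escape


def select(name, options):
    """Render a pull-down menu; `options` maps submitted values to visible labels."""
    rows = [f'<select name="{escape(name)}">']
    rows += [f'  <option value="{escape(value)}">{escape(label)}</option>'
             for value, label in options.items()]
    rows.append("</select>")
    return "\n".join(rows)


# Assumed values: file extensions and site domains chosen for this example
file_types = {"pdf": "PDF", "xls": "Excel", "doc": "Microsoft Word"}
sites = {
    "oreilly.com": "O'Reilly",
    "ca.gov": "State of California",
    "loc.gov": "Library of Congress",
}

form = "\n".join([
    '<form method="get" action="http://www.google.com/search">',
    '<input type="text" name="q">',
    select("as_filetype", file_types),
    select("as_sitesearch", sites),
    '<input type="submit" value="Search">',
    "</form>",
])
print(form)
```

Because each menu submits under the same name a hidden field would have used, the user's choice simply replaces the fixed value.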

FaganFinder (http://www.faganfinder.com/engines/google.shtml) is a wonderful example of a thoroughly customized form.

If you find yourself running fairly complex queries on a regular basis, you can speed things up by setting a few options in a custom form. And chances are good that if you find the convenience of a custom form helpful, others will too. So, making your custom form available on your web site is a good way to let others share in your productivity.

Cover Your Bases

Try all possible combinations of your search keywords at once, and find related keywords with Google Sets.

Imagine you have a set of query words but are not sure that they're the right set; you certainly don't want to miss any results by picking the wrong combination of keywords, including or excluding the wrong word. But the thought of typing a dozen-plus permutations of keywords has your carpal tunnel flaring up in horror. With some existing tools, you can fine-tune your Google queries by playing with word sets, leading you down paths you might not have discovered.

Search Grid (http://blog.outer-court.com/search-grid), by German programmer Philipp Lenssen, lets you explore a wide range of Google search results by automatically searching for multiple combinations of keywords you specify. This gives you a quick overview of paths you can follow for a given set of keywords. You might, for example, put catsup, mustard, and pickles on the x-axis and relish, onions, and tomatoes on the y-axis.

Note that you get nothing but the first result; this is not the tool to use if you want an in-depth search of each query. Instead, it's meant to give you a bird's-eye view of how the different combinations of search words impact the query.
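
The grid itself is easy to picture in code. The sketch below only builds the query for each cell; it assumes each cell's query is simply the x-axis word followed by the y-axis word, which may not be exactly how Search Grid composes them, and `grid_queries` is a made-up name.

```python
from itertools import product


def grid_queries(x_terms, y_terms):
    """Map every (x, y) cell of the grid to the two-word query for that cell."""
    return {(x, y): f"{x} {y}" for x, y in product(x_terms, y_terms)}


cells = grid_queries(["catsup", "mustard", "pickles"],
                     ["relish", "onions", "tomatoes"])
for (x, y), query in cells.items():
    print(f"{x:8} x {y:9} -> {query}")
```

Three words on each axis give nine cells, so nine searches stand behind the single page of results.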

There's also a version of Search Grid that's been integrated into a web tool called FindForward (http://www.findforward.com/?t=grid), which gives you screenshots of some Google search results. FindForward requires less typing: enter two to five words for which you want to check possible permutations. You get a large grid of search results, with screenshots available for some of the pages.

Note that this grid searches each of your keywords individually (one square for mustard, one for pickles, one for relish) and searches every possible combination of two words (pickles relish, pickles mustard, mustard relish, etc.), but it doesn't search for three- and four-word permutations. In other words, this tool doesn't find every last possible permutation of your search. Again, it's an overview that gives you an idea of how different word combinations can affect your search, and it is not meant to be exhaustive.
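
The set of searches described here, every single word plus every two-word combination, can be enumerated with a short sketch. It treats pairs as unordered (so "pickles relish" and "relish pickles" count once), which is an assumption about how the tool counts them; the function name is hypothetical.

```python
from itertools import combinations


def single_and_pair_queries(words):
    """List each word on its own, followed by every unordered pair of words."""
    singles = list(words)
    pairs = [" ".join(pair) for pair in combinations(words, 2)]
    return singles + pairs


queries = single_and_pair_queries(["mustard", "pickles", "relish"])
print(queries)
```

With five words, the maximum the form accepts, the same function yields 5 single-word and 10 two-word queries.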

But why limit yourself to keyword sets that you can dream up? Google has its own tool in development to expand your keyword vocabulary based on a small set of words. Google Sets (http://labs.google.com/sets) allows you to enter several keywords and have Google predict similar keywords in a large or small set. For example, plug catsup, mustard, and pickles into the form and click Large Set. You should see a list of 25 or more words that run the condiment gamut from Lettuce to Black Olive.

Find Directories of Information

Use Google to find directories, link lists, and other collections of information.

Sometimes you're more interested in large information collections than scouring for specific bits and bobs. You could always take a stroll through the Google Directory (http://directory.google.com) to see what's available, but sometimes a topic-specific directory is what you need.

Using Google, there are a couple of different ways to find directories, link lists, and other information collections from across the Web. The first uses Google's full-word wildcards and the intitle: syntax. The second is a judicious use of particular keywords.

Title Tags and Wildcards

Pick something you'd like to find collections of information about. We'll use "trees" as our example. The first thing we look for is any page with the words "directory" and "trees" in its title. In fact, we build in a little buffering for words that might appear between the two using a couple of full-word wildcards (* characters). The resultant query looks something like this:

intitle:"directory * * trees"

This query finds directories of evergreen trees, directories of South African trees, and of course directories of simply trees.

What if you want to take things up a notch, taxonomically speaking, and find directories of botanical information? Use a combination of intitle: and keywords, like so:

botany intitle:"directory of"

This returns almost 10,000 results. Changing the tenor of the information might be a matter of restricting results to those coming from academic institutions. Appending an edu site specification brings you to:

botany intitle:"directory of" site:edu

This gets you around 150 results, a mixture of resource directories and, unsurprisingly, directories of university professors.

Mixing these syntaxes works rather well when searching for something that might also be an offline print resource. For example:

cars intitle:"encyclopedia of"

This query pulls in results from Amazon.com and other sites that sell car encyclopedias. Filter out some of the more obvious book finds by tweaking the query slightly:

cars intitle:"encyclopedia of" -site:amazon.com -inurl:book -inurl:products

The query specifies that search results should not come from Amazon.com and should not have the word "products" or "book" in the URL, which eliminates a fair number of online stores. For some interesting finds, play with this query by changing the word "cars" to whatever you like.
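
If you build queries like these often, a small helper can assemble them for you. The sketch below is one possible shape for it; the function name `build_query` and its parameters are made up for this example, and it only composes the query string and a search URL without contacting Google.

```python
from urllib.parse import quote_plus


def build_query(keywords, intitle=None, site=None, exclude_sites=(), exclude_inurl=()):
    """Combine plain keywords with intitle:, site:, -site:, and -inurl: operators."""
    parts = [keywords]
    if intitle:
        parts.append(f'intitle:"{intitle}"')  # phrase that must appear in the title
    if site:
        parts.append(f"site:{site}")          # restrict to a domain or suffix
    parts += [f"-site:{s}" for s in exclude_sites]
    parts += [f"-inurl:{word}" for word in exclude_inurl]
    return " ".join(parts)


query = build_query("cars", intitle="encyclopedia of",
                    exclude_sites=["amazon.com"],
                    exclude_inurl=["book", "products"])
print(query)
print("http://www.google.com/search?q=" + quote_plus(query))
```

Swapping "cars" for another keyword, or passing `site="edu"`, reproduces the other variations shown in this section.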

If mixing syntaxes doesn't find the resources you want, there are some clever keyword combinations that might just do the trick.

Finding Searchable Subject Indexes with Google

There are a few major searchable subject indexes and myriad minor ones that deal with a particular topic or idea. You can find the smaller subject indexes by customizing a few generic searches. "what's new" "what's cool" directory, while gleaning a few false results, is a great way to find searchable subject indexes.

directory "gossamer threads" new is an interesting one. Gossamer Threads is the creator of a popular link directory program. This is a good way to find searchable subject indexes without too many false hits.

directory "what's new" categories cool doesn't work particularly well, because the word "directory" is not a very reliable search term, but you will pull in some things with this query that you might otherwise have missed.

Let's put a few of these into practice:

"what's new" "what's cool" directory phylum
"what's new" "what's cool" directory carburetor
"what's new" "what's cool" directory "investigative journalism"
"what's new" directory categories gardening
directory "gossamer threads" new sailboats
directory "what's new" categories cool "basset hounds"

The real trick is to use a more general word, but make it unique enough that it applies mostly to your topic and not to many other topics.

Take acupuncture, for instance. Start narrowing it down by topic. What kind of acupuncture? For people or animals? If for people, what kinds of conditions are being treated? If for animals, what kinds of animals? Maybe you should search for "cat acupuncture", or maybe you should search for acupuncture arthritis. If this first round doesn't narrow the search results enough, keep going. Are you looking for education or treatment? You can skew results one way or the other using the site: syntax. So maybe you want "cat acupuncture" site:com or arthritis acupuncture site:edu. By taking just a few steps to narrow things down, you can get a reasonable number of search results focused around your topic.
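
The subject-index queries in this section all follow a handful of patterns, so they can be generated for any topic. The following is a minimal sketch: the `TEMPLATES` list copies the patterns used above, while the function names and the optional `site` argument (which applies the same site: narrowing just described) are assumptions for illustration.

```python
# Query patterns taken from the examples in this section
TEMPLATES = [
    '"what\'s new" "what\'s cool" directory {topic}',
    '"what\'s new" directory categories {topic}',
    'directory "gossamer threads" new {topic}',
    'directory "what\'s new" categories cool {topic}',
]


def quote_phrase(topic):
    """Wrap multi-word topics in quotes so they are searched as a phrase."""
    return f'"{topic}"' if " " in topic else topic


def index_queries(topic, site=None):
    """Fill every template with the topic, optionally narrowing with site:."""
    suffix = f" site:{site}" if site else ""
    return [t.format(topic=quote_phrase(topic)) + suffix for t in TEMPLATES]


for q in index_queries("basset hounds"):
    print(q)
for q in index_queries("gardening", site="edu"):
    print(q)
```

Running each topic through all four patterns is a quick way to see which phrasing pulls in the fewest false hits.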