Friday, October 06, 2006

Get Inside the PageRank Algorithm

Delve into the inner workings of the Google PageRank algorithm and learn how it affects results.

PageRank, the algorithm used by the Google search engine, was originally formulated by Sergey Brin and Larry Page in their paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine" (http://www-db.stanford.edu/~backrub/google.html).

PageRank is based on the premise, prevalent in the world of academia, that the importance of a research paper can be judged by the number of citations it receives from other research papers. Brin and Page simply transferred this premise to its web equivalent: the importance of a web page can be judged by the number of hyperlinks that point to it from other web pages.

What's the Algorithm?

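It might look daunting to nonmathematicians, but the PageRank algorithm is in fact elegantly simple and is calculated as follows:

PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

where T1 ... Tn are the pages that link to page A, and: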

  • PR(A) is the PageRank of a page A.

  • PR(T1) is the PageRank of a page T1.

  • C(T1) is the number of outgoing links from the page T1.

  • d is a damping factor in the range 0 < d < 1, usually set to 0.85.

The PageRank of a web page is therefore based on the sum of the PageRanks of all the pages that link to it (its incoming links), each divided by the number of links on that page (its outgoing links) and then weighted by the damping factor d.
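As a rough illustration, the formula fits in a few lines of Python. This sketch assumes the PageRanks of the linking pages are already known (in practice they are not, which is why the calculation is repeated until the values settle). The function name pagerank_of and the example values are made up for illustration.

```python
def pagerank_of(inbound, d=0.85):
    """PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    `inbound` holds one (PR(Ti), C(Ti)) pair for each page Ti that links to A.
    """
    return (1 - d) + d * sum(pr / c for pr, c in inbound)

# Hypothetical example: T1 has PageRank 1.0 and 2 outgoing links,
# T2 has PageRank 0.5 and a single outgoing link (to A).
print(pagerank_of([(1.0, 2), (0.5, 1)]))
```

A page with no incoming links at all still ends up with 1 - d, which is 0.15 at the usual damping factor.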

What Does It Mean?

From a search engine marketer's point of view, this means there are two ways in which PageRank can affect the position of your page on Google:

The number of incoming links

Obviously, the more of these, the better. But there is another thing the algorithm tells you: no incoming link can have a negative effect on the PageRank of the page it points to. At worst, it can have no effect at all.

The number of outgoing links on the page that points to your page

The fewer of these, the better. This is interesting: given two pages of equal PageRank that link to you, one with 5 outgoing links and the other with 10, you receive twice the increase in PageRank from the page with only 5 outgoing links.
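The 5-versus-10 comparison can be checked directly from the d * PR(T)/C(T) term of the formula. In this sketch both linking pages are given a hypothetical PageRank of 4.0; the helper name contribution is also illustrative.

```python
D = 0.85  # damping factor quoted in Brin and Page's paper

def contribution(pr_of_linking_page, outgoing_links, d=D):
    """PageRank one linking page passes to each page it links to: d * PR(T) / C(T)."""
    return d * pr_of_linking_page / outgoing_links

# Two hypothetical linking pages, both with PageRank 4.0.
from_five_links = contribution(4.0, 5)
from_ten_links = contribution(4.0, 10)
print(from_five_links, from_ten_links)
print(from_five_links / from_ten_links)  # ratio of the two contributions
```

Whatever PageRank the two linking pages share, the ratio stays at 2, because only C(T) differs between them.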

At this point, take a step back and ask yourself just how important PageRank is to the position of your page in the Google Search results.

Note that the PageRank algorithm has nothing whatsoever to do with relevance to the search terms queried. It is simply a single (admittedly important) part of the entire Google relevance-ranking algorithm.

Perhaps a good way to look at PageRank is as a multiplying factor applied to the Google Search results after all other computations have been completed. The Google algorithm calculates the relevance of pages in its index to the search terms, and then multiplies this relevance by the PageRank to produce a final list. The higher your PageRank, therefore, the higher up the result list you will be. However, there are still many other factors related to the positioning of words on the page that must be considered.

What's the Use of the PageRank Calculator?

If no incoming link has a negative effect, surely you should just get as many as possible, regardless of the number of outgoing links on the linking page?

Well, not entirely. The PageRank algorithm is cleverly balanced. Just like the conservation of energy in every physical reaction, PageRank is also conserved with every calculation. For instance, if a page with a starting PageRank of 4 has two outgoing links on it, you know that the amount of PageRank it passes is divided equally between each of its outgoing links. In this case, 4 / 2 = 2 units of PageRank are passed on to each of 2 separate pages, and 2 + 2 = 4, so the total PageRank is preserved!
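The arithmetic of that example takes only a couple of lines to check. Like the example, this sketch ignores the damping factor; the function name and page names are hypothetical.

```python
def split_pagerank(pr, targets):
    """Divide a page's PageRank equally among its outgoing links (no damping)."""
    share = pr / len(targets)
    return {target: share for target in targets}

passed = split_pagerank(4, ["Page B", "Page C"])
print(passed)                # {'Page B': 2.0, 'Page C': 2.0}
print(sum(passed.values()))  # 4.0
```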

On a much larger scale, suppose Google's index contains a billion pages, each with a PageRank of 1: the total PageRank across all pages is then equal to a billion. Moreover, each time you recalculate PageRank, no matter what changes in PageRank occur between individual pages, the total PageRank across all one billion pages still adds up to a billion.
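A billion pages is a lot to simulate, but the same effect shows up on a hypothetical four-page web. This sketch applies the formula to every page at once, starting each page at 1, and assumes every page has at least one outgoing link; the function name iterate and the page names are illustrative.

```python
def iterate(pr, links, d=0.85):
    """One pass of PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over every page.

    `links` maps each page to the pages it links to; every page is assumed
    to have at least one outgoing link.
    """
    new = {page: 1 - d for page in pr}
    for page, targets in links.items():
        share = d * pr[page] / len(targets)
        for target in targets:
            new[target] += share
    return new

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pr = {page: 1.0 for page in links}
for step in range(5):
    pr = iterate(pr, links)
    print(step, round(sum(pr.values()), 10))  # the total stays at 4.0 on every pass
```

The total is preserved because each pass hands out exactly d times the old total and adds 1 - d per page: with N pages starting at 1, that is N(1 - d) + dN = N. A page with no outgoing links would break this, since its share would go nowhere.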

This means that although you may not be able to change the total PageRank across all pages, by strategically linking pages within your site, you can affect the distribution of PageRank between pages. For instance, you may want most of your visitors to enter the site through your home page. You would therefore want your home page to have a higher PageRank relative to other pages within the site.

Also recall that all the PageRank of a page is passed on and is divided equally between each outgoing link on a page. You would therefore want to keep as much combined PageRank as possible within your own site without passing it to external sites and losing its benefit. This means you would want any page with lots of external links (i.e., links to other people's web sites) to have a lower PageRank relative to other pages within the site to minimize the amount of PageRank that is leaked to external sites.

Also, bear in mind the earlier statement that PageRank is simply a multiplying factor applied once Google's other calculations regarding relevance have already been done. You would therefore want your more keyword-rich pages to also have a higher relative PageRank.
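To see how linking structure shifts PageRank towards the home page, this sketch compares two hypothetical ways of linking the same four pages: a hierarchy where every page links back to the home page, and a fully linked site where every page links to every other. The helper names iterate and converge, the page names, and the choice of 200 iterations are all assumptions made for illustration.

```python
def iterate(pr, links, d=0.85):
    """One pass of PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over all pages."""
    new = {page: 1 - d for page in links}
    for page, targets in links.items():
        share = d * pr[page] / len(targets)  # assumes every page has an outgoing link
        for target in targets:
            new[target] += share
    return new

def converge(links, iterations=200, d=0.85):
    """Start every page at 1 and repeat the formula until the values settle."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        pr = iterate(pr, links, d)
    return pr

# Hypothetical site: the home page links to every other page,
# and each other page links only back to the home page.
hierarchical = {
    "Home": ["About", "Products", "Contact"],
    "About": ["Home"],
    "Products": ["Home"],
    "Contact": ["Home"],
}
# The same four pages, but every page links to every other page.
fully_linked = {p: [q for q in hierarchical if q != p] for p in hierarchical}

results = {"hierarchical": converge(hierarchical), "fully linked": converge(fully_linked)}
for name, pr in results.items():
    print(name, {page: round(value, 3) for page, value in pr.items()})
```

Both sites still total 4, but in the hierarchy the home page ends up with roughly 1.92 while each of the others drops to roughly 0.69; in the fully linked site every page stays at 1.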

Also, assuming that every new page in Google's index begins its life with a PageRank of 1, there is a way to increase the combined PageRank of pages within your site: increase the number of pages! A site with 10 pages starts life with a combined PageRank of 10, which is then redistributed through its hyperlinks. A site with 12 pages therefore starts with a combined PageRank of 12. You can thus improve the PageRank of your site as a whole by creating new content (i.e., more pages), and then controlling the distribution of that combined PageRank through strategic interlinking between the pages.
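The effect of adding pages can be sketched with the same starting assumption of a PageRank of 1 per page. The hypothetical hub_site helper builds a site where the home page links to every other page and each of those links back home; the 10- and 12-page sizes come from the example above.

```python
def converge(links, d=0.85, iterations=200):
    """Repeat PR(A) = (1 - d) + d * sum(PR(T) / C(T)) from a start of 1 per page."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new = {page: 1 - d for page in links}
        for page, targets in links.items():
            share = d * pr[page] / len(targets)
            for target in targets:
                new[target] += share
        pr = new
    return pr

def hub_site(n_pages):
    """Hypothetical site: home links to every other page, each of which links back home."""
    others = [f"page{i}" for i in range(1, n_pages)]
    links = {"home": others}
    links.update({page: ["home"] for page in others})
    return links

totals, home = {}, {}
for n in (10, 12):
    pr = converge(hub_site(n))
    totals[n] = sum(pr.values())
    home[n] = pr["home"]
    print(n, "pages: total", round(totals[n], 6), "home page", round(home[n], 3))
```

The combined total tracks the page count, and because the two extra pages both link back home, the home page's share rises as well (from about 4.68 to about 5.59 in this model).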

And this is the purpose of the PageRank Calculator: to create a model of the site on a small scale, including the links between pages, and see what effect the model has on the distribution of PageRank.

How Does the PageRank Calculator Work?

To get a better idea of the realities of PageRank, visit the PageRank Calculator (http://www.markhorrell.com/seo/pagerank.asp).

It's simple, really. Start by typing in the number of interlinking pages you want to analyze and hit Submit. I have confined this number to just 20 pages to conserve server resources. Even so, this should give a reasonable indication of how strategic linking can affect the PageRank distribution.

Next, for ease of reference once the calculation has been performed, provide a label for each page (e.g., Home Page, Links Page, Contact Us Page, etc.), and again hit Submit.

Finally, use the list boxes to select which pages each page links to. You can use Ctrl and Shift to highlight multiple selections.
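For readers who want to experiment offline, here is a rough sketch of what such a calculator might do with the labels and links entered in these steps. It is not the code behind the PageRank Calculator; the function name pagerank_calculator, its parameters, the fixed 200 iterations, and the handling of pages with no outgoing links (they simply pass nothing on) are all assumptions.

```python
def pagerank_calculator(labels, links, initial=None, d=0.85, iterations=200):
    """Hypothetical sketch of a small PageRank calculator.

    labels:  page names, e.g. ["Home Page", "Links Page", "Contact Us Page"]
    links:   maps a label to the labels it links to
    initial: optional starting PageRank per label (every page defaults to 1)
    """
    if len(labels) > 20:
        raise ValueError("this sketch mirrors the calculator's 20-page limit")
    pr = {label: 1.0 for label in labels}
    if initial:
        pr.update(initial)
    for _ in range(iterations):
        new = {label: 1 - d for label in labels}
        for page in labels:
            targets = links.get(page, [])
            if not targets:
                continue  # assumption: a page with no outgoing links passes nothing on
            share = d * pr[page] / len(targets)
            for target in targets:
                new[target] += share
        pr = new
    return pr

labels = ["Home Page", "Links Page", "Contact Us Page"]
links = {
    "Home Page": ["Links Page", "Contact Us Page"],
    "Links Page": ["Home Page"],
    "Contact Us Page": ["Home Page"],
}
for label, value in pagerank_calculator(labels, links).items():
    print(f"{label}: {value:.3f}")
```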

You can also use this screen to change the initial PageRanks of each page. For instance, if one of your pages is supposed to represent Yahoo!, you may want to raise its initial PageRank to, say, 3. However, in actuality, the initial PageRank is irrelevant to its final computed value. In other words, even if one page were to start with a PageRank of 100, after many iterations of the equation, the final computed PageRank would converge to the same value as if it had started with a PageRank of only 1!
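This claim is easy to test on a small model. The sketch below uses a hypothetical three-page site in which one page stands in for Yahoo!, runs the formula once from a start of 1 for every page and once with the Yahoo! page starting at 100, and compares the results. The function names and the 200-iteration count are illustrative choices.

```python
def iterate(pr, links, d=0.85):
    """One pass of PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over every page."""
    new = {page: 1 - d for page in links}
    for page, targets in links.items():
        share = d * pr[page] / len(targets)  # every page here has an outgoing link
        for target in targets:
            new[target] += share
    return new

def run(initial, links, iterations=200):
    pr = dict(initial)
    for _ in range(iterations):
        pr = iterate(pr, links)
    return pr

# Hypothetical three-page model in which one page stands in for Yahoo!.
links = {"Yahoo!": ["Home"], "Home": ["Yahoo!", "About"], "About": ["Home"]}

from_one = run({page: 1.0 for page in links}, links)
from_hundred = run({"Yahoo!": 100.0, "Home": 1.0, "About": 1.0}, links)

for page in links:
    print(page, round(from_one[page], 6), round(from_hundred[page], 6))
```

The two runs agree because each pass shrinks the gap between them by a factor of d, so any difference in starting values fades away as the iterations pile up.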

You can play around with the damping factor d, which defaults to 0.85, as this is the value quoted in Brin and Page's research paper.
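To get a feel for what the damping factor does, this sketch reruns a hypothetical three-page site (a home page linking to a links page and a contact page, both of which link back home) at a few values of d. The function name converge and the 500-iteration count are assumptions; more iterations are used because convergence slows as d approaches 1.

```python
def converge(links, d, iterations=500):
    """Start every page at 1 and repeat PR(A) = (1 - d) + d * sum(PR(T) / C(T))."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new = {page: 1 - d for page in links}
        for page, targets in links.items():
            share = d * pr[page] / len(targets)
            for target in targets:
                new[target] += share
        pr = new
    return pr

links = {
    "Home Page": ["Links Page", "Contact Us Page"],
    "Links Page": ["Home Page"],
    "Contact Us Page": ["Home Page"],
}

results = {d: converge(links, d) for d in (0.5, 0.85, 0.95)}
for d, pr in results.items():
    print(d, {page: round(value, 3) for page, value in pr.items()})
```

For this particular structure the home page's value works out to (1 + 2d)/(1 + d): it approaches 1 as d approaches 0 and 1.5 as d approaches 1, so a lower damping factor spreads PageRank more evenly across the pages.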
