Friday, October 06, 2006

Scrape Google AdWords

Scrape the AdWords from a saved Google results page into a form suitable for importing into a spreadsheet or database.

Google's AdWords, the text ads that appear to the right of the regular search results, are delivered on a cost-per-click basis, and purchasers of the AdWords are allowed to set a ceiling on the amount of money they spend on their ad. This means that, even if you run a search for the same query word multiple times, you won't necessarily get the same set of ads each time.

If you're considering using Google AdWords to run ads, you might want to gather up and save the ads that are running for the query words that interest you. Google AdWords is not included in the functionality provided by the Google API, so you have to do a little scraping to get at that data.

The Code

Save this code to a text file named adwords.pl:

#!/usr/bin/perl
# usage: perl adwords.pl results.html

use strict;
use HTML::TokeParser;

die "I need at least one file: $!\n" unless @ARGV;

my @Ads;
for my $file (@ARGV) {
    # Skip if the file doesn't exist; you could add more file
    # testing here. Errors go to STDERR so they won't pollute
    # our CSV file.
    unless (-e $file) {
        warn "What??: $file -- $!\n-- skipping --\n";
        next;
    }

    # Now parse the file.
    my $p = HTML::TokeParser->new($file);
    while (my $token = $p->get_token) {
        # Only opening <a> tags whose id is "aw" plus a digit
        # (these mark the AdWords links on the results page).
        next unless $token->[0] eq 'S'
            and $token->[1] eq 'a'
            and $token->[2]{id} =~ /^aw\d$/;
        my $link = $token->[2]{href};
        my $ad;
        if ($link =~ /pagead/) {
            my ($url) = $link =~ /adurl=([^&]+)/;
            $ad->{href} = $url;
        } elsif ($link =~ m{^/url\?}) {
            my ($url) = $link =~ /&q=([^&]+)/;
            # Undo the percent-escapes Google applies to the URL.
            $url =~ s/%3F/?/;
            $url =~ s/%3D/=/g;
            $url =~ s/%25/%/g;
            $ad->{href} = $url;
        }
        $ad->{adwords} = $p->get_trimmed_text('/a');
        $ad->{desc}    = $p->get_trimmed_text('/font');
        # The displayed URL is the last whitespace-free run
        # in the ad's description text.
        ($ad->{url}) = $ad->{desc} =~ /(\S+)$/;
        push(@Ads, $ad);
    }
}

print quoted( qw( AdWords HREF Description URL ) );
for my $ad (@Ads) {
    print quoted( @$ad{qw( adwords href desc url )} );
}

# Naive CSV quoting; embedded double quotes aren't escaped.
sub quoted {
    return join( ",", map { "\"$_\"" } @_ ) . "\n";
}
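The script decodes only three percent-escapes (%3F, %3D, and %25) by hand, so any other escaped character in the destination URL slips through. A general-purpose URL decoder handles them all. Here's a minimal sketch in Python (rather than the hack's Perl); the redirect-parameter names adurl and q are taken from the script above:

```python
from urllib.parse import parse_qs, urlparse

def extract_ad_url(href):
    """Pull the advertiser's destination URL out of a Google
    redirect link; parse_qs decodes every percent-escape."""
    query = parse_qs(urlparse(href).query)
    for key in ("adurl", "q"):  # redirect links carry the URL in one of these
        if key in query:
            return query[key][0]
    return href  # not a redirect; already a plain URL

print(extract_ad_url("/url?q=http%3A%2F%2Fexample.com%2Fbuy%3Fid%3D7"))
# prints http://example.com/buy?id=7
```

Because parse_qs performs the decoding itself, there's no need to enumerate individual escapes the way the Perl script does.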

Running the Code

Call this script on the command line, providing the name of the saved Google results page and a file in which to put the CSV results:

% perl adwords.pl input.html > output.csv


input.html is the name of the Google results page that you've saved. output.csv is the name of the comma-delimited file to which you want to save your results. You can also provide multiple input files on the command line if you'd like:

% perl adwords.pl input.html input2.html > output.csv

The results appear in a comma-delimited format, as in these results for the query new car.

The code returns the AdWords headline, the link URL, the description in the ad, and the URL on the ad (this is the URL that appears in the ad text, while the HREF is what the URL links to). With the file in hand, you can open output.csv in Excel and see which companies are using which headlines and descriptions. Scraping AdWords is a quick way to get a feel for how others are using the service.

Generate Google AdWords

You've written the copy and you've planned the budget. Now, what keywords are you going to use for your ad?

You've read about it and you've thought about it and you're ready to buy one of Google's AdWords. You've even got your copy together, and you feel pretty confident about it. You have only one problem now: figuring out your keywords (the search words that will trigger your AdWord to appear).

You're probably buying into the AdWords program on a budget, and you definitely want to make every penny count. Choosing the right keywords means your ad will have a higher click-through rate. Thankfully, the Google AdWords program allows you to do a lot of tweaking, so if your first choices don't work, experiment, test, and test some more!

Choosing AdWords

So where do you get the search keywords for your ad? There are four places that might help you find them:


Logfiles

Examine your site's logfiles. How are people finding your site now? What words are they using? What search engines are they using? Are the words they're using too general to be used for AdWords? If you look at your logfiles, you can get an idea of how people who are interested in your content are finding your site. (If they aren't interested in your content, why would they visit?)

In fact, if you use Google Analytics (https://www.google.com/analytics/) to measure your site traffic, you'll find a category called Keyword Considerations that automatically tabulates the keywords that people have used to find your site via Google and other search engines.


Examine your site

If you have an internal search engine, check its logs. What are people searching for once they get to your site? Are there any common misspellings you could use as an AdWord? Are there any common phrases you could use?


Brainstorm

What do people think of when they look at your site? What keywords do you want them to think of? Brainstorm about the product that's most closely associated with your site. What words come up?

Imagine someone goes to a store and asks about your products. How would they ask? What words would they use? Consider the different ways someone could look for or ask about your product or service, and if there's a set of words or a phrase that pops up over and over again.


Glossaries

If you've brainstormed until wax dribbles out your ears but are no closer to coming up with words relevant to your site or product, visit some online glossaries to jog your brain. The Glossarist (http://www.glossarist.com) links to hundreds of glossaries on hundreds of different subjects. Check and see if it has a glossary relevant to your product or service, and see if you can pull some words from there.


Keyword tools

Google's competitor Yahoo! has its own advertising network called Overture, and it offers a tool to test potential advertising keywords. Browse to the Keyword Select Tool (http://inventory.overture.com/d/searchinventory/suggestion/) and take a few of your ideas for a spin. You'll find Yahoo! search counts for your word, and related terms.

Get the Most Out of AdWords

Here's some guest commentary and advice by Andrew Goodman of Page Zero Media on how advertisers can better cope with the increasingly complex Google AdWords program.

AdWords (https://adwords.google.com) is just the sort of advertising program that you might expect to roll out of the big brains at Google. The designers of the advertising system have innovated thoroughly to provide precise targeting at low cost with less work; it really is a completely new way of looking at advertising.

The "less work" part is something I take seriously. Focus on structuring a campaign in a robust way so that you can understand and adjust it. Don't beat yourself up by trying to get everything 100 percent perfect. A "robust" campaign has a well-planned category structure, or "ontology" (basically, themed campaigns and ad groups). And a robust ongoing advertising project includes easy-to-follow metrics such as cost per order, and an easy interface for reading those results. Remember, you might wind up handing this work off to someone else. Don't make it convoluted.

In the early days (AdWords 1.0), the platform offered few features and charged a fixed rate (in cost-per-thousand impressions, or CPM, format) for ads that would show up near search results, triggered by a user query on a keyword or phrase you placed in your AdWords account.

In 2002, Google came out with version 2.0. The pricing was based on a cost-per-click auction that allowed advertisers to bid, but also incorporated click-through rates (CTR) into the ad rank formula. The idea was to push more relevant ads higher on the page.

In August 2005 (with experiments and refinements taking place prior to and following that), Google changed the system fairly significantly. Instead of CTR, the new ad rank formula became Max CPC (your bid on a keyword or a group of keywords) multiplied by Quality Score (QS). QS is multidimensional. Google states that CTR is predominant in QS.

But other factors, including landing page relevancy and ad copy relevancy, can play a significant role in where your ad now ranks. It's largely a black box, but if you've followed Google closely, you may be aware of the types of issues it focuses on. On the organic search side, certain kinds of deceptiveness and poor user experiences are judged as "evil," and you could wind up paying the penalty. It's safe to say that some of Google's main pet peeves, such as pop-ups, deceptive redirects, and pages with nothing but graphics, are included in the list of potential QS criteria.

If a keyword in your AdWords account has a low quality score, you might be asked to bid very high ($5 per click isn't uncommon) just to show up at all. Some advertisers are relatively unaffected by these changes. Others have been virtually wiped off the map.

Between this and the increasingly competitive environment, advertisers must become quite systematic with how they test and refine campaigns. Those with large accounts now demand more features to help them stay organized, to control ad delivery, and to report on their results. Irrespective of the fundamental "version" of AdWords that I'm calling 2.5, the past couple of years have seen Google add dozens of small features and refinements to the platform, sometimes seemingly on a weekly basis. Let me draw your attention to a few notable ones. You need to look elsewhere if you seek in-depth programming advice vis-à-vis the AdWords API and the like. To borrow a phrase from Jim Sterne, "I'm a marketing guy."

Dynamic Keyword Insertion: Beware

Most of the time, users click on ads with titles and text that closely match what they're looking for. Lazy advertisers who want to take advantage of the ability to match the user's query in their ad title or ad text can use dynamic keyword insertion. The format is {KeyWord:Alternate text} (yes, you need to include those squiggly brackets, called braces). So, if this is placed in your ad title and the user types Red Cactus, Red Cactus will be your ad title.

A typical use of this format is a large list of products that you don't want to create separate ads for; you want users to see something better than a generic title. Here's the problem with dynamic keyword insertion. Even factoring in the ad rank boost you get from the higher CTR that generally comes with matching the user's query, the return on investment of ads using such matching is often (even usually) lower than if you use a manual ad title that is slightly less Pavlovian in its appeal to searchers. You do pay a penalty in terms of ad rank when you write ad copy that filters out some prospects, but it's often worth it to pay that penalty.
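The substitution behavior described above can be sketched in a few lines of Python. The 25-character headline limit is an assumption (that was AdWords' headline limit in this era); the function name and title-casing behavior are illustrative, not Google's actual implementation:

```python
import re

HEADLINE_LIMIT = 25  # assumed AdWords headline limit of the era

def insert_keyword(template, user_query):
    """Replace a {KeyWord:Alternate text} placeholder with the
    user's query (title-cased), falling back to the alternate
    text when the query would exceed the headline limit."""
    def sub(match):
        alternate = match.group(1).strip()
        candidate = user_query.title()
        return candidate if len(candidate) <= HEADLINE_LIMIT else alternate
    return re.sub(r"\{KeyWord:([^}]*)\}", sub, template)

print(insert_keyword("{KeyWord:Cactus Plants}", "red cactus"))  # prints "Red Cactus"
```

Note the fallback: a query too long to fit shows the alternate text instead, which is why the placeholder includes one.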

Run Better Tests

Go into the "edit campaign settings" area of your account, for any given campaign. The default method of rotating ads is set to Google's advantage. Google will start crowding out the ad that gets fewer clicks (a lower CTR) and start showing your high-CTR ad more often. That puts more money in its pocket, but it doesn't allow you to run the test evenly to measure through to your revenue stream to be sure which ad is truly the "better performer."

To ensure even ad rotation, uncheck the "show better-performing ads more often" box. Incidentally, the even ad rotation continues to be an underused feature of AdWords. You can use it to test more than ads. If you have two or three alternate landing pages, you can send your AdWords traffic to them equally from a given group of keywords just by setting up more than one ad for an ad group, with identical ad copy, and changing only the destination URLs associated with the ad.

If you have tagged each ad URL with unique tracking code so your analytics package knows what's going on (or are using Google's Conversion Tracker), you should be able to compare the conversion rates on these landing pages without having to use any fancy content management techniques on your site. An example would be testing the home page against a tailored landing page, or a category page showing a selection of choices against a page describing a single product in depth. Don't guess, test.
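Tagging each ad's destination URL with a unique tracking code can be done mechanically. A minimal Python sketch follows; the adid parameter name is a made-up example, so substitute whatever your analytics package actually expects:

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def tag_url(url, ad_id):
    """Append a hypothetical 'adid' tracking parameter to a
    destination URL, preserving any existing query string."""
    parts = urlparse(url)
    query = parse_qsl(parts.query)
    query.append(("adid", ad_id))
    return urlunparse(parts._replace(query=urlencode(query)))

print(tag_url("http://example.com/landing", "ad-a"))
# prints http://example.com/landing?adid=ad-a
```

Round-tripping through parse_qsl/urlencode, rather than string concatenation, keeps the URL valid whether or not it already has a query string.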

Run Smarter Reports

You don't need to sign up for Google's Analytics service, or buy Webtrends Enterprise, Omniture SiteCatalyst, or other high-powered analytics services, to get useful tracking of your clicks right through to a sale. Google's entry-level Conversion Tracker gives you plenty of information if you set it up right. It's fairly similar to most analytics software in that you have to install the correct JavaScript code on your site and sometimes customize things to pass through revenue data, if desired.

But what if you want to determine the return on investment on ad copy you're testing, as opposed to which keywords or groups of keywords are performing better? Many users aren't aware of how powerful the advanced reporting is in AdWords. Go to the Reports tab and, once on the Create Report page, select Text Ad Report. You can then select a time frame and limit the report to the campaigns you're interested in.

Finally, you need to customize the stats shown in the report. Conversion-related data isn't included by default, so you need to bring up the whole range of available report elements (under Advanced Options: Columns, further down the Create Report page) and then click some additional boxes to add them.

Run the report and view it either online or in your preferred downloadable format. Quite simply, if you've made enough sales off the groups of keywords in question, you should see comparative data showing exact cost-per-conversion numbers for ads you've been running head to head (no fancy tracking codes required on your destination URLs). If you determine that one ad is significantly better than another in its ultimate revenue generation effect, you've made the most important discovery you can make about ad performance, and next, you'll want to delete the nonperforming ad and perhaps run new tests. You'll also want to attempt to learn lessons from the test, which is easier if you've been testing certain elements of your copy based on industry theories and your own hypothesis.

Sometimes, these tests are amazing. I ran a test on one product (a new ad competing with the old) for the month of April 2006. Both generated about 600 clicks. The old ad generated 11 unique new customers at a cost of $9.73 per customer. The new ad generated zero transactions, so we're still waiting on that cost-per-acquisition figure, as it is currently at "infinity"! Eleven to nothing, on 600 clicks! That's statistically significant, to say the least. Goodbye to the new ad.
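A quick way to see why that 11-to-0 split is significant: with roughly equal clicks on each ad, the null hypothesis says each of the 11 conversions was equally likely to come from either ad, so the chance that all 11 favor one prespecified ad is 0.5 to the 11th power. A sketch of this simple conditional sign test:

```python
# Conditional sign test: given 11 total conversions split between
# two ads with (roughly) equal click counts, the null hypothesis
# says each conversion lands on either ad with probability 1/2.
conversions_old, conversions_new = 11, 0
total = conversions_old + conversions_new

# One-sided: probability all 11 conversions favor the old ad.
p_one_sided = 0.5 ** total
# Two-sided: either ad could have taken all 11.
p_two_sided = 2 * p_one_sided

print(f"one-sided p = {p_one_sided:.5f}")   # about 0.00049
print(f"two-sided p = {p_two_sided:.5f}")   # about 0.00098
```

Either way the p-value is well under 0.001, which is why a 600-click-per-ad test with an 11-to-0 outcome is decisive.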

This helped us confirm that our call to action in the old ad, relating to free shipping and available inventory, was working well. Unfortunately, AdWords doesn't give me suggestions for how to explain to a client that I wasted his cash testing a "great new ad" that turned out to suck. Maybe we can chalk it up to "branding."

Control Your Ad Positions

You can now tell AdWordskeyword by keywordto show your ads only in certain positions in the advertising area of the search engine results page. This is available in Advanced Options under Edit Campaign Settings.

Let's say you want your ad to show up only if you can be no higher on the page than ad position 3 and no lower than 6. Enabling the ad position control provides a drop-down box in "keyword settings" for every keyword. The default is "any position," but if you say to show your ad no higher than third position, and no lower than sixth, Google simply doesn't show your ad if your ad rank forces you outside these positions. Be careful with this one.

Essentially, what it's telling the system is to keep your ads turned off unless your positioning criteria are met, so you could see sharply reduced ad impressions if you use the feature. This feature won't adjust your bid for you, so this is not "classic" bid-to-position functionality that is available with some third-party bid management tools. It looks like Google is testing the waters on this feature, including the revenue impact, in order to decide whether to implement a more robust set of bid management options.

For now, Google still has no bid-to-position feature and no day-parting feature. Advertisers seeking such exotica need to investigate third-party options or custom programming on their own through the AdWords API. Keep in mind, though, that the existing bid system is easy to use because it has an automatic bid discounter, and it subjects your ad delivery to whatever budgetary parameters you specify. Frequent (several times daily) bid changes are overrated, especially given that your ad rank is not solely dependent on your bid.

Don't Be a Slave to Automation

Google has been brilliant about automating editorial functions. If you're new to some of the legal and policy issues, the process can be daunting. One example is the automated ad copy checker that kicks in when you enter a new ad. Sometimes, the system identifies a misspelling when in fact you're using a niche term that's relevant to your region or industry.

Other times, a potential trademark violation comes up, such as using the word "enterprise" in your ad copy (which is the name of a rental car company). If you're not in the rental car business, it's highly unlikely that your use of a common word constitutes any kind of violation. You should politely appeal to Google in the box provided. Usually, your ad is approved within 48 hours.

Control Your Content Bidding

Some advanced advertisers might want to use custom programming to separate clicks from Google's "content partners" from clicks on the same keywords emanating from the search network. I prefer to use established analytics packages that do this. At the very least, you need to understand that most "content targeting" clicks are worth significantly less to the average advertiser than most clicks originating from a web search.

In Edit Campaign settings, make sure you enable "content bidding" if you have "content targeting" enabled. Then set different bids on content, ad group by ad group. If you're bidding around $.50 on a set of keywords, I'd recommend bidding $.15 to $.20 on content, or even less. The only way to know for sure is to use sophisticated analytics.

Control Your Geography

During campaign setup, you're offered "location targeting options." For existing campaigns, you can edit these settings. Specify countries and territories, regions and cities, or custom. Whatever you do, don't show your ad to the whole world unless that's your intention. Using custom geography, you can even target a radius of a few hundred miles, or a geographic "shape," if you're handy with the tools, which are now easier to use with a WYSIWYG-type interface on a map.

Boo-yah! This functionality works fairly well subject to the limitations of mapping users' IP addresses to exact locations. It's currently available in about 15 countries. Even national advertisers may find this highly useful, running campaigns in several dozen major metropolitan areas and adjusting their bids upwards in some of them to reflect the cities that are most responsive in terms of sales conversions.

A lot of these features can be a downright blast to use, especially for junkies. Don't forget your goals: acquiring new customers and communicating with them effectively and consistently. Ultimately, it's about customers, growth, and profit, whatever those concepts mean to you and your business. Many of the most advanced features have the effect, when boiled down, of causing you to buy too much media, or too little. Keep that in mind as you build and refine a campaign that runs consistently in an optimal range.

Clean Up for a Google Visit

Before you submit your site to Google, make sure you've cleaned it up to make the most of your indexing.

You clean up your house when you have important guests over, right? If you want visitors, Google's crawler is one of the most important guests your site will ever have. A high Google ranking can lead to incredible numbers of referrals, both from Google's main site and from sites with searches powered by Google.

To make the most of your listing, step back and look at your site. By making some adjustments, you can make your site both more Google-friendly and more visitor-friendly:


If you must use a splash page, have a text link from it.

If I had a dollar for every time I went to the front page of a site and saw no way to navigate besides a Flash movie, I'd be able to nap for a living. Google doesn't index Flash files, so unless you have some kind of text link on your splash page (a "Skip This Movie" link, for example, that leads into the heart of your site), you're not giving Google's crawler anything to work with. You're also making it difficult for surfers who don't have Flash or are visually impaired.


Make sure your internal links work.

Sounds like a no-brainer, doesn't it? Make sure your internal page links work so the Google crawler can get to all your site's pages. You should also make sure that your visitors can navigate.


Check your title tags.

There are few things sadder than getting a page of search results and finding "Insert Your Title Here" as the title for some of them, although this is not quite as bad as getting results for a domain and seeing the exact same title tag over and over and over and over.

Look. Google makes it possible to search just the title tags in its index. Further, the title tags are easy to read on Google's search results and are an easy way for a surfer to quickly get an idea of what a page is about. If you're not making the most of your title tag, you're missing out on a lot of attention to your site.

The perfect title tag, to me, says something specific about the page it heads and is readable to both spiders and surfers. This means you shouldn't stuff it with as many keywords as you can. Make it a readable sentence, orand I've found this to be useful for some pagesa question.


Check your META tags.

Google sometimes relies on META tags for a site description when there's a lot of navigation code that wouldn't make sense to a human searcher. I'm not crazy about META tags, but I'd make sure that at least the front page of my web site has a description and keyword META tag set, especially if my site relies heavily on code-based navigation (like from JavaScript).


Check your ALT tags.

Do you use a lot of graphics on your pages? Do you have ALT tags for them so that visually impaired surfers and the Google spider can figure out what those graphics are? If you have a splash page with nothing but graphics on it, do you have ALT tags on all those graphics so that a Google spider can get some idea of the content? ALT tags are perhaps the most neglected aspect of a web site. Make sure yours are set up.

By the way, just because ALT tags are a good idea, don't go crazy. You don't have to explain in your ALT tags that a list bullet is a list bullet. You can just mark it with an asterisk.


Check your frames.

If you use frames, you might be missing out on some indexing. Google recommends you read Danny Sullivan's article "Search Engines and Frames" at http://www.searchenginewatch.com/webmasters/frames.html. Be sure that Google can either handle your frame setup or that you've created an alternative way for Google to visit, such as using the NOFRAMES tag.


Consider your dynamic pages.

Google says they "limit the number and amount of dynamic pages" they index. Are you using dynamic pages? Do you have to?


Consider how often you update your content.

There is some evidence that Google indexes popular pages with frequently updated content more often. How often do you update the content on your front page?


Make sure you have a robots.txt file if you need one.

If you want Google to index your site in a particular way, make sure you have a robots.txt file for the Google spider to refer to. You can learn more about robots.txt at http://www.robotstxt.org/wc/norobots.html.


If you don't want Google to cache your pages, you can add a line to every page that you don't want cached.

Add this line to the HEAD section of your page:

<META NAME="ROBOTS" CONTENT="NOARCHIVE">

This tells all robots that archive content, including engines such as Daypop and Gigablast, not to cache your page. If you want to exclude just the Google spider from caching your page, use this line:

<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">

Be a Good Search Engine Citizen

Five don'ts and one do for getting your site indexed by Google.

A high ranking in Google can mean a great deal of traffic. Because of that, there are lots of people spending lots of time trying to figure out an infallible way to get a high ranking from Google. Add this. Remove that. Get a link from this. Don't post a link to that.

Submitting your site to Google to be indexed is simple enough. Google has a site submission form (http://www.google.com/addurl.html), though it says that if your site has at least a few inbound links (other sites that link to you), it should find you that way. In fact, Google encourages URL submitters to get listed on The Open Directory Project (ODP, http://www.dmoz.org) or Yahoo! (http://www.yahoo.com).

Nobody knows the secret of achieving high PageRank without effort. Google uses a variety of elements, including page popularity, to determine PageRank. PageRank is one of the factors determining how high up a page appears in search results. But there are several things that you should not do and one big thing that you absolutely should.

Does breaking one of these rules mean that you will automatically be thrown out of Google's index? No. There are over four billion pages in Google's index as of this writing, and it's unlikely that Google will immediately find out about your violation. But there's a good chance it'll find out eventually. Is it worth having your site removed from the most popular search engine on the Internet?

Thou Shalt Not:

Cloak

Cloaking is when your web site is set up such that search engine spiders get different pages than those that human surfers get. How does the web site know which are the spiders and which are the humans? By identifying the spider's User Agent or IP, the latter being the more reliable method.

An Internet Protocol (IP) address is the computer address from which a spider comes. Everything that connects to the Internet has an IP address. Sometimes the IP address is always the same, as with web sites. Sometimes the IP address changes, in which case it's called a dynamic address. (If you use a dial-up modem, chances are that every time you log onto the Internet your IP address is different. That's a dynamic IP address.)

A User Agent is a way for a program that surfs the Web to identify itself. Internet browsers such as Mozilla use User Agents, as do search engine spiders. There are literally dozens of different kinds of User Agents; see the Web Robots Database (http://www.robotstxt.org/wc/active.html) for an extensive list.

Advocates of cloaking claim that cloaking is useful to absolutely optimize content for spiders. Anti-cloaking critics claim that cloaking is an easy way to misrepresent site content; for example, feeding a spider a page designed to get site hits for pudding cups when the site is actually about baseball bats. You can get more details about cloaking and different perspectives on it at http://pandecta.com/, http://www.apromotionguide.com/cloaking.html, and http://www.webopedia.com/TERM/C/cloaking.html.

Hide text

Text is hidden by putting words or links in a web page that are the same color as the page's background (putting white words on a white background, for example). This is also called fontmatching. Why would you do this? Because a search engine spider can read the words you've hidden on the page while a human visitor can't. Again, getting caught doing this could get you banned from Google's index, so don't do it.

This goes for other page content tricks too, such as title stacking (putting multiple copies of a title tag on one page), putting keywords in comment tags, keyword stuffing (putting multiple copies of keywords in a very small font on the page), putting keywords not relevant to your site in your META tags, and so on. Google doesn't provide an exhaustive list of these types of tricks on its site, but any attempt to circumvent or fool its ranking system is likely to be frowned upon. Its attitude is more like, "You can do anything you want to with your pages, and we can do anything we want to with our index, such as excluding your pages."

Use doorway pages

Doorway pages (sometimes called gateway pages) are pages aimed specifically at one topic. They don't have a lot of original content and lead to the main page of a site (thus the name doorway pages).

For example, say you have a page devoted to cooking. You create doorway pages for several types of cooking: French cooking, Chinese cooking, vegetarian cooking, etc. The pages contain terms and META tags relevant to each type, but most of the text is a copy of all the other doorway pages, and all it does is point to your main site.

Doorway pages are against Google's guidelines and annoying to the Google user, so don't use them.

Check your link rank with automated queries

Using automated queries (except for the sanctioned Google API) is against Google's Terms of Service. Using an automated query to check your PageRank every 12 seconds is triple-bad: it's not what the search engine was built for, and Google probably considers it a waste of its time and resources.

Link to "bad neighborhoods."

Bad neighborhoods are sites that exist only to propagate links. Because link popularity is one aspect of how Google determines PageRank, some of these sites have set up link farms, which are sites that exist only for the purpose of building site popularity with bunches of links. The links are not topical, like a specialty subject index, and they're not well-reviewed, like Yahoo!; they're just a pile of links. Another example of a bad neighborhood is a general FFA (free for all) page, where anyone can add their link. Linking to pages in this way is grounds for a penalty from Google.

Now, what happens if one of these pages links to you? Will Google penalize your page? No. Google accepts that you have no control over who links to your site.

Thou Shalt:

Create great content

All the HTML contortions in the world will do you little good if you have lousy, old, or limited content. If you create great content and promote it without playing search engine games, you will get noticed and you will get links. Remember Sturgeon's Law: "Ninety percent of everything is crud." Why not make your web site an exception?

What Happens If You Reform?

Maybe your site is not exactly the work of a good search engine citizen. Maybe you have 500 doorway pages, 10 title tags per page, and enough hidden text to make an O'Reilly Pocket Guide. But maybe now you want to reform. You want to have a clean, lovely site and leave the doorway pages to Better Homes and Gardens. Are you doomed? Will Google ban your site for life?

No. The first thing you need to do is clean up your site. Remove all traces of rule breaking. Next, send a note about your site changes and the URL to help@google.com. Note that Google really doesn't have the resources to answer every email about why it did or didn't index a site. Otherwise, it'd be answering emails all day, and there's no guarantee it will reindex your kinder, gentler site. But it will look at your message.

What Happens If You Spot Google Abusers in the Index?

What if some other site that you come across in your Google searching is abusing Google's spider and PageRank mechanism? You have two options. You can send an email to spamreport@google.com or fill out the form at http://www.google.com/contact/spamreport.html. (I'd fill out the form; it reports the abuse in a standard format that Google is used to seeing.)

26 Steps to 15K a Day

Hot and cold running content is what draws visitors to your web site.

Too often, getting visitors from search engines is boiled down to a succession of tweaks that may or may not work. But, as I show in this hack, solid content thoughtfully put together can make more of an impact than a decade's worth of fiddling with META tags and building the perfect title page.

Following these 26 steps from A to Z will guarantee a successful site, bringing in plenty of visitors from Google.

A. Prep Work

Prepare to work and begin building content. Long before the domain name is settled on, start putting together notes for a site of at least 100 pages. That's 100 pages of "real content," not including link, resource, about, and copyright pages, which are necessary, but not content-rich, pages.

Can't think of 100 pages' worth of content? Consider articles about your business or industry, Q&A pages, or back issues of an online newsletter.

B. Choose a Brandable Domain Name

Choose a domain name that's easily brandable. For example, choose something short and memorable like Google.com, not a long keyword-stuffed domain.

Keyword domains are out; branding and name recognition are in. Big time in. Keywords in a domain name have never meant less to search engines. Consider Goto.com becoming Overture.com, and understand why it was changed. It's one of the most powerful gut check calls I've ever seen on the Internet. It took resolve and nerve to blow away several years of branding. (That's a whole 'nother article, but learn the lesson as it applies to all of us.)

C. Site Design

The simpler your site design, the better. As a rule, text content should outweigh HTML content. The pages should be validated and usable in everything from Lynx to leading browsers. In other words, keep it close to HTML 3.2 if you can. Spiders do not yet like eating HTML 4.0 and the mess that it can bring. Stay away from heavy Flash, Java, or JavaScript.

Go external with scripting languages if you must have them, though there's little reason to have them that I can see. They rarely help a site and can actually hurt it greatly due to many factors that most people don't appreciate (the search engines' distaste for JavaScript is just one of them). Arrange the site in a logical manner with directory names hitting the top keywords that you want to emphasize. You can also go the other route and just throw everything in the top level of the directory (this is rather controversial, but it's produced good long-term results across many engines). Don't clutter or spam your site with frivolous links such as "best viewed in...", or other things such as counters. Keep it clean and professional to the best of your ability.

Learn the lesson of Google itself: simple is retro cool. Simple is what surfers want.

Speed isn't everything; it's the only thing. Your site should respond almost instantly to a request. If your site has three to four seconds' delay until "something happens" in the browser, you're in trouble. That three to four seconds of response time may vary in sites viewed in countries other than your native one. The site should respond locally within three to four seconds (maximum) to any request. Longer than that, and you'll lose 10 percent of your audience for each additional second. That 10 percent could be the difference between success and failure.

D. Page Size

The smaller the page size, the better. Keep it under 15 KB, including images, if you can. The smaller the better. Keep it under 12 KB if you can. The smaller the better. Keep it under 10 KB if you can. I trust you are getting the idea here. Aim for over 5 KB and under 10 KB. It's tough to do, but it's worth the effort. Remember, 80 percent of your surfers will be on connections of 56K or less.

E. Content

Build one page of content (between 200 and 500 words) per day and put it online.

If you aren't sure what you need for content, start with the Overture keyword suggestor (http://inventory.overture.com/d/searchinventory/suggestion/) and find the core set of keywords for your topic area. Those are your subject starters.

F. Keyword Density and Keyword Positioning

This is simple, old-fashioned Search Engine Optimization (SEO) from the ground up.

Use the keyword once in the title, once in the description tag, once in a heading, once in the URL, once in bold, once in italic, and once high on the page, and make sure the density is between 5 and 20 percent (don't fret about it). Use well-written sentences and spellcheck them! Spellchecking is becoming more important as search engines are moving toward autocorrection during searches. There is no longer a reason to look like you can't spell.
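As a rough sanity check on that 5 to 20 percent range, keyword density is just occurrences of the keyword divided by total words on the page. Here is a minimal sketch in Python (rather than the Perl used elsewhere in this book); the sample page copy is invented for illustration:

```python
import re

def keyword_density(text, keyword):
    """Percentage of words on the page matching the keyword."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    hits = sum(1 for word in words if word == keyword.lower())
    return 100.0 * hits / len(words) if words else 0.0

# Hypothetical page copy for an "apples" page
page = "Fresh apples daily. Our apples come straight from the orchard."
density = keyword_density(page, "apples")  # 2 of 10 words = 20.0
```

A density of 20 percent sits right at the top of the suggested range; most natural prose lands much lower, which is fine.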

G. Outbound Links

From every page, link to one or two high-ranking sites under the keyword you're trying to emphasize. Use your keyword in the link text (this is ultra-important for the future).

H. Cross-Links

Cross-links are links within the same site.

Link to on-topic quality content across your site. If a page is about food, make sure it links to your apples page and your veggies page. With Google, on-topic cross-linking is important for sharing your PageRank value across your site. You do not want an all-star page that outperforms the rest of your site. You want 50 pages that each produce 1 referral a day, not 1 page that produces 50 referrals a day. If you find a page that drastically outproduces the rest of the site with Google, you need to offload some of that PageRank value to other pages by cross-linking heavily. It's that old share-the-wealth thing.

I. Put It Online

Don't go with virtual hosting; go with a standalone IP address.

Make sure the site is crawlable by a spider. All pages should be linked to more than one other page on your site, and not more than two levels deep from the top directory. Link the topic vertically as much as possible back to the top directory. A menu that is present on every page should link to your site's main topic index pages (the doorways and logical navigation system that lead to real content). Don't put your site online before it is ready. It's worse to put a nothing site online than no site at all. You want it to be fleshed out from the start.

Go for a listing in the Open Directory Project (ODP) (http://dmoz.org/add.html). Getting accepted to the ODP will probably get your pages listed in the Google Directory.

J. Submit

Submit your main URL to Google, F*, AltaVista, WiseNut, Teoma, DirectHit, and Hotbot. Now comes the hard part: forget about submissions for the next six months. That's right, submit and forget.

K. Logging and Tracking

Get a quality logger/tracker that can do justice to inbound referrals based on logfiles. Don't use a graphic counter; you need a program that can provide much more information than that. If your host doesn't support referrers, back up and get a new host. You can't run a modern site without full referrals available 24/7/365 in real time.
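To give a flavor of what "doing justice to inbound referrals" means, here is a minimal sketch in Python that tallies referrers from Apache combined-format log lines. The log lines themselves are invented:

```python
import re
from collections import Counter

# Match the quoted request, status code, response size, then the
# quoted referrer field of a combined-format log line.
REFERRER = re.compile(r'"[^"]*" \d+ \S+ "(?P<ref>[^"]*)"')

def referrer_counts(log_lines):
    """Count non-empty referrers in combined-format log lines."""
    counts = Counter()
    for line in log_lines:
        match = REFERRER.search(line)
        if match and match.group("ref") not in ("", "-"):
            counts[match.group("ref")] += 1
    return counts

logs = [
    '1.2.3.4 - - [06/Oct/2006:10:00:00 +0000] "GET /apples.html HTTP/1.0" '
    '200 5120 "http://www.google.com/search?q=apples" "Mozilla/4.0"',
    '1.2.3.5 - - [06/Oct/2006:10:01:00 +0000] "GET / HTTP/1.0" '
    '200 4096 "-" "Mozilla/4.0"',
]
top = referrer_counts(logs)
```

A real logger/tracker does far more than this, of course, but even a tally like this one exposes which search terms are sending you traffic.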

L. Spiderings

Watch for spiders from search engines (one reason you need a good logger and tracker!). Make sure that spiders crawling the full site can do so easily. If not, double-check your linking system to make sure the spider can find its way throughout the site. Don't fret if it takes two spiderings to complete your whole site for Google or F*. Other search engines are potluck; if you haven't been added within six months, it's doubtful you'll be added at all.

M. Topic Directories

Almost every keyword sector has an authority hub on its topic. Find it (Google Directory can be very helpful here because you can view sites based on how popular they are) and submit within the guidelines.

N. Links

Look around your keyword section in the Google Directory; this is best done after getting an Open Directory Project listing (or two). Find sites that have link pages or freely exchange links. Simply request a swap. Put a page of on-topic, in-context links on your site as a collection spot. Don't worry if you can't get people to swap links; move on. Try to swap links with one fresh site a day. A simple personal email is enough. Stay low-key about it and don't worry if site Z doesn't link to you. Eventually it will.

O. Content

Add one page of quality content per day. Timely, topical articles are always best. Try to stay away from too much blogging of personal material and look more for article topics that a general audience will like. Hone your writing skills and read up on the right style of web speak that tends to work with the fast-and-furious web crowd: lots of text breaks, short sentences, lots of dashes; something that reads quickly.

Most web users don't actually read; they scan. This is why it is so important to keep key pages to a minimum. If people see a huge overblown page, a portion of them will hit the Back button before trying to decipher it. They have better things to do than spend 15 seconds (a stretch) trying to understand your whizbang menu system. Just because some big support site can run Flash-heavy pages, this does not mean that you can. You don't have the pull factor that they do.

Use headers and bold standout text liberally on your pages as logical separators. I call them scanner stoppers because the eye logically comes to rest on the page.

P. Gimmicks

Stay far away from fads of the day or anything that appears spammy, unethical, or tricky. Plant yourself firmly on the high ground in the middle of the road.

Q. Linkbacks

When you receive requests for links, check out the sites before linking back to them. Check them through Google for their PageRank value. Look for directory listings. Don't link back to junk just because you were asked. Make sure they're sites similar to yours and on-topic. Linking to bad neighborhoods, as Google calls them, can actually cost you PageRank points.

R. Rounding Out Your Offerings

Use options such as "email a friend," forums, and mailing lists to round out your site's offerings. Hit the top forums in your market and read, read, read until your eyes hurt. Stay away from affiliate fads that insert content onto your site, such as banners and pop-up windows.

S. Beware of Flyer and Brochure Syndrome

If you have an e-commerce site or an online version of a bricks-and-mortar business, be careful not to turn your site into a brochure. Brochure sites don't work at all. Think about what people want. They don't come to your site to view your content; they come to your site looking for their content. Talk as little about your products and yourself as possible in articles (sounds counterintuitive, doesn't it?).

T. Keep Building One Page of Content Per Day

Head back to the Overture suggestion tool (http://inventory.overture.com/d/searchinventory/suggestion/) to get ideas for fresh pages.

U. Study Those Logs

After a month or two, you will start to see a few referrals from places you were able to get listed. Look for the keywords people are using. See any bizarre combinations? Why are people using them to find your site? If there is something you have overlooked, then build a page around that topic. Engineer your site to feed the search engine what it wants. If your site is about oranges, but your referrals are about orange citrus fruit, then get busy building articles around citrus and fruit instead of the generic oranges. The search engines tell you exactly what they want to be fed. Listen closely! There is gold in referral logs; it's just a matter of panning for it.

V. Timely Topics

Nothing breeds success like success. Stay abreast of developments in your topic of interest. If big site Z is coming out with product A at the end of the year, build a page and have it ready in October so that search engines get it by December.

W. Friends and Family

Networking is critical to the success of a site. This is where all that time you spend in forums pays off. Here's the catch-22 about forums: lurking is almost useless. The value of a forum is in the interaction with your colleagues and cohorts. You learn from the interaction, not just by reading. Networking pays off in linkbacks, tips, and email exchanges, and generally puts you in the loop of your keyword sector.

X. Notes, Notes, Notes

If you build one page per day, you will find that brainstorm-like inspiration will hit you in the head at some magic point. Whether you are in the shower (dry off first), driving (please pull over), or just parked at your desk, write it down! If you don't, then 10 minutes later, you will have forgotten all about that great idea. Write it down and get specific about what you are thinking. When the inspirational juices are no longer flowing, come back to those content ideas. It sounds simple, but it's a lifesaver when the ideas stop coming.

Y. Submission Check at Six Months

After six months, walk back through your submissions and see if you have been listed in all the search engines you submitted to. If not, resubmit and forget again. Try those freebie directories again, too.

Z. Keep Building Those Pages of Quality Content!

Starting to see a theme here? Google loves content, lots of quality content. The content you generate should be based on a variety of keywords. After a year, you should have around 400 pages of content. This will get you good placement under a wide range of keywords, generate reciprocal links, and position your site to stand on its own two feet.

Do these 26 things, and I guarantee you that in one year's time, you will call your site a success. It will draw between 500 and 2,000 referrals a day from search engines. If you build a good site and achieve an average of 4 to 5 page views per visitor, you should be in the range of 10,000 to 15,000 page views per day in one year's time. What you do with that traffic is up to you!

Get Inside the PageRank Algorithm

Delve into the inner workings of the Google PageRank algorithm and learn how it affects results.

PageRank, the algorithm used by the Google search engine, was originally formulated by Sergey Brin and Larry Page in their paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine" (http://www-db.stanford.edu/~backrub/google.html).

PageRank is based on the premise, prevalent in the world of academia, that the importance of a research paper can be judged by the number of citations it receives from other research papers. Brin and Page simply transferred this premise to its web equivalent: the importance of a web page can be judged by the number of hyperlinks that point to it from other web pages.

What's the Algorithm?

It might look daunting to nonmathematicians, but the PageRank algorithm is in fact elegantly simple. The PageRank of a page A is calculated as follows:

PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

where:

  • PR(A) is the PageRank of a page A.

  • PR(T1) is the PageRank of a page T1 that links to A.

  • C(T1) is the number of outgoing links from the page T1.

  • d is a damping factor in the range 0 < d < 1, usually set to 0.85.

The PageRank of a web page is therefore calculated as a sum of the PageRanks of all the pages that link to it (its incoming links), divided by the number of links on each of those pages (its outgoing links).
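The iteration behind this sum is easy to sketch in code. Here is a toy Python version (the page names and link graph are invented) that applies the formula repeatedly until the values settle:

```python
def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links out to."""
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # every page starts at PR 1
    for _ in range(iterations):
        # Apply PR(A) = (1 - d) + d * sum(PR(T)/C(T)) for each page A,
        # summing over every page T that links to A.
        pr = {
            page: (1 - d) + d * sum(
                pr[t] / len(links[t]) for t in pages if page in links[t]
            )
            for page in pages
        }
    return pr

links = {
    "home":  ["about", "links"],
    "about": ["home"],
    "links": ["home"],
}
ranks = pagerank(links)
# "home", with two incoming links, ends up with the highest PageRank;
# the total across all three pages stays at 3.0 (one unit per page).
```

Run on this three-page site, the calculation concentrates PageRank on the home page while the total stays fixed, which previews the conservation property discussed below.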

What Does It Mean?

From a search engine marketer's point of view, this means there are two ways in which PageRank can affect the position of your page on Google:

The number of incoming links

Obviously, the more of these, the better. But there is another thing the algorithm tells you: no incoming link can have a negative effect on the PageRank of the page it points to. At worst, it can have no effect at all.

The number of outgoing links on the page that points to your page

The fewer of these, the better. This is interesting: given two pages of equal PageRank that link to you, one with 5 outgoing links and the other with 10, you receive twice the increase in PageRank from the page with only 5 outgoing links.

At this point, take a step back and ask yourself just how important PageRank is to the position of your page in the Google Search results.

Note that the PageRank algorithm has nothing whatsoever to do with relevance to the search terms queried. It is simply a single (admittedly important) part of the entire Google relevance-ranking algorithm.

Perhaps a good way to look at PageRank is as a multiplying factor applied to the Google Search results after all other computations have been completed. The Google algorithm calculates the relevance of pages in its index to the search terms, and then multiplies this relevance by the PageRank to produce a final list. The higher your PageRank, therefore, the higher up the result list you will be. However, there are still many other factors related to the positioning of words on the page that must be considered.

What's the Use of the PageRank Calculator?

If no incoming link has a negative effect, surely you should just get as many as possible, regardless of the number of outgoing links on its page?

Well, not entirely. The PageRank algorithm is cleverly balanced. Just like the conservation of energy in every physical reaction, PageRank is also conserved with every calculation. For instance, if a page with a starting PageRank of 4 has two outgoing links on it, you know that the amount of PageRank it passes on is divided equally between each of its outgoing links. In this case, 4 / 2 = 2 units of PageRank are passed on to each of 2 separate pages, and 2 + 2 = 4, so the total PageRank is preserved!

On a much larger scale, supposing Google's index contains a billion pages, each with a PageRank of 1, the total PageRank across all pages is equal to a billion. Moreover, each time you recalculate PageRank, no matter what changes in PageRank occur between individual pages, the total PageRank across all one billion pages still adds up to a billion.

This means that although you may not be able to change the total PageRank across all pages, by strategically linking pages within your site, you can affect the distribution of PageRank between pages. For instance, you may want most of your visitors to enter the site through your home page. You would therefore want your home page to have a higher PageRank relative to other pages within the site. Also recall that all the PageRank of a page is passed on and is divided equally between each outgoing link on a page. You would therefore want to keep as much combined PageRank as possible within your own site without passing it to external sites and losing its benefit. This means you would want any page with lots of external links (i.e., links to other people's web sites) to have a lower PageRank relative to other pages within the site to minimize the amount of PageRank that is leaked to external sites. Also, bear in mind the earlier statement that PageRank is simply a multiplying factor applied once Google's other calculations regarding relevance have already been done. You would therefore want your more keyword-rich pages to also have a higher relative PageRank.

Also, assuming that every new page in Google's index begins its life with a PageRank of 1, there is a way to increase the combined PageRank of pages within your site: increase the number of pages! A site with 10 pages starts life with a combined PageRank of 10, which is then redistributed through its hyperlinks. A site with 12 pages therefore starts with a combined PageRank of 12. You can thus improve the PageRank of your site as a whole by creating new content (i.e., more pages), and then controlling the distribution of that combined PageRank through strategic interlinking between the pages.

And this is the purpose of the PageRank Calculator: to create a model of the site on a small scale, including the links between pages, and see what effect the model has on the distribution of PageRank.

How Does the PageRank Calculator Work?

To get a better idea of the realities of PageRank, visit the PageRank Calculator (http://www.markhorrell.com/seo/pagerank.asp).

It's simple, really. Start by typing in the number of interlinking pages you want to analyze and hit Submit. I have confined this number to just 20 pages to ease server resources. Even so, this should give a reasonable indication of how strategic linking can affect the PageRank distribution.

Next, for ease of reference once the calculation has been performed, provide a label for each page (e.g., Home Page, Links Page, Contact Us Page, etc.), and again hit Submit.

Finally, use the list boxes to select which pages each page links to. You can use Ctrl and Shift to highlight multiple selections.

You can also use this screen to change the initial PageRanks of each page. For instance, if one of your pages is supposed to represent Yahoo!, you may want to raise its initial PageRank to, say, 3. However, in actuality, the initial PageRank is irrelevant to its final computed value. In other words, even if one page were to start with a PageRank of 100, after many iterations of the equation, the final computed PageRank would converge to the same value as if it had started with a PageRank of only 1!

You can play around with the damping factor d, which defaults to 0.85, as this is the value quoted in Brin and Page's research paper.

A Webmaster's Introduction to Google

Steps to take for optimal Google indexing of your site.

The cornerstone of any good search engine is highly relevant results. Google's unprecedented success has been due to its uncanny ability to match quality information with a user's search terms. The core of Google's search results is based on a patented algorithm called PageRank.

There is an entire industry focused on getting sites listed near the top of search engines. Google has proven to be the toughest search engine for a site to do well on. Even so, it isn't all that difficult to get a new web site listed and begin receiving traffic from Google.

Learning the ins and outs of getting your site listed by a search engine can be a daunting task. There is a vast array of information about search engines on the Web, and not all of it is useful or proper. This discussion of getting your site into the Google database focuses on long-term techniques for successfully promoting your site through Google, helping you avoid some of the common misconceptions and problems that a new site owner might face.

Search Engine Basics

When you type a term into a search site, the engine looks up potential matches in its database and presents the most relevant web page matches first. How those web pages get into the database and, consequently, how you can get yours in there as well, is a three-step process:

  1. A search engine visits a site with an automated program called a spider (sometimes called a robot). A spider is a program similar to a web browser that downloads a site's pages. It doesn't actually display the page anywhere; it just downloads the page data.

  2. After the spider has acquired the page, the search engine passes the page to a program called an indexer, which is another robotic program that extracts most of the visible portions of the page. The indexer also analyzes the page for keywords, the title, links, and other important information contained in the code.

  3. The search engine adds your site to its database and makes it available to searchers. The greatest difference between search engines is in this final step where ranking or result position for a particular keyword is determined.
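The spider-then-indexer steps above can be sketched with Python's standard html.parser module: download the page markup, then pull out its title and links. The page markup here is invented:

```python
from html.parser import HTMLParser

class PageIndexer(HTMLParser):
    """Extract the title and outgoing links, as a crude indexer would."""
    def __init__(self):
        super().__init__()
        self.links, self.title = [], ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# In a real spider this markup would come from an HTTP fetch.
html = ('<html><head><title>Apples</title></head>'
        '<body><a href="/cider.html">Cider</a></body></html>')
p = PageIndexer()
p.feed(html)
```

The extracted links are what let the spider keep crawling: each new href is queued up as the next page to fetch.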

Submitting Your Site to Google

The first step is to get your pages listed in the database, and there are two ways to go about this. The first is direct submission of your site's URL to Google via its "Add your URL to Google" page. To counter programmed robots, search engines routinely move submission pages around on their sites. You can find Google's submission page linked from its Help pages or Webmaster Info pages (http://www.google.com/addurl.html).

Visit Google's add URL page, enter the main index page for your site into the submission form, and press Submit. Google's spider (called GoogleBot) will visit your page, usually within four weeks. The spider will traverse all the pages on your site and add them to its index. Within eight weeks, you should be able to find your site listed in Google.

The second way to get your site listed is to let Google find you based on links that point to your site. Once GoogleBot finds a link to your site from a page that is already in its index, it will visit your site.

Google has been updating its database on a monthly basis for three years. It sends its spider out in crawler mode once a month, as well. Crawler mode is a special mode in which a spider traverses, or crawls, the entire Web. As it runs into page links, it indexes those pages in a never-ending attempt to download all the pages it can. Once your pages are listed in Google, they are revisited and updated on a monthly basis. If you frequently update your content, Google may index your search terms more often.

Once you are indexed and listed in Google, the next question for a site owner naturally is, "How can I rank better under my applicable search terms?"

The Search Engine Optimization Template

This is my general recipe for the ubiquitous Google. It is generic enough that it works well everywhere and is as close as I have come to a "one-size-fits-all" SEO (Search Engine Optimization) template.

Use your targeted keyword phrase:

  • In META keywords. It's not necessary for Google, but it is still a good habit. Keep your META keywords short (128 characters max, or 10 keywords).

  • In a META description. Keep your keywords near the left but as part of a full sentence.

  • In the title at the far left, but not as the first word.

  • In the top portion of the page in the first sentence of the first full paragraph (plain text: no bold, no italic, no style).

  • In an H3 or larger heading.

  • In bold (second paragraph if possible and anywhere except in the first usage on the page).

  • In italic (anywhere except in the first usage).

  • In a subscript/superscript.

  • In a URL (directory name, filename, or domain name). Do not duplicate the keyword in the URL.

  • In an image filename used on the page.

  • In the ALT tag of the image.

  • In the title attribute of the image.

  • In link text to another site.

  • In an internal link's text.

  • In the title attribute of all the targeted links in and out of the page.

  • In the filename of your external CSS (Cascading Style Sheet) or JavaScript file.

  • In an inbound link on the site (preferably from your home page).

  • In an inbound link from off the site (if possible).

  • In a link to a site that has a PageRank of 8 or better.
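Put together, the template might look something like this page skeleton for a hypothetical keyword phrase, "apple cider" (all names, files, and URLs here are invented for illustration):

```html
<html>
<head>
<title>Fresh Pressed Apple Cider - Orchard Hill</title>
<meta name="keywords" content="apple cider, orchard, fresh pressed">
<meta name="description" content="Apple cider pressed daily at Orchard Hill.">
<link rel="stylesheet" href="apple-cider.css">
</head>
<body>
<h3>Apple Cider, Pressed Daily</h3>
<p>Our apple cider is pressed each morning from hand-picked fruit.</p>
<p>Stop by the barn and taste why <b>apple cider</b> fans drive for miles;
our <i>apple cider</i> press runs from September through November.</p>
<p><a href="http://www.example.com/" title="apple cider resources">More
apple cider resources</a></p>
</body>
</html>
```

Note the placement: the phrase sits near (but not at) the left of the title, in both META tags, in a heading, plain in the first sentence, then in bold and italic only after that first use, and in the stylesheet filename.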

Other search engine optimization issues to consider include:

  • Use "last modified" headers if you can.

  • Validate the HTML. Some feel that Google's parser has become stricter rather than more forgiving; it often misses an entire page because of a few simple errors (we have tested this thoroughly).

  • Use an HTML template throughout your site. Google can spot the template and parse it off. (Of course, this also means it is pretty good at spotting duplicate content.)

  • Keep the page as an .html or .htm extension. Any dynamic extension is a risk.

  • Keep the HTML below 20 KB; 5 to 15 KB is the ideal range.

  • Keep the ratio of text to HTML very high. Text should outweigh HTML by a significant amount.

  • Double-check your page in Netscape, Opera, and Internet Explorer. Use Lynx if you have it.

  • Use only raw HREFs for links. Keep JavaScript far, far away from links. The simpler the link code, the better.

  • More traffic will come once you realize that 1 referral a day to 10 pages is better than 10 referrals a day to 1 page.

  • Don't assume that keywords in your site's navigation template are worth anything at all. Google looks for full sentences and paragraphs. Keywords just lying around orphaned on the page are not worth as much as when they are used in a sentence.

Keeping Up with Google's Changes

With Google having such a leading position in the search engine world and so many webmasters looking to Google for traffic, you might guess that there's a lot of discussion about Google in various places around the Web. And you'd be right! My favorite place for Google news and gossip is Webmaster World (http://www.webmasterworld.com). It's not often that the terms "civilized" and "online forums" go together, but they do in this case. Discourse on this site is friendly, informative, and generally flame-free. I have learned a lot from this site.

There are also a few blogs devoted to Google and searching in general:

  • Google Blog (http://googleblog.blogspot.com) is the official Google blog and features announcements, pointers, and behind-the-scenes commentary from the Googleplex.

  • Googler Matt Cutts maintains a blog called Gadgets, Google, and SEO (http://www.mattcutts.com/blog/). While he doesn't always speak on behalf of Google, he provides insights into Google you won't find anywhere else. Matt discusses sites that have been recently banned for rule violations, new Google features relevant to search engine tuners, and tips for webmasters who want to play nice with Google.

  • John Battelle's Searchblog (http://battellemedia.com) covers every conceivable kind of search.

  • Google Webmaster Help Center (http://www.google.com/support/webmasters/) should be your first stop to look up official Google policies, find frequently asked questions, and read the latest about Google Sitemaps.

Tools for Webmasters

Google doesn't simply leave you to fend for yourself in the Wild Wide Web. Google offers a number of tools that can help you understand how Google sees your site, visualize traffic at your site, advertise with Google, and make money with your site by placing ads for Google. Here is a quick look at the tools that can help you in your quest for top Google placement.

Google Sitemaps

The primary tool that Google offers for webmasters is Google Sitemaps (http://www.google.com/webmasters/sitemaps/). Sign up and register your site, and you'll receive detailed reports about when Google last crawled your site, and any errors Google encountered in the process.

In addition to detailed information about Google's last visit to your site, you'll find reports about your site's ranking in Google's index for various keywords. Sitemaps also features a tool to analyze your robots.txt file to verify that you're keeping the Google bot out of your site's private sections.

The tool's namesake is the Sitemap, a text file you can place on your site that directs the Google bot to recently updated content. Instead of randomly checking your pages, a Google Sitemap tells the Google bot exactly where to go when it visits your site. The Sitemap itself is an XML file that uses Google's Sitemap format (https://www.google.com/webmasters/sitemaps/docs/en/protocol.html) to describe pages at a web site. Google even offers a free Python script called google-sitemap_gen (http://sourceforge.net/projects/goog-sitemapgen/) to help you generate a Sitemap automatically. There are a number of third-party plug-ins (http://code.google.com/sm_thirdparty.html) that can help you automatically generate a Sitemap and keep the Google bot informed of changes to your site.
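A minimal Sitemap file under Google's 0.84 schema looks something like the following; the URL and dates are invented:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.example.com/apples.html</loc>
    <lastmod>2006-10-06</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Each url entry tells the Google bot where a page lives, when it last changed, how often to expect changes, and how important the page is relative to the rest of your site.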

Google Analytics

Google Analytics (http://www.google.com/analytics/) is a free web traffic analysis tool that helps you visualize your site's traffic. Instead of relying on web logs generated by your server, you can place a few lines of JavaScript on every page of your site and let Google track your traffic. Google Analytics rivals most of the web log analysis tools available, and it can break down your traffic into a number of segments for review.

The traffic overview you're greeted with when you log in includes a weekly summary of visits and pageviews, a geographic summary of where visitors came from, graphs showing the sources of your visits, and a look at new versus returning visitors.

Google Analytics is integrated with Google AdWords to help you track the success of advertising campaigns. But you don't need to be an AdWords customer to take advantage of the tool.

Google AdWords

Google has built its financial empire on its ability to provide relevant ads to a receptive audience. Google knows that simply grabbing eyeballs isn't enough. It's the click-through (clicking an ad and following it to the advertiser and its products) that counts. This is where Google's AdWords really shine. They're not simply rotating, flip-of-the-coin ads; they're every bit as relevant as the results of your search.

Query Google for "volvo safety", and Car Safety Report ads from Edmunds (http://www.edmunds.com) and insurance quotes from auto insurance providers appear alongside the Volvo safety reports and crash tests. Try pirates, and you'll be served (at least at the time of this writing) a Major League Baseball ad. What does MLB have to do with pirates, you ask? Well, Major League Baseball purchased an ad for the keyword because it thinks you might be looking for information about the Pittsburgh Pirates baseball team. As of this writing, you'll also see an ad for Disney's Pirates of the Caribbean. If Google has nothing relevant to show, it shows no ads at all.

As an advertiser base, AdWords is hundreds of thousands strong. Mom-and-pops to Fortune 500s are all looking to make their presence known and their wares available alongside Google search results and on thousands of sites across the Web.

In true Google style, AdWords is different from just about every advertising service you've ever seen. There's virtually no price barrier; anyone with a few marketing dollars in their pocket can buy a few keywords. Everything is handled through the AdWords site; you don't have to speak to a Google advertising executive to start your campaign. It's so simple that even the most inexperienced marketer can get a leg up. That said, there's a lot to AdWords, and its simplicity can be deceptive.

Google AdSense

Google AdSense (http://www.google.com/adsense) is Google's advertising service, designed to deliver advertising magic to your web site. With hundreds of thousands of advertisers signed up, there are sure to be ads that target your readers, whether your site is about baseball, computers, or rare-spoon collecting.

Sign up, choose the shape and size of the ads you want to display, copy some code, paste it into your site, and as your readers click the ads, earn money. Of course, it's not quite that simple; you need to focus on gathering readers and keeping them coming back for more.

The Google AdSense site provides detailed statistics of the number of ads shown and the number of ads clicked.

There are a number of types of ads you can show on your site. In addition to the standard banner and text ads, you can provide Referral buttons that point people to Google products. You can also provide a Google Search box for your readers with the option to earn money in the process.

The Equally Mysterious Ranking Algorithm

If you thought Google was tight-lipped about how it determines PageRank, it's an absolute oyster when it comes to the ranking algorithm, which is how Google determines the order of search results. This book can give you some ideas about how the algorithm works, but again, these ideas are conjecture, and the algorithm is constantly changing. Your best bet is to create a content-rich web site and update it often. Google appreciates good content.

Of course, being listed in Google's index is not the only way to tell visitors about your site. You also have the option to advertise on Google.

The Mysterious PageRank

You might hear a lot of talk about Google's PageRank, people bragging about their sites attaining the misty heights of rank 7 or 8, or speaking reverently of sites that have achieved 9 or 10. PageRanks range from 0 (sites that have been penalized or not ranked) to 10 (reserved for only the most popular sites, such as Yahoo! and Google itself). The only place where you can actually see the PageRank of a given URL is in the Google Toolbar, though you can get some idea of its popularity from the Google Directory. Listings in the Google Directory have a green bar next to them, which reflects a listing's popularity without giving an exact number.

Google has never provided the entire formula for its PageRank, so all you will find in this book is conjecture. It wouldn't surprise me to learn that the formula is constantly changing; as millions of people try myriad methods to increase their page ranking, Google has to take these efforts into account and (sometimes) react against them.

Why is PageRank so important? Because Google uses it as one aspect of determining how a given URL ranks among millions of possible search results. Still, it's only one aspect. Other aspects are determined via Google's ranking algorithm.

Google's Importance to Webmasters

But isn't Google just a search engine web site like any other? Actually, its reach is far greater. Google partners with other sites to use the Google index results, including the likes of heavyweight property AOL. Google is also on the multitude of sites out there that use the Google API, advertise through Google, or even display ads for Google. So when you think about potential visitors from Google search results and advertising, you have to think beyond traditional search site borders.

Google's perception of your site has become increasingly important, which means you have to make sure your site abides by Google's rules or it risks not being picked up. If you're concerned about search engine traffic, you have to make sure that your site is optimized for luring in Google spiders and that it's indexed effectively. And if you don't want Google to index certain parts of your site, you need to understand the ins and outs of configuring your robots.txt file to reflect your preferences.
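For example, a minimal robots.txt placed at your site's root can wall off the sections you'd rather keep out of the index. The directory names here are hypothetical stand-ins for whatever you consider private:

```
# Keep the Google bot out of a private section
User-agent: Googlebot
Disallow: /private/

# Keep all well-behaved crawlers out of scripts
User-agent: *
Disallow: /cgi-bin/
```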

Webmastering

When the Web was younger, the search engine field was wide open. There were lots of major search engines, including AltaVista, Excite, HotBot, and Webcrawler. This proliferation of search engines had both advantages and disadvantages. One disadvantage was that you had to make sure you submitted your site to several different places, while one advantage was that you had several inflows of traffic spawned from search engines.

As the number of search engines has dwindled, Google's index (and influence) has grown. You no longer have to worry so much about submitting to different places, but you do have to be aware of Google at all times.

Beyond Google for News and Blogs

After a long dry spell, news and blog-related search engines have popped up all over the Internet. Here are my top four:

Rocketinfo (http://www.rocketnews.com)

Does not use the most extensive sources in the world, but lesser-known press release outlets (such as PETA) and very technical outlets (e.g., OncoLink, BioSpace, Insurance News Net) can be found here. Rocketinfo's main drawback is its limited search and sort options.

Yahoo! Daily News (http://news.yahoo.com)

Unlike Google News, Yahoo! relies on human editors to assemble its news portal. A 30-day index means that you can sometimes find things that have slipped off the other engines. Yahoo! Daily News provides free news alerts for registered Yahoo! users.

Technorati (http://www.technorati.com)

Technorati can help you zero in on conversations within the blogosphere. Many blog authors tag their posts with keywords to help Technorati determine how its posts should be categorized, and you can search for posts by tag.

BlogPulse (http://www.blogpulse.com/)

BlogPulse is geared toward tracking trends across blogs. You can use its Trend Search tool to graph the frequency of mentions of words or phrases across blogs.

Blogs

On the surface, weblogs (or blogs for short) are simply a format for publishing information online by placing new information at the top of the page. But dig a little deeper, and you realize that blogs have changed the way people communicate and consume information.

At the time of this writing, the blog-tracking service Technorati (http://www.technorati.com) estimates that 75,000 new blogs are created every day; over 35 million blogs are already in its index. This global network of blogs (often called the blogosphere) shows no signs of stopping, and Google offers some specialized tools to help you tune in and take part.

Blogger

To start publishing in the blogosphere, look no further than Blogger (http://www.blogger.com).

Blogger is a free service that provides everything you need to start writing a blog, including web-hosting space. The signup process literally takes less than five minutes, but don't let its simplicity fool you. With Blogger, you can start multiple blogs, post by email, customize your blogs' designs, collect comments on posts from readers, and publish your blog to a remote site via FTP or Secure FTP.

Blogger.com provides a simple posting interface where you type your rants, raves, opinions, or news into a form. Click Publish Post, and your words are on the Web.

Google Blog Search

Google recognized that blogs are a bit different from standard web sites, so it created a search engine specifically for finding news and commentary on blogs. Google Blog Search is available at both Blogger (http://search.blogger.com) and Google (http://blogsearch.google.com), but both interfaces use the same index in the background.

Instead of searching the open Web for content, Google Blog Search finds content in XML news feeds. Because of this, any blogs that don't also publish a news feed are not included in the Google Blog Search index. Also, Google started collecting content for the index when it launched in late 2005, so the index doesn't go much further back in time than that.

It's also important to note that Google Blog Search returns the pages that Google feels are the best matches for a particular query. But timeliness is a key aspect of blogs and could be key to your search as well. Click "Sort by date" at the top of the results page to see search results listed from newest to oldest, just like a blog!

Google Blog Search Syntax

Use Google Blog Search just as you would Google News Search. You can use the standard Google search syntaxes such as site: or intitle: to refine your searches. There are also a few special search syntaxes unique to Blog Search:

blogurl:

This searches a specific blog by including its URL, like this:

blogurl:radar.oreilly.com google

inblogtitle:

As you'd expect, this limits a search to blogs with the specified word in their titles:

inblogtitle:ipod battery

This example searches for the word "battery" among blogs with the word "ipod" in their title.

inposttitle:

Searching in post titles can be useful when you want to narrow your search to specific topics. Post titles often include keywords related to the content of a post:

inposttitle:ipod iTunes video

inpostauthor:

This filters posts by an author name, which can be handy if you know who wrote something but can't remember where you read it:

inpostauthor:paul hacks

This query finds posts that use the word "hacks" by people named Paul. Keep in mind that not every blog publishes author information along with each post, so the results are limited to just those blogs with author info.

You can always skip the special syntax and head over to the Blog Search Advanced Search page (http://search.blogger.com/advanced_blog_search) to perform these and other specialized searches such as finding posts within a date range.
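The special syntaxes travel in the q parameter of the Blog Search URL, so you can assemble queries programmatically. This Python sketch builds such a URL; the /blogsearch path and the scoring=d parameter for date sorting are assumptions based on the URLs the site produced at the time, not a documented API:

```python
import urllib.parse

def blog_search_url(query, sort_by_date=False):
    """Build a Google Blog Search URL for the given query string,
    which may include syntaxes like blogurl: or inpostauthor:."""
    params = {"q": query}
    if sort_by_date:
        # Assumed parameter for "Sort by date" (newest first)
        params["scoring"] = "d"
    return ("http://blogsearch.google.com/blogsearch?"
            + urllib.parse.urlencode(params))

url = blog_search_url("blogurl:radar.oreilly.com google")
print(url)
```

Paste the resulting URL into a browser and you should land on the same results page the search box would give you.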