Robots.txt – What It Is & Why We Need It…

by ImBizJourney

Every WordPress blog should have a robots.txt file. Other websites may need one too, but in this post I am going to concentrate on robots.txt for WordPress blogs.

[Photo: "But Robbie..." Creative Commons License photo credit: H Dickins]
What Is a Robots.txt File?

This is a plain text file on your website that tells search engine crawlers which pages or folders they should not visit.

You must place this file in the main (root) directory of your website, so that it can be reached at an address like http://www.example.com/robots.txt. Otherwise the search engines will not find it.
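To see why the location matters: a crawler works out where robots.txt lives from the scheme and host of a page alone, and never looks in subfolders. The small sketch below shows that rule with Python's standard urllib.parse; the function name robots_url and the example.com addresses are illustrative assumptions, not part of any search engine's code.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the one address where crawlers look for robots.txt for this site.

    Only the scheme and host (including any port) of the page are kept;
    the path is replaced by /robots.txt and any query or fragment is dropped.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # A post deep inside the blog still maps to the file at the domain root.
    print(robots_url("https://www.example.com/2009/05/my-first-post/"))
```

This is also why a blog installed in a subfolder such as /blog/ still needs its robots.txt at the domain root rather than inside /blog/.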


Why Do We Need Robots.txt?

I love WordPress and use it for all my websites, even the ones that are not blogs. Apart from the ease of use, the hundreds of themes and plugins, and the pages and pages of help, WordPress blogs are also very well optimized for search engines.

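But one problem with WordPress blogs is duplicate content: the same post can be reached through its own address, the home page, category and tag pages, date archives and paginated pages. We don’t want Google to keep indexing the same content under different addresses, because this can affect your rankings.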

And one way to keep search engines away from these duplicate pages is a robots.txt file.

The problem does not stop with duplicate content. There is a folder with my images, another folder with my affiliate files, then the WordPress login files and so on, and I don’t want any of these folders or files indexed.

So, a well-optimized WordPress blog MUST have a robots.txt file.


My Robots.txt File

Here are the contents of my file:

User-agent: *
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /images/
Disallow: /hop/
Disallow: /page/
Disallow: /2006/
Disallow: /2007/
Disallow: /2008/
Disallow: /2009/
Disallow: /2010/
Disallow: /error/
Disallow: /*?*
Disallow: /wp-content/
Noindex: /hop/
Noindex: /tag/
Noindex: /wp-includes/
Noindex: /page/
Noindex: /wp-login.php
Noindex: /wp-content/

‘hop’ is the folder where I have all my affiliate links. You might want to replace this name with whatever folder name you are using for your affiliate links.
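If you want to check which addresses a set of rules actually blocks before uploading the file, Python's standard urllib.robotparser can evaluate them. This is a sketch under a few assumptions: the example.com URLs and the helper name blocked_urls are made up for illustration, and the copy of the rules starts with a User-agent: * line because Python's parser, like the robots.txt standard, ignores Disallow lines that do not belong to a User-agent group. Python's parser only does simple prefix matching and does not understand the * wildcard (which Google does support), so the /*?* rule is left out, and it silently ignores non-standard lines such as Noindex.

```python
from urllib.robotparser import RobotFileParser

# Copy of the Disallow rules above, minus the wildcard rule /*?*,
# which Python's prefix-matching parser cannot evaluate.
RULES = """\
User-agent: *
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /images/
Disallow: /hop/
Disallow: /page/
Disallow: /2006/
Disallow: /2007/
Disallow: /2008/
Disallow: /2009/
Disallow: /2010/
Disallow: /error/
Disallow: /wp-content/
"""


def blocked_urls(rules, urls, agent="Googlebot"):
    """Return the URLs from `urls` that `agent` may not crawl under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    urls = [
        "https://www.example.com/my-first-post/",     # the post itself
        "https://www.example.com/2009/05/",           # date archive repeating the post
        "https://www.example.com/page/2/",            # paginated home page
        "https://www.example.com/hop/some-product/",  # affiliate link folder
        "https://www.example.com/wp-login.php",       # login page
    ]
    for url in blocked_urls(RULES, urls):
        print("blocked:", url)
```

The post's own address stays crawlable while the duplicate archive and pagination pages, the affiliate folder and the login page are blocked.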

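There is a difference between Noindex and Disallow. Disallow stops Googlebot from accessing the page at all, whereas Noindex lets it access the page and follow the links on it, but keeps the page itself out of the index. Be aware, though, that Noindex was never part of the robots.txt standard, and Google stopped supporting it in robots.txt files in September 2019. To keep a page out of the index reliably, use a robots meta tag in the page itself (<meta name="robots" content="noindex">) or an X-Robots-Tag HTTP header.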
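To make the Disallow/Noindex difference concrete, here is a deliberately simplified model of a well-behaved crawler, written with Python's standard library. It assumes page-level noindex is expressed the usual way, through a <meta name="robots"> tag in the HTML; the names RobotsMetaFinder and crawler_outcome are hypothetical, and real crawlers such as Googlebot are far more involved than this.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            for directive in (attributes.get("content") or "").split(","):
                self.directives.add(directive.strip().lower())


def crawler_outcome(robots_txt, url, html):
    """Simplified model of what a polite crawler does with one URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch("Googlebot", url):
        # Disallow: the page is never downloaded, so nothing in it
        # (including any meta tags) is ever seen.
        return "not crawled"
    finder = RobotsMetaFinder()
    finder.feed(html)
    if "noindex" in finder.directives:
        # noindex: the page is fetched and its links can be followed,
        # but the page itself is kept out of the index.
        return "crawled, not indexed"
    return "crawled and indexed"


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /wp-admin/\n"
    print(crawler_outcome(robots, "https://www.example.com/wp-admin/", ""))
    print(crawler_outcome(robots, "https://www.example.com/tag/seo/",
                          '<meta name="robots" content="noindex, follow">'))
    print(crawler_outcome(robots, "https://www.example.com/my-first-post/",
                          "<p>Hello</p>"))
```

The order of the checks is the point: because a Disallowed page is never fetched, a noindex tag on that same page can never be read.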

Now here comes the good news: you don’t need to understand everything about robots.txt. All WordPress blogs share a similar structure, so their robots.txt files are nearly identical. Get one file done and copy it across to your other blogs, changing only a few folder names.
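If you run several blogs, the "copy it across" step can be scripted. The sketch below assumes the only per-site difference is the affiliate folder (hop on this blog) plus any extra folders you name; the function build_robots_txt and its parameters are hypothetical. It keeps only the User-agent and Disallow lines, since Noindex is not part of the robots.txt standard.

```python
# Rules shared by every WordPress blog (taken from the file above).
COMMON_RULES = [
    "/wp-login.php",
    "/wp-register.php",
    "/wp-admin/",
    "/wp-includes/",
    "/wp-content/",
    "/images/",
    "/page/",
    "/2006/",
    "/2007/",
    "/2008/",
    "/2009/",
    "/2010/",
    "/error/",
    "/*?*",
]


def build_robots_txt(affiliate_folder="hop", extra_folders=()):
    """Build the robots.txt text for one blog.

    affiliate_folder and extra_folders are the per-site parts
    (hypothetical parameters for this sketch).
    """
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in COMMON_RULES]
    for folder in (affiliate_folder, *extra_folders):
        # Normalise "go", "/go" or "go/" to "/go/".
        lines.append(f"Disallow: /{folder.strip('/')}/")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: a second blog that keeps its affiliate links under /go/.
    print(build_robots_txt("go"))
```

Save the printed text as robots.txt in the root directory of that blog.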

That’s it for robots.txt files. Now go and make sure all your WordPress sites have one, and if they don’t, create one now…

Best,
