Website Design London and Essex

robots.txt


What is a robots.txt file?

A robots.txt file is a plain-text file that tells search engine crawlers which directories they can or cannot crawl. Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines. Keep it up to date for your site so that you don't accidentally block robots from pages you want them to find.

This file is placed in the root directory of your site. The following lines are taken from the top of Google's robots.txt file (http://www.google.co.uk/robots.txt):
User-agent: *
Allow: /searchhistory/
Disallow: /news?output=xhtml&
Allow: /news?output=xhtml
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
Disallow: /news
    User-agent: * indicates that the rules that follow apply to all robots.

    Disallow: /news tells the robots not to crawl any URL whose path begins with /news, which includes every page in the /news directory.
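The effect of these two lines can be checked with a short script. The following is a minimal sketch using Python's standard urllib.robotparser module. The rules are a shortened copy of the listing above, and www.example.com and ExampleBot are placeholder names. Python's parser is only one implementation, so real crawlers may resolve overlapping Allow and Disallow rules differently.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules quoted above (assumption: only these lines apply).
RULES = """\
User-agent: *
Disallow: /search
Disallow: /groups
Disallow: /news
"""

def build_parser(rules_text):
    """Return a parser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser

parser = build_parser(RULES)

# "User-agent: *" means the rules apply to whatever robot name we ask about.
for path in ("/news/today.html", "/about.html"):
    url = "http://www.example.com" + path  # placeholder domain
    verdict = "allowed" if parser.can_fetch("ExampleBot", url) else "blocked"
    print(path, verdict)
```

Because the rule is a prefix match, /news/today.html is reported as blocked while /about.html, which matches no Disallow line, is reported as allowed.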

If you have 90,000 pages on your Web site then it is unlikely that a search engine will index them all. You may find that critical pages are left out while insignificant pages are included, which is why it helps to keep robots away from the pages that do not matter.

How to check your robots.txt file

Open your web browser and enter www.yourdomain.com/robots.txt to view the contents of your robots.txt file. Here are the most important tips for a correct robots.txt file:

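1.   The original robots.txt standard defines only two commands: User-agent and Disallow. The Allow command seen in the Google example above is an extension that the major search engines support and that the later RFC 9309 standard includes, but not every robot understands it. Avoid any other commands unless you know that the robots you care about support them.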

2.   Don't change the order of the commands. Start with the user-agent line and then add the disallow commands:

User-agent: *
Disallow: /cgi-bin/

3.   Don't use more than one directory in a Disallow line. "Disallow: /support /cgi-bin/ /images/" does not work. Use an extra Disallow line for every directory:

User-agent: *
Disallow: /support
Disallow: /cgi-bin/
Disallow: /images/

4.   Be sure to use the right case. The file names on your server are case sensitive. If the name of your directory is "Support", don't write "support" in the robots.txt file.
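Tips 2 and 3 lend themselves to an automated check. The function below is a rough sketch, not a full validator: check_robots_txt is a hypothetical helper name, and it only looks for a Disallow line that appears before any User-agent line and for Disallow lines that list more than one path.

```python
def check_robots_txt(text):
    """Return a list of warnings for two common robots.txt mistakes."""
    warnings = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line or ":" not in line:
            continue  # blank or unrecognised lines are ignored in this sketch
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            seen_user_agent = True
        elif field == "disallow":
            if not seen_user_agent:
                warnings.append(f"line {number}: Disallow before any User-agent line")
            if len(value.split()) > 1:
                warnings.append(f"line {number}: more than one path in a Disallow line")
    return warnings


# A deliberately broken sample: wrong order and several paths on one line.
sample = "Disallow: /support /cgi-bin/ /images/\nUser-agent: *\nDisallow: /cgi-bin/\n"
for warning in check_robots_txt(sample):
    print(warning)
```

Tip 4 is not covered here, because checking upper and lower case needs the real directory names on your server.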

You can find user agent names in your log files by checking for requests to robots.txt. Usually, all search engine spiders should be given the same rights. To do that, use User-agent: * in your robots.txt file.
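Finding those names in a log file can be sketched in a few lines of Python. This example assumes your server writes the common Apache or Nginx "combined" log format, in which the user agent is the last quoted field on each line. The function name robots_txt_agents and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Assumption: "combined" log format, where the request is a quoted field
# and the user agent is the last double-quoted field on the line.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+)[^"]*".*"(?P<agent>[^"]*)"$')

def robots_txt_agents(lines):
    """Count the user agents that requested /robots.txt."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if match and match.group("path").split("?", 1)[0] == "/robots.txt":
            counts[match.group("agent")] += 1
    return counts


# Made-up sample lines; in practice you would pass open("access.log").
sample_log = [
    '192.0.2.1 - - [01/Jan/2024:00:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "ExampleBot/1.0"',
    '192.0.2.2 - - [01/Jan/2024:00:00:02 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
for agent, hits in robots_txt_agents(sample_log).most_common():
    print(hits, agent)
```

Only the first sample line is counted, because the second one requests /index.html rather than /robots.txt.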

What happens if you don't have a robots.txt file?

If your Web site doesn't have a robots.txt file (you can check this by entering www.yourdomain.com/robots.txt in your web browser) then search engines will automatically crawl, and may index, everything they can find on your site.
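The same check can be done from a script instead of a browser. This is a minimal sketch using Python's standard library. The function names robots_txt_url and has_robots_txt and the "robots-check-sketch" user agent string are assumptions made for this example, and a server that answers with a redirect or a custom error page may need extra handling.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen

def robots_txt_url(page_url):
    """Build the robots.txt address for the site that serves page_url."""
    if "://" not in page_url:
        page_url = "http://" + page_url  # assume plain http when no scheme is given
    parts = urlsplit(page_url)
    # robots.txt always lives in the root directory, whatever page we start from.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

def has_robots_txt(page_url, timeout=10):
    """Return True if the site answers the robots.txt request with HTTP 200."""
    request = Request(robots_txt_url(page_url),
                      headers={"User-Agent": "robots-check-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError and timeouts all derive from OSError
        return False


print(robots_txt_url("www.yourdomain.com/shop/item?id=3"))
```

Calling has_robots_txt("www.yourdomain.com") would then make the actual request and report whether the file was found.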

First Website Design .com aims to help you build an affordable, high quality website that will receive large volumes of targeted traffic. It provides advice on the best Internet marketing strategies to help you sell products and make money online.

For information on search engine optimisation, please visit our specialist site - Keyword SEO Pro .com

David Viniker MD, the author and webmaster of FirstWebsitedesign.com, believes that quality of content is a prerequisite to success. He has research and teaching interests. His website www.2womenshealth.com receives 1.5 million visitors annually and is the most popular personal women's health website on the Internet. He has applied his clinical skills to researching SEO techniques.

We are based in Loughton, Essex and close to North and East London and Hertfordshire.