Controlling Your Content Using a robots.txt File
Part of optimizing a site for organic search is deciding which pages you want the search engine robots to index. Why wouldn't you want all of your web pages indexed, you ask? There are many possible reasons, and the right answer depends on whether you want every page on your site to be discoverable through the search engines. Either way, crawler access can be easily controlled via a file called "robots.txt".
Creating a robots.txt file is a relatively simple process, and you don't need to be a certified webmaster to do it. Create a plain-text file named "robots.txt" in any text editor and place it in the root directory of your domain or sub-domain, so that it is reachable at, for example, http://www.example.com/robots.txt.
A few rules to remember about robots.txt files:
1. Placing a robots.txt file in a sub-directory will not work; crawlers only look for it in the root
2. You will need to add a robots.txt file for each sub-domain that you manage
3. You will need a separate robots.txt file for your secure (https) and non-secure (http) pages
4. Each robots.txt file must contain at least two lines to be effective:
* User-agent: the robot the following rule applies to
* Disallow: the URL you want to block (this entry should begin with a forward slash "/")
Below are a few examples:
* To allow all robots to visit all files, leave the Disallow value empty: User-agent: * Disallow:
* To block the entire site: Disallow: /
* To block a directory and everything in it: Disallow: /junk-directory/
* To block a page, list the page: Disallow: /private_file.html
* To block files of a specific file type (e.g. .gif): Disallow: /*.gif$ (note that the * and $ wildcards are extensions honored by Google and some other major crawlers, not part of the original standard)
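Putting the pieces together, a complete robots.txt combining several of these rules might look like the sketch below (the directory and file names are placeholders for your own):

```
User-agent: *
Disallow: /junk-directory/
Disallow: /private_file.html
Disallow: /*.gif$
```

Each User-agent line starts a new group of rules, and every Disallow line beneath it applies to that robot.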
Clear as mud? Not to worry. Just having a robots.txt file is a step in the right direction.
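If you want to sanity-check your rules before relying on them, Python's standard library ships a parser that applies the same matching logic crawlers use. The sketch below feeds it hypothetical rules mirroring the examples above (the paths are placeholders, not from any real site):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules based on the examples above
rules = [
    "User-agent: *",
    "Disallow: /junk-directory/",
    "Disallow: /private_file.html",
]

parser = RobotFileParser()
parser.parse(rules)

# A page not listed under Disallow remains crawlable
print(parser.can_fetch("*", "/public_page.html"))         # True
# The blocked directory and the blocked page are off-limits
print(parser.can_fetch("*", "/junk-directory/old.html"))  # False
print(parser.can_fetch("*", "/private_file.html"))        # False
```

In production you would point the parser at your live file with set_url() and read() instead of passing the lines by hand.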