Home Guide Tools Listing News Background Search About Us

SearchTools.com

Test of Robots Rules


Robots.txt

Robots.txt is a standard file that lets webmasters control which directories are available to web robots. For more information, see the Robots.txt Guide Page.

To test how well a robot obeys robots.txt, we link to a page that our robots.txt file says should not be indexed. In this case it's the robots-test file in the /test/robots/disallow subdirectory. The robots.txt file for this site includes these lines:

User-agent: *
Disallow: /test/robot/disallow/

Any robot that indexes the pages in this directory is disobeying this rule.

The test page contains the term R Test 101 (written without spaces), so any search engine that returns the page for that term has indexed it and is disobeying the robots.txt directive.
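As a rough illustration of how a well-behaved robot might apply such a rule, the sketch below feeds the two quoted lines to Python's standard urllib.robotparser module and asks whether a URL may be fetched. The rule is copied exactly as quoted above; the host example.com, the page names, and the user-agent string ExampleBot are hypothetical and only serve the illustration.

```python
from urllib.robotparser import RobotFileParser

# The rule exactly as quoted in the text above.
RULES = [
    "User-agent: *",
    "Disallow: /test/robot/disallow/",
]


def is_allowed(url, user_agent="ExampleBot"):
    """Return True if the quoted rules permit user_agent to fetch url.

    "ExampleBot" is a hypothetical robot name; the rule applies to all
    robots because of the "*" wildcard.
    """
    parser = RobotFileParser()
    parser.parse(RULES)  # parse the rule lines directly, no network access
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs on a placeholder host.
blocked = is_allowed("http://example.com/test/robot/disallow/page.html")
permitted = is_allowed("http://example.com/index.html")
print(blocked, permitted)  # prints: False True
```

A robot that fetches and indexes a URL for which this check returns False is ignoring the directive. Note that matching is by path prefix, so the path in the Disallow line has to match the path of the test page character for character.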

Robots META Tag

In addition to server-wide robot control, web page creators can also specify that certain pages should not be indexed by search engine robots, or that the links on the page should not be followed by robots, using the Robots META tag.
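As a minimal sketch of how an indexing robot might read this tag, the Python code below collects the directives from any META tag whose name is "robots", using only the standard html.parser module. The class and function names and the sample HTML are assumptions made for the illustration, not part of any particular search engine.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None when the attribute has no value.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # Directives are comma-separated and case-insensitive.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_meta_directives(html):
    """Return the set of Robots META directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical sample page.
sample = '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head></html>'
found = robots_meta_directives(sample)
may_index = "noindex" not in found
may_follow = "nofollow" not in found
```

For the sample page both may_index and may_follow come out False, so an obedient robot would neither index the page nor follow its links.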

The following pages test whether search indexing robots correctly obey the commands in the Robots META tag.

X-Robots-Tag

The X-Robots-Tag carries the same values as the Robots META tag and can be inserted into an HTTP response header by a script, application, CGI program, Apache .htaccess file, or any other automated web response tool. This lets a site control robots' access to and indexing of non-HTML documents, and automate instructions for all pages without changing their content. For more information, see my X-Robots-Tag page.
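One way an application could add the header is sketched below as a small Python WSGI wrapper. This is only an illustration under stated assumptions: the wrapper name add_x_robots_tag, the sample application pdf_app, and the default value "noindex, nofollow" are hypothetical choices, and a real site might set the header in its server configuration instead.

```python
def add_x_robots_tag(app, value="noindex, nofollow"):
    """Wrap a WSGI app so that every response carries an X-Robots-Tag header.

    The header value uses the same directives as the Robots META tag.
    """
    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Append the header without touching the response body.
            headers = list(headers) + [("X-Robots-Tag", value)]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start_response)
    return wrapped


def pdf_app(environ, start_response):
    """Hypothetical app serving a non-HTML document (placeholder bytes)."""
    start_response("200 OK", [("Content-Type", "application/pdf")])
    return [b"%PDF-1.4 placeholder"]


# The wrapped app can be served by any WSGI server, for example
# wsgiref.simple_server.make_server("", 8000, protected_app).
protected_app = add_x_robots_tag(pdf_app)
```

Because the instruction travels in the HTTP response header, the PDF itself is served unchanged, which is what makes this approach usable for documents that cannot hold a META tag.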

Page Updated 2008-07-10

SearchTools.com - Copyright © 2000-2008 Search Tools Consulting
This work is provided under a Creative Commons Sampling Plus 1.0 License.