SearchTools.com

Tests of Robots Following Rules

Robots.txt

Robots.txt is a standard file that allows webmasters to control which directories are available to web robots; for more information, see the Robots.txt Guide Page.

To test how well a robot obeys Robots.txt, we made a link to a page that our robots.txt file indicates should not be indexed: in this case, the robots-test file in the /test/robots/disallow subdirectory. The robots.txt file for this site includes these lines:

User-agent: *
Disallow: /test/robots/disallow/

Any robot that indexes the pages in this directory is disobeying this rule. The test page contains the term R Test 101 (without spaces), so any search engine whose results include that term has indexed the page in violation of the robots.txt directive.
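To illustrate the check that a cooperating robot performs before fetching, here is a minimal sketch using Python's standard urllib.robotparser module; the URLs are illustrative, built from the paths described above.

from urllib.robotparser import RobotFileParser

# Fetch and parse this site's robots.txt file.
rp = RobotFileParser()
rp.set_url("http://www.searchtools.com/robots.txt")
rp.read()

# A cooperating robot asks permission before fetching each page.
# Given the Disallow rule above, this should print False.
# (The exact test-page filename is illustrative.)
print(rp.can_fetch("*", "http://www.searchtools.com/test/robots/disallow/robots-test.html"))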

Note that these robot controls only apply to cooperating robots: malicious ones may ignore the disallow rules, or even use them as pointers to interesting data. The only way to be sure that robots will not index a page is to use access controls; see our password protection tests.

Robots META Tag

In addition to server-wide robot control, web page creators can use the Robots META tag to specify that a particular page should not be indexed by search engine robots, or that the links on that page should not be followed.
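For reference, a page requesting both exclusions would carry a tag like this in its HTML head:

<meta name="robots" content="noindex, nofollow">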

The following pages test whether search indexing robots correctly obey the commands in the Robots META tag.

X-Robots-Tag

The X-Robots-Tag is a relatively new mechanism. It carries the same values as the Robots META tag, but as an HTTP response header, which can be set by a script, application, CGI program, Apache .htaccess file, or any other automated web response tool. This means a site can control robot access to, and indexing of, non-HTML documents, and can automate the instructions for all pages without changing their content. For more information, see my X-Robots-Tag page.
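As an illustration, here is a minimal sketch of an Apache .htaccess rule (assuming the mod_headers module is enabled) that adds the header to every PDF file on the site:

<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

Each matching response then includes the header line X-Robots-Tag: noindex, nofollow, which cooperating robots treat like the equivalent META tag values.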

Page Modified: 2011-01-13