Web page creators can control robots in two ways: with the robots.txt file and with the Robots META tag. This page helps to test how well robots obey each of them.
For more tests, see the List of Robot Tests.
Robots.txt is a standard file that lets webmasters control which directories are available to web robots. For more information, see the Robots.txt Guide Page.
To test how well a robot obeys Robots.txt, we made a link to a page that our robots.txt file indicates should not be indexed. In this case it's www.searchtools.com/test/robots/no-index/robots-tests.html. The robots.txt file for this site includes these lines:
    User-agent: *
    Disallow: /test/robots/noindex/
Any robot that indexes the pages in this directory is disobeying this rule.
To check, search for the unique test term that appears on the robots.txt test page. If the search engine returns no match for that term, its robot is obeying the rule.
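The rule above can also be checked locally before relying on search results. The following is a minimal sketch using Python's standard `urllib.robotparser`. The user-agent name `ExampleBot`, the helper `is_allowed`, and the two sample paths are assumptions made for illustration, not part of the test site.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines quoted above.
RULES = """\
User-agent: *
Disallow: /test/robots/noindex/
"""


def is_allowed(url, user_agent="ExampleBot", rules=RULES):
    """Return True if the given rules let user_agent fetch url.

    "ExampleBot" is a hypothetical robot name used only for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse the rules without any network access
    return parser.can_fetch(user_agent, url)


# Hypothetical page inside the disallowed directory.
blocked = is_allowed("http://www.searchtools.com/test/robots/noindex/page.html")
# Hypothetical page outside the disallowed directory.
open_page = is_allowed("http://www.searchtools.com/test/robots/page.html")

print(blocked, open_page)
```

With this parser, a Disallow rule is matched as a path prefix, so the directory name in the rule has to match the URL path exactly. A path spelled /test/robots/no-index/ would not be covered by a rule for /test/robots/noindex/.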
In addition to server-wide robot control, web page creators can use the Robots META tag to specify that certain pages should not be indexed by search engine robots, or that robots should not follow the links on a page.
The following pages test whether search indexing robots correctly obey the commands in the Robots META tag.
- NoIndex page: this page should not be indexed, but the link from it to the meta-follow page should work.
- NoIndex and NoFollow page: this page should not be indexed, and the robot should not follow the link from it to the meta-noindex-nofollow page either.
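To make the two cases above concrete, here is a rough sketch of how a robot might read the Robots META tag, using Python's standard `html.parser`. The class `RobotsMetaParser`, the function `robots_meta`, and the sample HTML strings are hypothetical names and inputs chosen for this example. The sketch treats a missing tag as permission to index and follow, and treats the value `none` as equivalent to `noindex, nofollow`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when an attribute has no value.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_meta(html):
    """Return (may_index, may_follow) for the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    may_index = "noindex" not in found and "none" not in found
    may_follow = "nofollow" not in found and "none" not in found
    return may_index, may_follow


# Sample pages mirroring the two test cases (contents are illustrative).
noindex_page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
noindex_nofollow_page = '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head></html>'

print(robots_meta(noindex_page))
print(robots_meta(noindex_nofollow_page))
```

A robot that obeys the tag would skip indexing both pages, but would still follow links only from the first.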