As of January 2012, this site is no longer being updated due to work and health issues.

X-Robots-Tag: Robot Instructions in the HTTP Header

In the June 2008 Robots Exclusion Protocol agreement, the major web search engines announced that they would recognize a new element in the HTTP header, the X-Robots-Tag. Google was the first to support it, followed by Yahoo, and now Microsoft Live Search supports it as well.

When a browser or robot sends a request to the web server for a URL, part of the response is the invisible HTTP header, including information about the file type, encoding, and date modified. This information is generated by the web server.
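To make this concrete, here is a minimal sketch in Python (our choice of language; the article itself uses none) that splits a raw response head into its status line and header fields with the standard library's email header parser, which uses the same "Name: value" syntax as HTTP. The sample response, including its content type and date, is made up for illustration.

```python
from email.message import Message
from email.parser import HeaderParser
from typing import Tuple

# A made-up response head for a Word document; values are illustrative only.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/msword\r\n"
    "Last-Modified: Thu, 10 Jul 2008 12:00:00 GMT\r\n"
    "X-Robots-Tag: NOINDEX, NOFOLLOW\r\n"
    "\r\n"
)


def parse_response_head(raw: str) -> Tuple[str, Message]:
    """Split a raw HTTP response head into (status line, header fields)."""
    # The first line is the status line; everything after it is "Name: value" headers.
    status_line, _, header_block = raw.partition("\r\n")
    headers = HeaderParser().parsestr(header_block)
    return status_line, headers


status, headers = parse_response_head(RAW_RESPONSE)
print(status)                   # HTTP/1.1 200 OK
print(headers["x-robots-tag"])  # NOINDEX, NOFOLLOW (lookup is case-insensitive)
```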

The new X-Robots-Tag, within the HTTP response header, can contain the same values as the Robots META tags: NOINDEX, NOFOLLOW, NOARCHIVE, NOODP, NOSNIPPET.
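A crawler reading this header has to split the comma-separated value and compare directive names without regard to case. The Python sketch below does that; the function name robots_permissions and its permission keys are hypothetical, and it assumes the value is a plain comma-separated list of directive names with no other syntax.

```python
from typing import Dict


def robots_permissions(value: str) -> Dict[str, bool]:
    """Interpret an X-Robots-Tag value as a set of yes/no permissions.

    Assumes a plain comma-separated list of directive names. Names are
    compared case-insensitively; anything not forbidden is allowed.
    """
    tokens = {t.strip().lower() for t in value.split(",") if t.strip()}
    return {
        "index": "noindex" not in tokens,
        "follow": "nofollow" not in tokens,
        "archive": "noarchive" not in tokens,
        "snippet": "nosnippet" not in tokens,
        "odp_description": "noodp" not in tokens,
    }


print(robots_permissions("NOINDEX, FOLLOW"))
```

Treating every unlisted directive as "allowed" matches how the META tag values work: NOINDEX and friends only ever remove permissions.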

There are several cases where the X-Robots-Tag is especially valuable:

Unlike a META tag, this is not something an author can type into a document by hand, but it is easily added programmatically by server-side tools such as Perl, Ruby, or PHP. For simple cases, the Apache .htaccess file is easy enough to configure, as in this example, where crawlers are told not to index the contents of robots.txt:

# Requires mod_headers; matches only files named exactly robots.txt
<FilesMatch "^robots\.txt$">
Header set X-Robots-Tag "NOINDEX, FOLLOW"
</FilesMatch>

or, to keep crawlers from following links in .doc files:

<FilesMatch "\.doc$">
Header set X-Robots-Tag "NOFOLLOW"
</FilesMatch>
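The same two rules can also be applied programmatically. The article mentions Perl, Ruby, and PHP; the sketch below swaps in Python and expresses the rules as WSGI middleware. XRobotsTagMiddleware, RULES, and robots_tag_for are hypothetical names, and the regexes only roughly mirror the two Apache examples (they match against the request path rather than the file name).

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rules: (regex on the request path, X-Robots-Tag value).
RULES: List[Tuple[str, str]] = [
    (r"/robots\.txt$", "NOINDEX, FOLLOW"),
    (r"\.doc$", "NOFOLLOW"),
]


def robots_tag_for(path: str) -> Optional[str]:
    """Return the X-Robots-Tag value for the first matching rule, if any."""
    for pattern, value in RULES:
        if re.search(pattern, path):
            return value
    return None


class XRobotsTagMiddleware:
    """WSGI middleware that adds X-Robots-Tag to responses for matching paths."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        value = robots_tag_for(environ.get("PATH_INFO", ""))

        def start_with_tag(status, headers, exc_info=None):
            # Append the header only when a rule matched this request.
            if value is not None:
                headers = list(headers) + [("X-Robots-Tag", value)]
            return start_response(status, headers, exc_info)

        return self.app(environ, start_with_tag)
```

To try it, wrap any WSGI application as XRobotsTagMiddleware(app) and serve it with wsgiref.simple_server.make_server.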

I think this is a very clever way to add the familiar functionality of Robots META tags to non-HTML file formats, with the directives supplied from an external metadata repository rather than embedded in the files. It's likely to be particularly useful for intranet search engines and portals, which may not be able to modify the documents themselves.
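As one illustration of driving the header from an external metadata repository, the Python sketch below renders (file name, directives) pairs, such as rows exported from a hypothetical metadata database, into Apache FilesMatch blocks like the ones shown earlier. htaccess_rules is a hypothetical helper; it anchors and escapes each file name, but it does not validate the directive strings.

```python
import re
from typing import Iterable, Tuple


def htaccess_rules(entries: Iterable[Tuple[str, str]]) -> str:
    """Render (file name, directives) pairs as Apache FilesMatch blocks."""
    blocks = []
    for filename, directives in entries:
        # Escape regex metacharacters and anchor, so "budget.doc" matches only that name.
        pattern = "^" + re.escape(filename) + "$"
        blocks.append(
            f'<FilesMatch "{pattern}">\n'
            f'Header set X-Robots-Tag "{directives}"\n'
            "</FilesMatch>"
        )
    return "\n".join(blocks) + "\n"


print(htaccess_rules([("budget.doc", "NOINDEX, NOARCHIVE")]))
```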

Related Topics has a very useful HTTP Server Response Code Checker, and Firefox will show X-Robots-Tag headers with Tools / Page Info.
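For a quick check from a script, this Python sketch sends a HEAD request with the standard http.client module and reports the status code and any X-Robots-Tag value. check_x_robots_tag is a hypothetical name. Redirects are deliberately not followed, so a 301 or 302 is reported as such, and a server that handles HEAD differently from GET could give a different answer.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def check_x_robots_tag(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, X-Robots-Tag value or None) for a HEAD request to url."""
    parts = urlsplit(url)
    conn_cls = (
        http.client.HTTPSConnection if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    conn = conn_cls(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        # getheader joins repeated X-Robots-Tag headers with ", ".
        return resp.status, resp.getheader("X-Robots-Tag")
    finally:
        conn.close()
```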

I have added an X-Robots-Tag test suite to the SearchTools testing section and will report if I find anything interesting.

H/T to: Playing with the X-Robots-Tag; Controlling Your Robots; Handling Google's neat X-Robots-Tag

Page Updated 2008-07-10