Search Indexing Robots: Books and Articles

InfoSpiders: Adaptive Retrieval Agents Choosing Heuristic Neighborhoods for Information Discovery (ARACHNID), October 2001
Research from the University of Iowa on intelligent agents and adaptive spiders, with examples provided as Java applets.
White Paper: The robots.txt file and the robots meta tag, SearchMechanics, September 2000.
Practical guidance for webmasters on how search engine robots and other crawlers interpret robots instructions.
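As a hedged illustration of the robots.txt side of this topic, Python's standard urllib.robotparser module can evaluate robots.txt rules offline. The robots.txt content, the crawler names (SomeCrawler, ExampleBot), and the example.com URLs below are hypothetical; the robots meta tag is not covered here because it is read from each page's HTML rather than from robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, supplied inline so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10

User-agent: ExampleBot
Disallow: /
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# can_fetch(user_agent, url) applies the record naming that user agent,
# falling back to the "User-agent: *" record when none matches.
print(parser.can_fetch("SomeCrawler", "https://www.example.com/index.html"))      # True
print(parser.can_fetch("SomeCrawler", "https://www.example.com/private/a.html"))  # False
print(parser.can_fetch("ExampleBot", "https://www.example.com/index.html"))       # False
print(parser.crawl_delay("SomeCrawler"))                                          # 10
```

A polite crawler would call can_fetch before every request and wait at least the crawl delay (when one is given) between requests to the same host.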
Programming Bots, Spiders and Intelligent Agents in MS Visual C++, David Pallmann, Microsoft Press, 1999
Provides context for the proper use of robots on the Web, plus C++ and MFC examples for various kinds of agents, including site-indexing agents. Advanced topics include multithreading, adaptation, logging, and notification. No prior knowledge of network programming or Internet protocols is required; the examples rely heavily on WinInet and MSIE.
Mercator: A Scalable, Extensible Web Crawler, World Wide Web, volume 2, number 4 (December 1999), by Allan Heydon and Marc Najork
Describes the design and architecture of a scalable multi-server robot crawler and its modular structure, including filtering by content type, link extraction, queuing, duplicate-content detection, domain name resolution and host-name aliases, detection of multiple links to the same page, multiple threads using synchronous I/O, session IDs, and more.
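Several of the components listed in this entry (queuing, testing for duplicate content, and testing for multiple links to the same page) can be illustrated with a toy URL frontier. This is a minimal sketch under stated assumptions, not Mercator's implementation: the Frontier class and its method names are hypothetical, in-memory sets replace the disk-backed structures a large crawler needs, and a SHA-1 digest stands in for whatever fingerprint function a real crawler uses.

```python
import hashlib
from collections import deque
from urllib.parse import urldefrag, urljoin

class Frontier:
    """Toy FIFO URL frontier with a URL-seen test and a content-seen test.

    Hypothetical illustration only; in-memory sets stand in for the
    fingerprint stores a production crawler would keep on disk.
    """

    def __init__(self, seeds):
        self.queue = deque()
        self.seen_urls = set()      # URL-seen test: each URL is queued once
        self.seen_content = set()   # content-seen test: document fingerprints
        for url in seeds:
            self.add(url)

    @staticmethod
    def normalize(url, base=None):
        # Resolve relative links and drop "#fragment" parts so that
        # different links to the same page compare equal.
        absolute = urljoin(base, url) if base else url
        return urldefrag(absolute)[0]

    def add(self, url, base=None):
        """Queue a URL unless it was seen before; return True if queued."""
        url = self.normalize(url, base)
        if url in self.seen_urls:
            return False
        self.seen_urls.add(url)
        self.queue.append(url)
        return True

    def next_url(self):
        """Return the next URL to fetch, or None when the frontier is empty."""
        return self.queue.popleft() if self.queue else None

    def is_duplicate_content(self, body: bytes) -> bool:
        # Identical documents reached through different URLs (mirrors,
        # host aliases) are processed only once.
        fingerprint = hashlib.sha1(body).digest()
        if fingerprint in self.seen_content:
            return True
        self.seen_content.add(fingerprint)
        return False

frontier = Frontier(["http://www.example.com/"])
frontier.add("a.html", base="http://www.example.com/")      # queued
frontier.add("a.html#top", base="http://www.example.com/")  # same page, skipped
print(frontier.next_url())  # http://www.example.com/
print(frontier.next_url())  # http://www.example.com/a.html
```

A FIFO queue gives breadth-first crawling; a real crawler would also group URLs by host so that it does not send many requests to one server at once.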
Robots and Spiders and Crawlers, Ultraseek White Paper, September 1999
Detailed discussion of how search engine indexing robots follow links and read Web pages to store the information in search indexes. Includes coverage of problem areas such as image maps, frames, JavaScript and dynamic data. Notes describe how the Ultraseek Spider handles these problems.
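A minimal sketch of the link-following step, using Python's built-in html.parser: it collects links from ordinary anchors, image-map area tags, and frame/iframe sources, then resolves them against the page URL. The tag list, the extract_links helper, and the decision to skip javascript: and mailto: links are illustrative assumptions, not how the Ultraseek Spider works; links that only appear when scripts run cannot be found this way at all.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags and attributes treated as followable links (an illustrative choice).
LINK_ATTRS = {"a": "href", "area": "href", "frame": "src", "iframe": "src"}

class LinkExtractor(HTMLParser):
    """Collect absolute link URLs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr = LINK_ATTRS.get(tag)
        if attr is None:
            return
        value = dict(attrs).get(attr)  # None if the attribute has no value
        if value and not value.lower().startswith(("javascript:", "mailto:")):
            self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links

page = """
<a href="/about.html">About</a>
<map name="nav"><area href="products.html" alt="Products"></map>
<frameset><frame src="menu.html"></frameset>
<a href="javascript:openWindow()">Popup</a>
"""
print(extract_links(page, "http://www.example.com/docs/index.html"))
```

For the sample page this yields the about, products, and menu URLs, resolved to absolute form, while the javascript: link is dropped.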
Controlling Search Engines, ZDnet devhead / Interactive Designer, January 25, 1999
An article on using META tags to control how search engines treat a site's pages.
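The robots META tag is one of the tags a site can use to steer indexing robots. A hypothetical Python sketch of how a crawler might read it, assuming the conventional directives index/noindex, follow/nofollow, all, and none (where none means noindex plus nofollow); the RobotsMetaParser class and robots_meta function are made-up names for illustration.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Attribute values are case-insensitive by convention ("ROBOTS").
        if (attrs.get("name") or "").lower() == "robots":
            for token in (attrs.get("content") or "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)

def robots_meta(html):
    """Return (may_index, may_follow) for one page; both default to True."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    d = parser.directives
    may_index = "noindex" not in d and "none" not in d
    may_follow = "nofollow" not in d and "none" not in d
    return may_index, may_follow

print(robots_meta('<head><meta name="ROBOTS" content="NOINDEX, FOLLOW"></head>'))  # (False, True)
print(robots_meta("<head><title>No robots tag</title></head>"))                    # (True, True)
```

Unlike robots.txt, which is checked before a page is fetched, this tag can only be honored after the page has been downloaded.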
Brace Your Site for the Onslaught of Bots, ZDnet devhead, November 1, 1997, by David S. Linthicum
Information for Web site managers about site-spidering robots, including IE 4's subscription bot and robots.txt.

