
As of January, 2012, this site is no longer updated, due to work and health issues.

Guide to Search Tools

Federated Search Systems

Federated search provides a single user interface that sends a defined query to several separate search servers, then accepts and displays their structured results. It is often the only practical way to get information from external sources that can't be crawled, such as a government patent office, Nexis/Lexis, or corporate databases. It also reduces the need to crawl and index rarely used sources, which keeps the index smaller. Because each query is sent to the content server at search time, access control is always current: users can see only the content that they have rights to access.
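As a rough illustration of the fan-out described above, here is a minimal Python sketch. The connector interface (a dictionary of callables), the `federated_search` name, and the deadline value are all assumptions made for the example, not part of any particular product. Each source is queried in parallel, and sources that fail or miss the deadline are reported rather than awaited.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def federated_search(query, sources, deadline_seconds=2.0):
    """Send one query to every source in parallel and group hits by source.

    `sources` maps a source name to a callable that takes the query string
    and returns a list of hits (a hypothetical connector interface).
    Returns (results_by_source, names_of_sources_that_failed_or_timed_out).
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {pool.submit(search, query): name for name, search in sources.items()}
    # Stop waiting at the deadline so one slow server cannot stall the page.
    done, not_done = wait(futures, timeout=deadline_seconds)

    results, unavailable = {}, []
    for future in done:
        try:
            results[futures[future]] = future.result()
        except Exception:
            unavailable.append(futures[future])
    unavailable.extend(futures[future] for future in not_done)
    pool.shutdown(wait=False)  # do not block on stragglers
    return results, sorted(unavailable)


# Stand-in connectors; real ones would call a patent office, a database, etc.
demo_sources = {
    "patents": lambda q: [f"patent record matching {q!r}"],
    "intranet": lambda q: [f"intranet page matching {q!r}"],
}
results, unavailable = federated_search("solar panel", demo_sources)
print(results, unavailable)
```

Results are kept grouped by source here; whether and how to merge them into one list is a separate decision.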

However, federated search administrators need to create and maintain query translators that convert the user-oriented search into each source's query language, including passing credentials. Federated systems depend on the response speed of the servers they query, and administrators have to decide whether to interleave results from multiple servers, which may have wildly varying relevance scoring systems. Some target databases index overlapping content, in which case removing the duplicate results is another task.
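One common way to sidestep incomparable relevance scores is to ignore the scores and interleave by rank position instead. The sketch below is one possible approach, not a recommendation from this guide: it merges ranked lists round-robin and drops duplicates. The `url` field used as the duplicate key is an assumption; real sources may need a normalized identifier.

```python
from itertools import zip_longest


def interleave(result_lists, key=lambda hit: hit["url"]):
    """Round-robin merge of ranked lists, keeping the first copy of each hit.

    Takes the top hit from every source, then the second hit from every
    source, and so on, so no cross-server score comparison is needed.
    """
    seen = set()
    merged = []
    for tier in zip_longest(*result_lists):
        for hit in tier:
            if hit is None:  # this source has run out of results
                continue
            identity = key(hit)
            if identity in seen:  # already returned by another source
                continue
            seen.add(identity)
            merged.append(hit)
    return merged


catalog = [{"url": "u1"}, {"url": "u2"}, {"url": "u3"}]
archive = [{"url": "u2"}, {"url": "u4"}]
print([hit["url"] for hit in interleave([catalog, archive])])
```

The trade-off is that a weak source gets the same prominence as a strong one, which is why some installations show each source's results in a separate block instead.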

Alternatives to Federated Search:

Metasearch: uses connector code to send a query as a web browser client and screen-scrapes the results. This is necessary when legacy systems only have an HTTP interface or are too expensive to adjust, or external content providers decline to add an API (see the SearchTools Meta Search Report).
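To show what screen-scraping involves, here is a small sketch using Python's standard `html.parser` module. It assumes, purely for illustration, that the target site marks each hit as a link with the CSS class `result`; every real site needs its own rules, and those rules break whenever the page layout changes.

```python
from html.parser import HTMLParser


class ResultLinkScraper(HTMLParser):
    """Collect (href, link text) for every <a class="result"> on a page."""

    def __init__(self):
        super().__init__()
        self.hits = []
        self._capturing = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "a" and "result" in classes:
            self._capturing = True
            self._href = attributes.get("href")
            self._text = []

    def handle_data(self, data):
        if self._capturing:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._capturing:
            self.hits.append((self._href, "".join(self._text).strip()))
            self._capturing = False


def scrape_results(html):
    """Return the result links found in one page of search results."""
    scraper = ResultLinkScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.hits


# A made-up results page standing in for a legacy system's HTML output.
page = (
    '<ul><li><a class="result" href="/doc/1">First <b>hit</b></a></li>'
    '<li><a href="/help">Help</a></li>'
    '<li><a class="result hot" href="/doc/2">Second</a></li></ul>'
)
print(scrape_results(page))
```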

Aggregated Indexing: attempts to index every possible piece of text, using HTTP crawlers for web or intranet pages, file server indexers, and connectors to content management systems, databases, legacy systems and other data silos.

Many enterprise search installations use several approaches at once, especially at ever-larger scale. In all cases, security and access control are big problems, and there are no easy answers.

Federated Search Protocols

The Z39.50 standard was developed for library catalogs and adapted for some databases. It was awkward and stateful, so the system would not return results until the slowest server replied.

The SRW/U standards (Search/Retrieve Web Service and Search/Retrieve via URL) were created around 2005 to replace Z39.50 as library-oriented query protocols.
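For a sense of what the URL form looks like, the sketch below builds a `searchRetrieve` request from the standard request parameters (`operation`, `version`, `query`, `startRecord`, `maximumRecords`). The server address and the CQL query are placeholders, and the helper name `sru_search_url` is made up for this example.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, start_record=1, maximum_records=10,
                   version="1.1"):
    """Build the URL for a searchRetrieve request with a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,  # CQL; urlencode handles the escaping
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint; substitute the address of a real SRU server.
print(sru_search_url("https://sru.example.org/catalog",
                     'dc.title = "federated search"'))
```

Because the whole request fits in one stateless URL, each call can be sent, timed out, and retried independently of the others.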

OpenSearch - a protocol developed by Amazon for A9. It's mostly used for choosing which search engine to use in browsers.
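An OpenSearch description document gives a URL template with placeholders such as `{searchTerms}` and optional ones such as `{startPage?}`. The sketch below shows roughly how a client might fill such a template; the example template URL is invented, and the `fill_template` helper is a simplification that ignores namespaced parameters.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; the trailing "?" marks an optional parameter.
_PLACEHOLDER = re.compile(r"\{(\w+)(\?)?\}")


def fill_template(template, **values):
    """Substitute values into an OpenSearch-style URL template."""

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required template parameter: {name}")

    return _PLACEHOLDER.sub(substitute, template)


# Invented template, as it might appear in a description document.
template = "https://search.example.com/find?q={searchTerms}&page={startPage?}"
print(fill_template(template, searchTerms="federated search"))
```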

Federated Search Resources

Federated Search Vendors

Avi's Definition of an Ideal Federated Text Search Protocol

Page Created 2011-02-01