As of January 2012, this site is no longer updated due to work and health issues.

Guide to Search Tools

Distributed Search Systems

Distributed search systems share indexes and/or query traffic among multiple servers, allowing more simultaneous searches without reducing responsiveness.

For a single search interface to multiple query servers, see Federated Search; for search spread among equal peers, you may want to read Peer-to-Peer Search.

Index Replication & Query Distribution

In this approach, the master system creates the index and associated files, handles updates, and performs housekeeping such as optimization. Because this server only maintains the index and does not respond to user queries, indexing work does not slow down searching.

Periodically, the master copies (clones) the index to the secondary "slave" servers, which answer the user queries. If part of the index is held in a secondary update index, that update index must be included in the clone as well. This approach is best suited to systems with high query traffic but data that changes only slightly or rarely.
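A minimal Python sketch of the master/slave split described above, assuming a toy in-memory inverted index (a dictionary mapping each word to a set of document ids). The class and method names (`MasterIndexer`, `ReplicaServer`, `clone`, `install`) are hypothetical; a real system would copy index files to the slave servers over the network rather than pass Python objects around.

```python
from collections import defaultdict


class MasterIndexer:
    """Hypothetical master: builds and updates the index, never answers queries."""

    def __init__(self):
        self.main_index = defaultdict(set)    # term -> set of document ids
        self.update_index = defaultdict(set)  # recent additions not yet merged
        self.version = 0

    def add_document(self, doc_id, text):
        # New documents go into the small update index first.
        for term in text.lower().split():
            self.update_index[term].add(doc_id)

    def optimize(self):
        # Housekeeping: fold the update index into the main index.
        for term, docs in self.update_index.items():
            self.main_index[term] |= docs
        self.update_index.clear()

    def clone(self):
        # Build a snapshot for the slaves. The update index is merged into
        # the copy so that recent changes are not lost on the replicas.
        snapshot = {term: set(docs) for term, docs in self.main_index.items()}
        for term, docs in self.update_index.items():
            snapshot.setdefault(term, set()).update(docs)
        self.version += 1
        return self.version, snapshot


class ReplicaServer:
    """Hypothetical slave: answers queries from its latest copy of the index."""

    def __init__(self, name):
        self.name = name
        self.version = 0
        self.index = {}

    def install(self, version, snapshot):
        # Replace the whole copy at once; ignore snapshots older than the current one.
        if version > self.version:
            self.version, self.index = version, snapshot

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))
```

For example, a document added with `add_document` but not yet merged by `optimize()` still reaches the replicas on the next `clone()`, because the update index is folded into the snapshot. In this sketch each replica swaps in a complete snapshot, so a query is answered either entirely from the old copy or entirely from the new one.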

Incoming queries can be distributed among the slave servers in two ways:

Round-robin - a switcher sends each new query to the next search engine server in turn.

Load balancing - a switcher tracks server load, either externally (using ping times and other measures) or internally (by communicating with each server), and sends each new query to the server that can process it most quickly.
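The two dispatch strategies can be sketched in Python as follows. This is an illustration under stated assumptions: the server names are placeholders, and the load figures are assumed to arrive from some monitoring mechanism (ping times, queue lengths, or reports from the servers themselves) that is not shown here.

```python
import itertools


class RoundRobinSwitcher:
    """Sends each new query to the next server in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LoadBalancer:
    """Sends each new query to the server with the lowest reported load.

    How load is measured (externally or by asking the servers) is assumed
    to happen elsewhere; report() just records the latest figure.
    """

    def __init__(self, servers):
        self.load = {server: 0.0 for server in servers}

    def report(self, server, load):
        # Called by whatever monitoring mechanism measures server load.
        self.load[server] = load

    def pick(self):
        # Lowest load wins; ties go to the server listed first.
        return min(self.load, key=self.load.get)
```

Round-robin needs no feedback from the servers, which keeps the switcher simple; the load balancer adapts to slow or busy servers, but it is only as good as its load measurements.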

Index File Distribution (Sharding)

This scales the searchable index beyond the capacity of a single server by breaking the index into pieces (shards) and putting those pieces on separate servers. For inverted indexes (alphabetical word lists with links to the source documents), the shards are based on word spelling, so each word's entry is stored on exactly one shard. This makes queries very fast, because each query term is looked up on only one index server.
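A small Python sketch of spelling-based sharding, assuming four hypothetical shards split by the first letter of each word (`SHARD_RANGES`, `shard_for` and `ShardedIndex` are illustrative names, not taken from any particular product). Each shard is an in-memory dictionary standing in for a separate index server. The `search_all` method assumes AND semantics for multi-word queries: each word is looked up on its own shard and the resulting document sets are intersected, a merging step the description above does not spell out.

```python
from collections import defaultdict

# Hypothetical shard boundaries: each shard owns the words whose first
# letter falls within its range.
SHARD_RANGES = [("a", "f"), ("g", "m"), ("n", "s"), ("t", "z")]


def shard_for(term):
    """Return the number of the shard that owns this term's spelling."""
    first = term[0].lower()
    for number, (low, high) in enumerate(SHARD_RANGES):
        if low <= first <= high:
            return number
    # Digits, punctuation and other non a-z characters go to the last shard.
    return len(SHARD_RANGES) - 1


class ShardedIndex:
    def __init__(self):
        # One inverted index (term -> set of doc ids) per shard; in a real
        # deployment each would live on its own server.
        self.shards = [defaultdict(set) for _ in SHARD_RANGES]

    def add_document(self, doc_id, text):
        for term in text.lower().split():
            self.shards[shard_for(term)][term].add(doc_id)

    def lookup(self, term):
        # Only the one shard that owns the term is consulted.
        term = term.lower()
        return self.shards[shard_for(term)].get(term, set())

    def search_all(self, query):
        # AND query: each term goes to its own shard, then results are intersected.
        terms = query.lower().split()
        if not terms:
            return set()
        return set.intersection(*(self.lookup(t) for t in terms))
```

Splitting by first letter is the simplest scheme, but it can leave shards unevenly sized (far more English words start with "s" than with "x"); hashing terms to shards is a common alternative.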

http://en.wikipedia.org/wiki/Shard_(database_architecture)

Page Created 2011-2-1