As of January, 2012, this site is no longer being updated, due to work and health issues

Search Tools Reports

Searching for Text Information in Databases

Databases provide the content storage for many sites, which dynamically create web pages around them, including ecommerce catalog sites, online news, and even entertainment sites. Intranets often contain large amounts of text stored in databases as well.

These databases generally have their own search functions, which may appear to take the place of a full-text search engine. But that's not always the case. Database search is not oriented towards text search and relevance ranking: it is great for locating widgets by part number and listing the inventory of leather slippers, but not so good at helping site visitors find the articles on widget quality or comparing leather and fleece slippers. Text search engines are often the right solution, moving the processing load from the database to the search engine.

Database search vs. text search

Searching Many Fields and Tables

Databases store their information neatly organized into fields, such as product name, category, description, price and so on. However most people don't like to choose a field before searching: who knows whether widget is in the name, category, or description field? While databases can set up complex queries to find the search words in all applicable fields, this makes them slower to respond, requires more memory, and is more difficult to program. Text search engines store this information in a single index and can find words in any field for a record. Many high-end search engines can also store field information, so searches can be limited to a specific field as well.

Simple Search Commands

In many cases, users must type in complex Boolean or SQL (Structured Query Language) commands for searching. Some databases, such as MySQL, are limited to "Or" searches -- they will return all records which match any of the search terms. Others default to exact phrase matching, so a search for fuzzy slippers would not find slippers, fuzzy.

While some programmers and librarians enjoy the control that query languages give them, most people do not. This is particularly important for searching multiple words: if someone types in brown bear, they probably want to see records with bear with a brown coat in them as well, without typing bear AND brown. Search logs show clearly that most people will not do anything complicated while searching -- they will give up if the search is too hard to use. Full-text search engines offer simple and flexible search options, with most providing an Advanced Search feature for power users.

Flexible Search Processing

Database search functions tend to look for exact matches in capitalization and characters. If someone searches for pokemon, they won't find records with Pokemon or pokémon in them. Many text search engines will automatically match lowercase searches with any text, and will adjust for extended and diacritical characters. Some also allow search administrators to set up synonyms for searching (doctor and physician), and automatically perform stemming (find octopi when searching for octopus). A new trend in text search engines is to incorporate spelling checkers, which are particularly useful for queries which do not find any matches. By checking the words already in the index, the search engine spelling checker can offer suggestions which apply to this particular content, which is extremely useful.

Some databases even return a list of everything they have if a user enters an empty search string -- this requires significant server processing time. Search engines tend to just report that they found no matches.

Database Structure

Many databases are not designed for easy searching. In a wine database, for example, it may be difficult to find wines from a certain region, such as the Napa valley of California, because the location is stored in a table far away from the wine names. Some text search engines, such as dtSearch, FAST Data Search, Ultraseek, and Verity K2, provide a rich data set for searching. Mercado's IntuiFind provides options to entirely rearrange a database structure, if that's what's necessary for searching.

Response Time and Database Resources

Databases are optimized to search for exact words and phrases, and they tend to respond very slowly otherwise. So if a searcher wants to find sheepskin or shearling in the same search, databases will tend to do two searches and then merge the results. Full text search engines are designed to store these words in a single index, so they can perform these kinds of searches efficiently and return quickly.

In addition to the time, searching databases requires additional back-end and server resources. By storing a search index on a separate server and searching that rather than the live data, a text search engine can perform queries without additional demands on the database itself.

Results Per Page

Many database search engines will happily display all results on the same page, whether there are 8 or 8,000 records. Text search engines have a mechanism for dividing up the results and providing navigation from the first page to following pages (and back).

Sorting by Relevance

Text search engines sort by relevance, as determined by the number and location of matched words in the result page or record. Database search functions sort by size, or price, or date, or the order in which the items were entered in the database!

Many text search engines can sort results by date as an option, and a few of them can sort by price, size, geographic location, etc.

Connecting Full Text Search Indexer to the Database

To add a text search engine to a database-generated site, the engine can connect to the database directly, using SQL, ODBC, JDBC or a native connector. This is efficient because it reduces the overhead of going through the HTTP server and creating a new session for every record. It also contains only the content text, rather than navigation links, copyright information and other inappropriate content. In some cases, the database can track new and changed records, and only give this new data to the search engine indexer, on demand or as part of an application interface.

If there is no way to connect directly to the database, a search engine spider can crawl the pages generated from the database, viewing the pages as a browser would. In some cases, this can be a special crawl for pages designed just for indexing, whether detected by the server or as a special set of URLs that are then converted to user-viewable URLs. Again, the database should provide only new and changed records, rather than all records every time.

Tools for Extracting Database Content

Search Engines with Database Interfaces

Databases with Search Engine Interfaces

These databases incorporate a full-text search function in some way.

Page Updated 2003-07-23