Enterprise Search 101


Avi Rappoport

Search Tools Consulting


About Avi Rappoport

Defining Enterprise Search










Search as an Iceberg






Similarities to Webwide Search

Differences from Web Search




Search and Information Architecture

Search and Taxonomy Categorization

Search vs. Knowledge Management








Simple Search vs. Research

Text Search Engines vs. Databases

Elements of an Enterprise Search Engine








Choosing Content To Index

Security and Access Control

Many online newspapers and magazines show search results for articles that are available only for purchase or by paid subscription. These include The New York Times and Consumer Reports.

Finding Content








How Robot Spiders Work

example of spider

Diagram by James Ghaphery, VCU








Common Problems With Robots








Other Data Sources

File Format Issues

Indexing - Getting Data




Index & Search Multiple Languages

Index Structures and Content

Inverted Index Diagram

Indexes: Document Information

The Dublin Core working group agreed on 15 tags for tracking and cataloging web pages and creating metadata. These are: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier (URL), Source, Language, Relation, Coverage and Rights (copyright information). For more information, see http://dublincore.org/
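As a rough illustration of how those 15 elements might be attached to a web page, the sketch below turns a small record into HTML meta tags. The `DC.` name prefix follows the common Dublin Core-in-HTML convention; the function name, the validation behavior, and the sample record (including the `language` value) are assumptions made for this example, not part of any particular search engine.

```python
from html import escape

# The 15 Dublin Core elements listed above, in the same order.
DUBLIN_CORE_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def dublin_core_meta_tags(record):
    """Return HTML <meta> lines for the Dublin Core fields present in record."""
    unknown = set(record) - set(DUBLIN_CORE_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    lines = []
    for element in DUBLIN_CORE_ELEMENTS:
        if element in record:
            # Escape quotes and angle brackets so the value is safe in an attribute.
            value = escape(record[element], quote=True)
            lines.append(f'<meta name="DC.{element}" content="{value}">')
    return lines


# Hypothetical record, for illustration only.
example_record = {
    "title": "Enterprise Search 101",
    "creator": "Avi Rappoport",
    "language": "en",
}

if __name__ == "__main__":
    print("\n".join(dublin_core_meta_tags(example_record)))
```

An indexer that understands these tags can store each value in its own field, which is what makes fielded search and sorting by author or date possible.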

Complex Index Format





Indexing Multimedia

Indexing: Dates


Indexing: Stopwords


Indexing: Stemming






















The Search Process

search process

Query Processing

More Query Processing

Search Retrieval, Recall & Precision

How Retrieval Works

Defining Relevance Ranking

Highly recommended for relevance and other issues: Modern Information Retrieval, by Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Addison-Wesley Longman, May 1999, ISBN 020139829X, $55


More on Relevance



Visible Search: Search Forms

example search in nav bar











Special Search Interfaces








Compare with simple text search results:

text search example






Advanced Search

Some Advanced Search Makes Sense

eBay example












Search Results Page: Overview







Search Results: Visualization









Search Results Header

Results Header Example - Good












Search Suggestions

Search Suggestions Example

Nordstrom Armani example








Result Items Content

Results Items Example

From New York Magazine

new york magazine example

Notes: In this example, each results item includes the title as a clickable link, the author name, a page description, the section (sometimes), and the date.










More Information in Results


Consumer Reports Results Example

consumerreports.org example

From consumerreports.org: results items include titles (sometimes duplicates), dates (sometimes) in the titles, text surrounding the bolded match terms, and category. Note the Free flag: the other results are shown as encouragement to subscribe.







Commerce and Catalog Results

Online Store Results Example

macys result for green jacket

From macys.com









Showing Multimedia Results

Multimedia Results Example

PBS video search results

The PBS.org NewsHour search results show metadata, a frame from the video, and a dropdown menu containing that section of the transcript. The search engine is from OnStreamMedia.com.











Faceted Metadata Search and Browse

Faceted Metadata: Commerce

Nordstrom.com - Note that the facets on the left side include category and color; further down are size, price, and brand. Each of them has a preview number, so it's clear how many items will be there when the user clicks.
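The preview numbers next to each facet value can be thought of as simple counts over the current result set. The following is a minimal sketch of that idea; the item records, field names, and function name are invented for illustration and do not describe how Nordstrom.com or any specific engine computes them.

```python
from collections import Counter


def facet_counts(items, facet_names):
    """Count how many result items carry each value of each facet."""
    counts = {name: Counter() for name in facet_names}
    for item in items:
        for name in facet_names:
            value = item.get(name)
            if value is not None:
                counts[name][value] += 1
    return counts


# Hypothetical result set for one query.
results = [
    {"category": "Jackets", "color": "Green", "brand": "Brand A"},
    {"category": "Jackets", "color": "Black", "brand": "Brand B"},
    {"category": "Coats", "color": "Green", "brand": "Brand A"},
    {"category": "Jackets", "color": "Green"},  # brand missing in metadata
]

if __name__ == "__main__":
    for facet, values in facet_counts(results, ["category", "color", "brand"]).items():
        for value, n in values.most_common():
            print(f"{facet}: {value} ({n})")
```

Items with a missing field are simply left out of that facet's counts, so incomplete metadata shows up as facet totals that do not add up to the number of results.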





Faceted Metadata: Library Catalog

ncsu library results

The North Carolina State University Library online catalog: http://www.lib.ncsu.edu/catalog/













Empty Queries

Search Failure: No Hits

Bad Example of No-Matches Page


It has the site navigation, but no search field in the center, no hints or tips, no spelling suggestion...


Addressing Search Failure

Good example of No-Matches Page

verisign no-matches








Choosing a Search Engine

  1. Research the information needs
  2. Analyze content on web servers and databases
  3. Define scale, platform, system compatibility requirements
  4. Compare several likely prospects
    • Indexing tools
    • Retrieval and relevance
    • User-oriented features: spellchecker, synonyms, suggestions
    • Administration tools
    • Special functionality
    • Price

Note: Buy, don't build, unless you have a unique need.

Information Needs Analysis

Content Survey




Types of Search Engines

  • Notes:
    • Remote search services fall in the Software as a Service (SaaS) category.
    • Open-source search engines usually concentrate on robot indexing and relevance ranking features.

Testing & Evaluating Search Engines

Note on matching issues:

  • Many matches - e.g. name of product or company
  • Few matches - technical terms
  • No matches - force with nonsense such as "zxcvbnm"
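One way to run the three kinds of test query listed above against each candidate engine is a small harness like the sketch below. Everything here is an assumption for illustration: `toy_search` stands in for a call to the engine under evaluation, the documents are invented, and the `many_threshold` cutoff is arbitrary and should be tuned to the size of the collection.

```python
def classify_query(search, query, many_threshold=100):
    """Label a query by how many hits the given search function returns."""
    hits = len(search(query))
    if hits == 0:
        return "no matches"
    if hits >= many_threshold:
        return "many matches"
    return "few matches"


# Hypothetical stand-in for the engine under test: substring match on a tiny corpus.
TOY_DOCS = [
    "Acme widget overview",
    "Acme widget pricing",
    "Acme support contacts",
    "Configuring the Acme idempotency token",
]


def toy_search(query):
    return [doc for doc in TOY_DOCS if query.lower() in doc.lower()]


if __name__ == "__main__":
    # A company name, a technical term, and a nonsense string.
    for q in ["Acme", "idempotency", "zxcvbnm"]:
        print(q, "->", classify_query(toy_search, q, many_threshold=3))
```

Running the same query list against every candidate makes it easier to compare how each one handles overly broad queries, narrow queries, and the no-matches page.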






Scaling Search

Search Engine Maintenance

Metrics for Search Engines










Search Log Analysis

Note: More on this in the "Tuning Search" talk, Wednesday at 1:30 pm

Conclusion - Enterprise Search

Slides are at: SearchTools.com/slides/ess06/enterprise-search-101.html