Enterprise Search 101

 

Avi Rappoport

Search Tools Consulting

 

About Avi Rappoport

Defining Enterprise Search

Search as an Iceberg

 

 

 

 

 

Similarities to Webwide Search

Differences from Web Search

 

 

 

Search and Information Architecture

Search and Taxonomy Categorization

Search vs. Knowledge Management

 

 

 

 

 

 

 

Simple Search vs. Research

Text Search Engines vs. Databases

Elements of an Enterprise Search Engine

 

 

 

 

 

 

 

Choosing Content To Index

Security and Access Control

Many online newspapers and magazines provide search results for articles which are available for purchase or accessible by paid subscription. These include The New York Times and Consumer Reports.

Finding Content

 

 

 

 

 

 

 

How Robot Spiders Work

example of spider

Diagram by James Ghaphery, VCU

 

 

 

 

 

 

 

Common Problems With Robots

 

 

 

 

 

 

 

Other Data Sources

File Format Issues

Indexing - Getting Data


 

 

 

Index & Search Multiple Languages

Index Structures and Content

Inverted Index Diagram

Indexes: Document Information

The Dublin Core working group agreed on 15 tags for tracking and cataloging web pages and creating metadata. These are: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier (URL), Source, Language, Relation, Coverage and Rights (copyright information). For more information, see http://dublincore.org/
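As a rough illustration of how an indexer might pick up these tags, the sketch below reads Dublin Core metadata from HTML `meta` elements. It assumes the common convention of naming the tags `DC.title`, `DC.creator`, and so on; the class name, function name, and sample page are hypothetical and not taken from any particular search engine.

```python
from html.parser import HTMLParser

# The 15 Dublin Core elements, lowercased for matching.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


class DublinCoreParser(HTMLParser):
    """Collects <meta name="DC.xxx" content="..."> values (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content")
        if name.startswith("dc.") and content is not None:
            element = name[3:]
            if element in DC_ELEMENTS:
                # An element may repeat, so keep a list of values.
                self.fields.setdefault(element, []).append(content)


def extract_dublin_core(html):
    parser = DublinCoreParser()
    parser.feed(html)
    return parser.fields


# Made-up sample page for illustration only.
sample = """<html><head>
<meta name="DC.title" content="Enterprise Search 101">
<meta name="DC.creator" content="Avi Rappoport">
<meta name="DC.language" content="en">
<meta name="keywords" content="search">
</head><body></body></html>"""

fields = extract_dublin_core(sample)
print(fields)
```

Storing each element as a list reflects the fact that a page can carry more than one value for the same tag (for example, several creators); the non-Dublin-Core `keywords` tag is ignored.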

Complex Index Format

 

 

 

 

Indexing Multimedia

Indexing: Dates

 

Indexing: Stopwords

 

Indexing: Stemming

The Search Process

search process

Query Processing

More Query Processing

Search Retrieval, Recall & Precision

How Retrieval Works

Defining Relevance Ranking

Highly recommended for relevance and other issues: Modern Information Retrieval, by Ricardo Baeza-Yates & Berthier Ribeiro-Neto, Addison-Wesley Longman, May 1999, ISBN 020139829X, $55

 

More on Relevance

 

 

Visible Search: Search Forms

example search in nav bar

 

 

 

 

 

 

 

 

 

 

Special Search Interfaces

 

 

 

 

 

 

 

Compare with simple text search results:

text search example

 

 

 

 

 

Advanced Search

Some Advanced Search Makes Sense

ebay example

Search Results Page: Overview

 

 

 

 

 

 

Search Results: Visualization

 

 

 

 

 

 

 

 

Search Results Header

Results Header Example - Good

dartmouth

 

 

 

 

 

 

 

 

 

 

Search Suggestions

Search Suggestions Example

nordstrom armani example

 

 

 

 

 

 

 

Result Items Content

Results Items Example

From New York Magazine

new york magazine example

Notes: In this example, each result item includes the title as a clickable link, the author name, a page description, the section (sometimes), and the date.

 

 

 

 

 

 

 

 

 

More Information in Results

 

Consumer Reports Results Example

consumerreports.org example

From consumerreports.org, result items include: titles (sometimes duplicated), dates (sometimes, within the titles), text surrounding the bolded match terms, and category. Note the Free flag: the other results are shown as encouragement to subscribe.

 

 

 

 

 

 

Commerce and Catalog Results

Online Store Results Example

macys result for green jacket

from macys.com

 

 

 

 

 

 

 

 

Showing Multimedia Results

Multimedia Results Example

pbs video search results

The PBS.org NewsHour search results show metadata, a frame from the video, and a dropdown menu containing that section of the transcript. The search engine is from OnStreamMedia.com.

 

 

 

 

 

 

 

 

 

 

Faceted Metadata Search and Browse

Faceted Metadata: Commerce

Nordstrom.com - Note that the facets on the left side include category and color; further down are size, price, and brand. Each of them has a preview number, so it's clear how many items will be there when the user clicks.
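The preview numbers next to each facet value can be thought of as simple counts over the current result set. The following is a minimal sketch under that assumption; the `facet_counts` function and the sample records are made up for illustration and do not describe how Nordstrom.com or any specific engine computes them.

```python
from collections import Counter


def facet_counts(items, facets):
    """Count how many result items carry each value of each facet."""
    counts = {facet: Counter() for facet in facets}
    for item in items:
        for facet in facets:
            value = item.get(facet)
            if value is not None:
                counts[facet][value] += 1
    return counts


# Hypothetical result set for a query such as "jacket".
results = [
    {"category": "Jackets", "color": "green", "brand": "Brand A"},
    {"category": "Jackets", "color": "black", "brand": "Brand B"},
    {"category": "Coats", "color": "green", "brand": "Brand A"},
]

counts = facet_counts(results, ["category", "color", "brand"])
for facet, values in counts.items():
    for value, n in values.most_common():
        print(f"{facet}: {value} ({n})")
```

Because the counts are computed from the items that match the current query, a value with a count of zero never appears, so the user is not led to an empty result page.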

 

 

 

 

Faceted Metadata: Library Catalog

ncsu library results

The North Carolina State University Library online catalog: http://www.lib.ncsu.edu/catalog/

Empty Queries

Search Failure: No Hits

Bad Example of No-Matches Page

msha.gov

It has the site navigation, but no search field in the center, no hints or tips, no spelling suggestion...
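A spelling suggestion for a no-matches page can be approximated by comparing each query word against the words already in the index. The sketch below uses Python's standard `difflib` module for the comparison; the `suggest` function, the vocabulary list, and the 0.8 similarity cutoff are illustrative assumptions, not a description of any product's spellchecker.

```python
import difflib


def suggest(query, vocabulary, cutoff=0.8):
    """Return a corrected query, or None if no word needed correcting."""
    corrected = []
    changed = False
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Look for the single closest indexed word above the cutoff.
        close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if close:
            corrected.append(close[0])
            changed = True
        else:
            corrected.append(word)
    return " ".join(corrected) if changed else None


# Hypothetical list of words found in the index.
vocabulary = ["nordstrom", "armani", "jacket", "green"]

suggestion = suggest("nordstom armani", vocabulary)
if suggestion:
    print(f"Did you mean: {suggestion}?")
```

Drawing suggestions from the site's own vocabulary, rather than a general dictionary, means any suggested query is likely to return at least one result.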

 
 

Addressing Search Failure

Good example of No-Matches Page

verisign no-matches

 

 

 

 

 

 

 

Choosing a Search Engine

  1. Research the information needs
  2. Analyze content on web servers and databases
  3. Define scale, platform, system compatibility requirements
  4. Compare several likely prospects
    • Indexing tools
    • Retrieval and relevance
    • User-oriented features: spellchecker, synonyms, suggestions
    • Administration tools
    • Special functionality
    • Price

Note: Buy, don't build, unless you have a unique need.

Information Needs Analysis

Content Survey

 

 

 

Types of Search Engines

  • Notes:
    • Remote search services fall in the Software as a Service (SaaS) category.
    • Open-source search engines usually concentrate on robot indexing and relevance ranking features.

Testing & Evaluating Search Engines

Note on matching issues:

  • Many matches - e.g. name of product or company
  • Few matches - technical terms
  • No matches - force with nonsense such as "zxcvbnm"
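One way to organize these test cases is to run a small query set and label each query by how many hits it returns. The sketch below does this against a toy in-memory document list; the `count_hits` and `classify` functions, the sample documents, and the threshold of 3 are all assumptions chosen for illustration, and a real evaluation would call the candidate search engine instead.

```python
def count_hits(query, documents):
    """Count documents that contain every query word (case-insensitive)."""
    words = query.lower().split()
    return sum(
        1 for doc in documents
        if all(w in doc.lower().split() for w in words)
    )


def classify(hit_count, many_threshold):
    """Label a query by its hit count; the threshold is an arbitrary choice."""
    if hit_count == 0:
        return "no matches"
    if hit_count >= many_threshold:
        return "many matches"
    return "few matches"


# Toy corpus standing in for the indexed content.
documents = [
    "Acme widget user guide",
    "Acme widget release notes",
    "Acme company history",
    "Configuring the Acme widget tokenizer",
]

# A company name, a technical term, and a nonsense string.
test_queries = ["acme", "tokenizer", "zxcvbnm"]

report = {
    q: classify(count_hits(q, documents), many_threshold=3)
    for q in test_queries
}
for query, label in report.items():
    print(f"{query}: {label}")
```

Keeping the same query set across all candidate engines makes it easier to compare how each one handles the three situations.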

 

 

 

 

 

Scaling Search

Search Engine Maintenance

Metrics for Search Engines

 

 

 

 

 

 

 

 

 
 

Search Log Analysis

Note: More on this in the "Tuning Search" talk, Wednesday at 1:30 pm

Conclusion - Enterprise Search

Slides are at: SearchTools.com/slides/ess06/enterprise-search-101.html