Presentation by Search Tools Consulting's Avi Rappoport


Enterprise Search Engines: Critical Success Factors

 

Avi Rappoport

Search Tools Consulting


KMWorld/Intranets, October 30, 2006

 

 

About Avi Rappoport

Defining “Enterprise Search”

Search engines (SEs) may seem like a black box: queries go in, answers come out. But they're just software, and the more you know, the more you can tune your search engine to solve your users' real information needs. This session describes the various aspects of search (index structure, robot spiders and other indexers, query parsing, retrieval, relevance ranking, and usable search interface design), along with common problems and best practices. It covers the critical success factors (CSFs) for implementing enterprise SEs and offers suggestions for choosing a search engine or evaluating an existing one.

Search is Like an Iceberg

Similarities to Webwide Search

Differences from Web Search

Search and Information Architecture

Search and Taxonomy Categorization

Search and Knowledge Management

Simple Search vs. Research

Text Search Engines vs. Databases

For more detailed information, see www.searchtools.com/info/database-search.html


Elements of a Search Engine

 

Defining Content To Be Searched

Note: Setting standard rules about what will and will not be indexed saves time and increases consistency. Notes in the Content Inventory explain exceptions and special cases.

This is a Critical Success Factor! Come back to the Content Inventory and processing issues throughout indexing


Finding Content

How Robot Spiders Work

search spiders follow links when they can

CSF

Common Problems With Robots

Infinite Links and Black Holes

CSF - valuable data may be hidden back there. GO robots.txt example
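
Note: The robots.txt example referred to here did not survive in this text version. As a rough stand-in, the Python sketch below shows one way an indexer might combine robots.txt rules with a path-depth cutoff to stay out of infinite link spaces. The rules, the host name, the agent name, and the cutoff of 8 are all made up for illustration. Because valuable data may sit behind such links, exclusions should be as narrow as possible.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for an intranet server (not from the slides).
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
Disallow: /cgi-bin/
"""

MAX_PATH_DEPTH = 8  # assumed cutoff; tune it for the real site


def make_parser(robots_text):
    """Build a robots.txt parser from text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def should_crawl(parser, url, agent="example-indexer"):
    """True if robots.txt allows the URL and its path is not suspiciously deep."""
    if not parser.can_fetch(agent, url):
        return False
    depth = len([part for part in urlparse(url).path.split("/") if part])
    return depth <= MAX_PATH_DEPTH


parser = make_parser(ROBOTS_TXT)
print(should_crawl(parser, "http://intranet.example.com/docs/policy.html"))
print(should_crawl(parser, "http://intranet.example.com/calendar/2006/10/30"))
```

A depth cutoff is only one heuristic; endlessly generated calendar pages and session-ID URLs usually need explicit exclusion rules as well.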

Other Data Sources

Access Control and Authentication

Note: Many online newspapers and magazines provide search results for articles as teasers, to encourage purchases or subscriptions.

Note: For pages protected in transit by encryption (SSL), the search engine indexer can use an SSL client for access. The search server then needs to be protected as much as the original server, and must serve results pages encrypted to prevent unauthorized access in transit. Again, work with the security team on this policy.

Indexing - Extracting Words in Context

concept of Rich Index - a CSF

stemming & stopwords - see below

Inverted Index Diagram
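
Note: The diagram is missing from this text version. As a rough substitute, here is a minimal Python sketch of an inverted index: each word maps to the documents that contain it and to the positions where it occurs. The document IDs and sample text are made up, and real engines store far more per entry. The sketch keeps every word, including common ones such as "to".

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to {doc_id: [positions of that word in the document]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        words = re.findall(r"\w+", text.lower())
        for position, word in enumerate(words):
            index[word].setdefault(doc_id, []).append(position)
    return index


# Hypothetical two-document collection.
docs = {
    "doc1": "Search engines index words",
    "doc2": "Robots find pages to index",
}
index = build_inverted_index(docs)
print(index["index"])  # which documents contain "index", and where
```

Storing positions as well as document IDs is what makes phrase matching and match-terms-in-context displays possible later.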

 

Indexing Simple File Formats

Note: most HTML and XML metadata is about the document, rather than words in the text.

Indexing Complex File Formats

Indexing Database Records

Indexing Multimedia

Indexing - Languages/Charsets

Indexing - Dealing With Dates

Indexing - Yes, Even Stopwords

Indexing - Stem Later

Index Document Information

The Dublin Core working group agreed on 15 tags for tracking and cataloging web pages and creating metadata. These are: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier (URL), Source, Language, Relation, Coverage and Rights (copyright information). For more information, see http://dublincore.org/
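
Note: As an illustration of indexing document information, the Python sketch below collects Dublin Core values from HTML meta tags. It assumes the page embeds them as meta elements whose names start with "DC." (for example DC.title); the sample page and class name are made up, and a production indexer would handle many more cases.

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collect <meta name="DC.xxx" content="..."> values from an HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name.startswith("dc."):
            # Store under the element name without the "dc." prefix.
            self.fields.setdefault(name[3:], []).append(attr.get("content") or "")


# Hypothetical page header.
PAGE = """
<html><head>
<meta name="DC.title" content="Expense Policy">
<meta name="DC.creator" content="Finance Department">
<meta name="DC.date" content="2006-10-30">
<meta name="keywords" content="expenses, travel">
</head><body>...</body></html>
"""

dc_parser = DublinCoreParser()
dc_parser.feed(PAGE)
print(dc_parser.fields)
```

Keeping these values as separate index fields, rather than mixing them into the body text, is what lets the search engine later filter or sort by creator or date.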

Complex Index Format

shows word and document stored data

Break

The Search Process

search process

Query Processing

More Query Processing

Search Retrieval, Recall & Precision
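
Note: The slide content is missing here. Using the standard information retrieval definitions, precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A small Python sketch with made-up document IDs:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document IDs the search engine returned
    relevant:  document IDs a human judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 results returned, 2 of them among the 5 relevant documents.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d7", "d8", "d9"],
)
print(precision, recall)
```

Measuring recall requires knowing every relevant document, which in practice is feasible only for a small set of test queries.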

How Retrieval Works

Defining Relevance Ranking

Highly recommended for relevance and other issues: Modern Information Retrieval, Ricardo Baeza-Yates & Berthier Ribeiro-Neto, Addison-Wesley Longman, May 1999, ISBN 020139829X, $55

More on Relevance

Marcia Bates: Berrypicking

Break

end of first part

 

Visible Search: Forms and Results Pages

Visible Search: Search Forms

example search in nav bar

Special Search Interfaces

Compare with simple text search results:

text search example

 

 

Advanced Search is A Tricky Problem

Some Systems Need Advanced Search

eBay example

eBay users have specific requirements and are willing to use the advanced search form.

Likewise, scientific, technical and business intelligence researchers may want to use advanced search for specific topics.

Search Results Page: Overview

CSF: don't confuse the users!

Visualization in Search Results

Search Results Header

Results Header Example

dartmouth

Search Suggestions

aka Best Bets, QuickLinks, KeyMatch, Recommendations

Search Suggestions Example

nordstrom armani example

Basic Result Items Content

Simple Results Items Example

new york magazine example

Notes: In this example from New York Magazine, the results items include the titles as clickable links, author name, page description, section (sometimes), and date. For articles where the matches are further down in the text, users may not understand why they got the match. That's why showing the match terms in context is so powerful.
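
Note: To make "match terms in context" concrete, here is a minimal Python sketch that cuts a window of text around the first match and marks the matching term. The function name, the ** markers, and the sample sentence are made up; a real results page would typically bold the term in HTML and cut on word boundaries.

```python
import re


def snippet(text, term, width=30):
    """Return the text around the first match of term, with the match marked."""
    match = re.search(re.escape(term), text, flags=re.IGNORECASE)
    if match is None:
        return None
    start = max(match.start() - width, 0)
    end = min(match.end() + width, len(text))
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return (
        prefix
        + text[start:match.start()]
        + "**" + match.group(0) + "**"
        + text[match.end():end]
        + suffix
    )


print(snippet("The mayor opened the new subway line.", "subway", width=8))
# prints: ...the new **subway** line.
```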

More Information in Results

Consumer Reports Results Example

consumerreports.org example

From consumerreports.org, results items include: Titles (sometimes duplicates), dates (sometimes) in the titles, text surrounding bolded match terms, category. Note the Free flag: the other results are shown as encouragement to subscribe.

Commerce and Catalog Results

Online Store Results Example

macys result for green jacket

from macys.com

Searching Multimedia Content

Multimedia Results Example

pbs video search results

The PBS.org NewsHour search results show metadata, a frame from the video, and a dropdown menu containing that section of the transcript. The search engine is from OnStreamMedia.com.

Faceted Metadata Search and Browse

Faceted Metadata: Commerce

Nordstrom.com - Note that the facets on the left side include category and color; further down are size, price, and brand. Each of them has a preview number, so it's clear how many items will be there when the user clicks.
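
Note: The preview numbers next to each facet value can be produced by counting field values across the current result set. A minimal Python sketch, using a made-up catalog and field names (this is not how any particular site implements it):

```python
from collections import Counter


def facet_counts(results, facets):
    """Count how many results carry each value of each facet field."""
    return {
        facet: Counter(item[facet] for item in results if facet in item)
        for facet in facets
    }


# Hypothetical results for a query such as "jacket".
results = [
    {"title": "Wool jacket", "color": "green", "brand": "Acme"},
    {"title": "Rain jacket", "color": "green", "brand": "Birch"},
    {"title": "Denim jacket", "color": "black", "brand": "Acme"},
]

counts = facet_counts(results, ["color", "brand"])
for facet, values in counts.items():
    for value, count in values.most_common():
        print(f"{facet}: {value} ({count})")
```

Because the counts come from the current result set, a facet value is only listed when at least one matching item exists, so a click on it cannot lead to a no-matches page.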

Faceted Metadata: Library Catalog

ncsu library results

The North Carolina State University Library online catalog: http://www.lib.ncsu.edu/catalog/

Handling Empty Queries

Why Some Queries Have No Matches

What To Do for No-Matches Queries

Bad Example of a No-Matches Page

msha.gov

It has the site navigation, but no search field in the center, no hints or tips, no spelling suggestion...

Good Example of a No-Matches Page

verisign no-matches

Site navigation, search field, most important topic links on left, suggestions for domain queries on left, search tips and feedback form.

Break

Choosing a Search Engine

  1. Research the information needs
  2. Analyze content on web servers and databases
  3. Define scale, platform, system compatibility requirements
  4. Compare several likely prospects
    • Indexing tools
    • Query processing: spellchecker, synonyms, suggestions
    • Retrieval and relevance
    • Administration tools
    • Special functionality
    • Price
  5. Implement and test test test!
    • Most search engine developers don't test user experience at all

Note: Buy, don't build, unless you have a truly unique need.

For information needs analysis, see slides in first section

Content Survey

Types of Search Engines

  • Notes:
    • Search appliances can be awkward to administer when hosted remotely
    • Remote search services fall in the Software as a Service (SaaS) category.
    • Open-source search engines usually concentrate on robot indexing and relevance ranking features

Evaluating Search Engines - I

 

Evaluating Search Engines - II

 

Think About Scaling Search

Notes about evaluating scale: The IRS on April 14 gets over 100 queries per second; Google in 1999 got 65 queries per second. Many intranets get fewer than 10 queries per minute.
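
Note: To compare these figures, convert them to the same unit. The Python sketch below uses the numbers quoted above as boundary values ("over 100" and "fewer than 10" are treated as exactly 100 and 10), so the results are rough orders of magnitude, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(queries, seconds):
    """Convert a query count over a period into queries per second."""
    return queries / seconds


irs_peak_qps = 100.0               # "over 100 queries per second"
google_1999_qps = 65.0             # "65 queries per second"
intranet_qps = per_second(10, 60)  # "fewer than 10 queries per minute"

ratio = irs_peak_qps / intranet_qps
intranet_per_day = intranet_qps * SECONDS_PER_DAY

print(f"Intranet upper bound: {intranet_qps:.2f} queries per second")
print(f"IRS peak is roughly {ratio:.0f} times that rate")
print(f"Intranet at that rate all day: about {intranet_per_day:.0f} queries")
```

Even at the 10-per-minute boundary, the intranet load is hundreds of times lower than the IRS peak, which is why scale requirements should come from your own logs and not from webwide search numbers.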

Search Engine Maintenance

Metrics for Search Engines

Search Log Analysis

Conclusion - Enterprise Search Success Factors


SearchTools.com - Copyright © 2006-2007 Search Tools Consulting
This work is provided under a Creative Commons Sampling Plus 1.0 License.