Platform: Java in Linux (RedHat 4.3, SUSE 10), Java in Windows (XP SP2, 2003 Server SP1)
Price: Free (up to 500,000 documents); the very different OmniFind Enterprise Edition starts around $20,000
Features
Simple installation and configuration using the browser admin
Can index documents in the file system, including remote file system crawling
Web robot indexing crawler handles complex sites, with wildcard include and exclude options; can handle IBM WebSphere and blog formats.
Proxy server support for the crawler.
Crawler does continuous checking, adapting to document change frequency
No current support for meta robots tags.
Access control via Basic Authentication (HTTP user name and password) and form-based authentication
Indexes over 400 file types, including text, HTML, RTF, Microsoft Word, Excel, PowerPoint, WordPerfect, etc. (uses Stellent Outside In file converters), up to 50 MB per document
Extracts HTML title, keyword and description tags, title from documents with properties
Indexes and searches Arabic, Czech, Danish, German, Greek, English, Spanish, Finnish, French, Hebrew, Italian, Japanese, Korean, Dutch, Norwegian, Polish, Portuguese, Russian, Swedish, Simplified Chinese, and Traditional Chinese.
Document language detection for better tokenization and stemming
Automated duplicate detection and deletion.
Index stores stemmed form of words.
Default search finds pages with all match terms; query processing supports Internet Query Operators (-, ""), Boolean AND, OR, NOT and parentheses, and metadata field tags for URL, doctype, title, keyword and description.
Option to search the Internet via Yahoo!
Query language defined by browser settings or language parameter
Spellchecker and synonym functions (which can be imported from XML)
Search suggestions (called "Featured Links" and "Shortcuts") can be edited via the admin interface or imported from an XML file
Search results UI customization via the admin interface, as minimal content, ATOM feed mode, HTML with XSLT, or as XML for local processing.
Can integrate with PHP, ASP, REST (HTTP with parameters), etc.
Default relevance uses keyword match (TF-IDF), date, path depth and link analysis.
Relevance weighting can be disabled for any element but keyword match
Documents are cached in original format, converted to HTML on demand for viewing.
Admin interface for configuring and checking indexes.
Reports include general metrics, crawled URLs, response times, popular queries and no-matches.
Active development and online user forums.
Enterprise support available from IBM
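The keyword-match part of the relevance ranking listed above is described only as TF-IDF. As a rough illustration of that general technique, here is a minimal textbook sketch in Python. It is not OmniFind's actual formula, which the listing does not give, and it ignores the date, path depth and link analysis factors. The function name `tf_idf_scores` and the whitespace tokenization are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, docs):
    """Score each document against the query with textbook TF-IDF.

    Illustrative only: OmniFind's real ranking formula is not documented here.
    """
    n_docs = len(docs)
    # Naive tokenization (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: how many documents contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            term = term.lower()
            if counts[term] == 0:
                continue  # term absent from this document
            tf = counts[term] / len(tokens)      # term frequency
            idf = math.log(n_docs / df[term])    # inverse document frequency
            score += tf * idf
        scores.append(score)
    return scores


docs = [
    "search engine review",
    "free search engine search",
    "cooking recipes",
]
scores = tf_idf_scores(["search"], docs)
```

In this sample the second document ranks highest because "search" makes up a larger share of its words, and the third scores zero because it does not contain the term.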
Articles & Reviews
PHP Sample and "PHP API" (Reusable Functions): March 6, 2007, by Sean Johnson. Uses PHP 5 and oye.php to call the OmniFind search engine using REST, with screenshots and a listing of the functions available.
Adding IBM’s OmniFind Yahoo! Edition to Your Web Site: December 13, 2006, by Todd Leyba. Describes several ways to integrate OmniFind, specifically with Blogger sites. Includes details on how to use REST to send the query, get the results in XML and format with XSLT.
IBM and Yahoo give the gift of free search: December 13, 2006, by Mike Heck. Quick review praises the easy installation and administration, simple customization of UI, featured links, synonyms and ranking adjustments. The reviewer likes the speed of indexing and the familiar interface, with options to switch to Yahoo's web, audio, video, news and local search.
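The REST integration these articles describe (an HTTP request with parameters, results returned as XML or an ATOM feed) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the endpoint URL, the parameter names `query` and `output`, and the sample feed are all hypothetical placeholders, not taken from the OmniFind documentation, and the feed is assumed to use the Atom 1.0 namespace.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumption: results come back as an Atom 1.0 feed.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def build_search_url(base_url, query, **params):
    """Build a REST-style search URL (HTTP GET with parameters).

    The parameter name "query" is hypothetical; check the product docs.
    """
    query_string = urllib.parse.urlencode({"query": query, **params})
    return f"{base_url}?{query_string}"


def parse_atom_results(xml_text):
    """Return (title, link) pairs for each entry in an Atom feed."""
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        href = link.get("href", "") if link is not None else ""
        results.append((title, href))
    return results


# Hypothetical endpoint and a hand-written sample feed, for illustration only.
url = build_search_url("http://localhost:8888/api/search", "time travel", output="atom")

SAMPLE_FEED = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<title>Results</title>"
    "<entry><title>UFO Sightings</title>"
    '<link href="http://example.com/ufo"/></entry>'
    "</feed>"
)
results = parse_atom_results(SAMPLE_FEED)
```

In real use the URL would be fetched with an HTTP client and the response body passed to `parse_atom_results`; the sample feed stands in for that response so the sketch runs without a server.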
Examples
UFO Crawler - topics include UFO Sightings, time travel, conspiracy theories and anomalies.
Avi Rappoport of Search Tools Consulting can help you evaluate your search engine, whether it's on a site, portal, intranet, or enterprise.
Please contact SearchTools for more information.