Price: Free (open source) under the GNU General Public License
Platform: Perl and C on Sun Solaris, DEC Alpha, BSD, Linux, Mac OS X, OpenVMS, AIX, and Windows 95/NT/2000/XP with Unix emulators.
Features
Version 2.4.4 (December 2006)
Added an Inverse Document Frequency (IDF) calculation to the relevance ranking algorithm.
New "near" query operator
New "?" single-character wildcard operator
Updated API
Fixes many bugs and incompatibilities
Indexes local files and web sites using a robot spider.
Follows Robots Exclusion Rules (including META tags).
Indexes and searches data in tags, including Dublin Core META tags
and nested XML fields.
Can report structural errors in your XML and HTML documents.
Uses external converters to index binary files including PDF, Microsoft
Word, Excel, MP3 and compressed files.
Supports Basic Authentication (user name and password)
Indexes can be moved to other machines, even other platforms.
Can use an external program to supply documents to the indexer,
including database connectors.
Fuzzy matching, including truncation, stemming, soundex, metaphone,
and double-metaphone indexing.
Document "properties" (some subset of the source document, usually
defined as META tags or XML elements) may be stored in the index and returned
with search results.
Search handles simple keyword queries, Boolean AND, OR, and NOT with
parentheses, phrase searching, and wildcard searching.
Uses regular expressions to select documents for indexing or exclusion,
and can limit searches to part or all of your web site.
Special-case indexing for "buzzwords": complex terms that include
punctuation, such as C++ and SWISH-E.
Searches, merges, and ranks results from multiple indexes.
Limit searches to parts of documents such as certain HTML tags
(META, TITLE, comments, etc.) or to XML elements.
Results can be sorted by relevance, date, size, and other fields
in ascending or descending order.
Search results show matched terms in context.
Includes an example search script with context summaries and highlighting
of search terms and phrases. The search results layout can be edited directly
or through the Perl HTML::Template module.
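The 2.4.4 notes say an Inverse Document Frequency calculation was added to relevance ranking. SWISH-E's exact formula is not given here, so the following is only a generic tf-idf sketch in Python, assuming whitespace tokenization and the common weight log(N / df), to show the effect of such a factor: terms that appear in fewer documents count for more. The `rank` function and its inputs are hypothetical.

```python
import math
from collections import Counter


def rank(query_terms, docs):
    """Score docs (a dict of doc_id -> text) by tf * idf; return best first.

    Generic tf-idf illustration, not SWISH-E's actual ranking code.
    """
    n_docs = len(docs)
    # Naive tokenization (assumption): lowercase and split on whitespace.
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}

    # Document frequency: the number of documents containing each term.
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))

    scores = {}
    for doc_id, words in tokenized.items():
        tf = Counter(words)  # term frequency within this document
        score = 0.0
        for term in query_terms:
            term = term.lower()
            if tf[term] and df[term]:
                # Rare terms get a large IDF; a term found in every
                # document gets log(1) = 0 and adds nothing.
                idf = math.log(n_docs / df[term])
                score += tf[term] * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with the three documents "swish indexes html files", "swish indexes xml files", and "perl html template", a query for `swish template` ranks the third document first: `swish` occurs in two of the three documents and so gets a lower IDF weight than `template`, which occurs in only one.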
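Soundex is one of the fuzzy indexing options listed above. To show what it does, here is a minimal Python sketch of the standard American Soundex rules. It is an illustration only, not SWISH-E's C implementation, and the `soundex` function is a hypothetical name. Words that sound alike reduce to the same four-character key, so a search for one spelling can match another.

```python
# Standard American Soundex letter groups (vowels, y, h, w have no code).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex key for word, e.g. 'R163'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first.upper()]
    # The first letter's code counts when collapsing adjacent duplicates.
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        # A vowel resets prev to "", so a repeated code after it is kept.
        prev = code
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")  # pad short keys with zeros
```

With these rules, "Robert" and "Rupert" both map to R163, so an index that stores Soundex keys would treat them as the same word.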
Indexing Arbitrary Data with SWISH-E
USENIX 2004 Annual Technical Conference; June 3, 2004, by Josh Rabinowitz
This paper discusses the structure, features, and usage of swish-e,
with mentions of possible directions for further development and interesting
related work. We also compare swish-e to MySQL's full-text search feature
in terms of features and speed, and discuss two real-world swish-e applications,
Sman and Swished.
Comparing Open Source Indexers
Infomotions Musings; May 29, 2001, by Eric Lease Morgan
Describes the history and features of eight open-source search engines: freeWAIS-sf
(aging code and hard to install, but good for searching email and public-domain
e-texts); Harvest (powerful gathering features for frequently changing data
stores, good with structured documents); ht://Dig (tricky to configure, no phrase
searching, automatic stemming and highlighting of matched words); Isearch
(weak documentation and support, easy to install, dated interface, Z39.50
support); MPS Information Server (zippy indexing of both text and structured
data, Z39.50 support, Perl API, limited documentation); SWISH-E (simple-to-install
engine, Perl and PHP CGI scripts still in beta, good for HTML pages, recognizes
new META tags, sorts results by field); WebGlimpse (easy to install and configure,
requires the commercial version for customized output); and Yaz/Zebra (mainly
Z39.50, no Perl API, mainly a toolkit to index and respond to distributed client
queries). The article also points out that chaotic information is less than
helpful, and it encourages organization, structure, and vocabulary control.
Examples
Biblio: e-commerce site for
used, rare, out-of-print, and hard-to-find books.