Designed for source-level modification and customization.
ConfigDig: a template-based HTML front end for easy search administration from any browser. This allows remote configuration by search admins who are not experts in Unix command-line interfaces.
Handles multiple sites and over 100,000 pages.
The indexing spider is quite robust and handles error conditions gracefully.
Metadata indexing is configurable, and it is easy to add Dublin Core (DC) tags.
The Open Road: Using ht://Dig, UnixReview: April 2002, by Joe "Zonker" Brockmeier
Part 1 is a short but helpful discussion of how indexing and search work, result formatting, scheduling, and configuration. Part 2 covers tuning the search engine for speed and efficiency.
Comparing Open Source Indexers, Infomotions Musings: May 29, 2001, by Eric Lease Morgan
Describes the history and features of eight open-source search engines: freeWAIS-sf (aging code and hard to install, but good for searching email and public-domain etexts); Harvest (powerful gathering features for frequently changing data stores, good with structured documents); ht://Dig (tricky to configure, no phrase searching, automatic stemming and match-word highlighting); Isearch (weak documentation and support, easy to install, dated interface, Z39.50 support); MPS Information Server (zippy indexing of both text and structured data, Z39.50 support, Perl API, limited documentation); SWISH-E (simple-to-install engine, CGIs in Perl and PHP still beta, good for HTML pages, recognizes new META tags, sorts results by field); WebGlimpse (easy to install and configure, requires the commercial version for customized output); and Yaz/Zebra (mainly Z39.50, no Perl API, mainly a toolkit for indexing and responding to distributed client queries). The article also points out that chaotic information is less than helpful and encourages organization, structure, and vocabulary control.
I love it when a plan comes together, PalmPower Magazine: March 2001, by David Gewirtz
A rambling but cheerful description of setting up a search engine for the ZATZ web sites using ht://Dig, indexing only the appropriate articles and not the alternate forms or contents pages. Includes some digressions into robots.txt, Linux, and PHP.
Morrissey Solo - An active fan site for English solo musician Morrissey.
Functional Materials for Energy Technology - German research institution devoted to developing and optimizing materials and highly-efficient systems for thermal insulation.
Avi Rappoport of Search Tools Consulting can help you evaluate your search engine, whether it's on a site, portal, intranet, or enterprise.
Please contact SearchTools for more information.