Tools List
About Us


Search Tools News 2005


December 16, 2005

Annotated Bibliography on Information Retrieval

Recommended Reading for IR Research Students [PDF, 1.5 MB] SIGIR Forum, December 2005 by Alistair Moffat, Justin Zobel and David Hawking.
Extensive annotated bibliography of the most important works in Information Retrieval since 1997. Covers topics including TREC results, scaling issues, index compression, multilingual retrieval, multimedia retrieval, statistical, vector and probabilistic approaches, evaluation and testing, and much more.

Oracle acquires TripleHop MatchPoint

TripleHop MatchPoint search software was acquired by Oracle last June. Support continues for existing customers; a migration path may be available as Oracle integrates TripleHop's technology into its own enterprise search efforts.

Analysis: looks like a good match

findinsite Report updated

findinsite (formerly known as Spy-Server) runs as a Java servlet, applet, ASP.NET or remote hosted service. It offers a good set of the current search engine features, including indexing common document formats, tools for controlling indexing, stemming and synonyms, fourteen languages and cached copies of documents with match terms highlighted.

November 30, 2005

Google Mini Search Appliance requires security patch

As reported by EWeek, the security sites Metasploit and Secunia have found security holes in the Google Mini Search Appliance that could lead to abuse by hackers. All users of the appliance should make sure they apply the Google-supplied patch as soon as possible. (See also the slightly outdated but long review of the Google Search Appliance, and a Product Report for general information.)

November 23, 2005

Slides from presentations to classes and conferences

Avi Rappoport of SearchTools has been doing some speaking at conferences and classes. If you are interested in having Avi speak on one of these topics to a meeting or corporation, please contact her.

Upcoming Search-Related Conferences

The most important search conferences for 2006. I won't be at all of them, but think they're all worth going to.

September 16, 2005

SLI Learning Search - Remote Search Service report updated

S.L.I. Learning Search, the remote service which powers search on NBC.com and busy ecommerce sites, has an effective "learning" system that adjusts search result weights based on statistics from previous users' clicks. The search indexing robot can follow links in JavaScript and VBScript, and handle sites with frames and cookies, as well as HTML, text, PDF, Word, Excel and Flash documents. It works with its own index, other search engines, and databases; can metasearch other sources and combine results; shows search and spelling suggestions; shows prices and pictures in results; or sends raw XML. Scales to millions of searches per day across multiple data centers. Provides extensive interactive search analytics and reports. comment on SLI
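To make the click-based "learning" idea concrete, here is a minimal Python sketch that blends a base relevance score with a smoothed click-through rate. This is not SLI's algorithm, which is not described in the report; the function name `rerank`, the `click_weight` parameter, the smoothing constants and the sample data are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def rerank(
    results: List[Tuple[str, float]],
    clicks: Dict[str, int],
    impressions: Dict[str, int],
    click_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Blend each base score with a smoothed click-through rate (hypothetical scheme)."""
    reranked = []
    for url, base_score in results:
        # Add-one smoothing keeps rarely shown pages from getting extreme rates.
        ctr = (clicks.get(url, 0) + 1) / (impressions.get(url, 0) + 2)
        blended = (1 - click_weight) * base_score + click_weight * ctr
        reranked.append((url, blended))
    # Highest blended score first.
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)


# Made-up example: "/b" scores lower on text relevance but is clicked far more often.
results = [("/a", 0.9), ("/b", 0.8)]
clicks = {"/a": 1, "/b": 40}
impressions = {"/a": 100, "/b": 100}
ranked = rerank(results, clicks, impressions)
```

With these made-up numbers the heavily clicked page moves ahead of the one with the higher base score; setting `click_weight` to 0 restores the original order.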

Spiderline - Remote Site Search Service report updated

Powerful search service works remotely from the company servers, indexes via a robot spider. It can index HTML, text, PDF and MS Word files, and can handle URLs with session IDs, cookies, password-protected pages and HTTPS. It includes editable synonym lists and custom weighting of meaningful words, but no search suggestion tools. Searches include Boolean support, soundex and stemming, and can be done within a zone based on URL paths. Results pages are highly configurable, and also available in XML for programmatic flexibility. The search reports include top queries, clickthrough tracking and referrers, along with raw logs. comment on Spiderline
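Session IDs in URLs matter to a spider because the same page can appear under many different addresses. The Python sketch below shows one common way to normalize such URLs before indexing. It is an illustration only, not Spiderline's method: the parameter names in `SESSION_PARAMS` and the function name `strip_session_ids` are assumptions, and real sites use many other names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of session parameter names; real sites vary.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def strip_session_ids(url: str) -> str:
    """Return the URL with likely session identifiers removed."""
    parts = urlsplit(url)
    # Some Java servers append ";jsessionid=..." to the path itself.
    path = parts.path.split(";jsessionid=")[0]
    # Keep every query parameter whose name is not in the session list.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(kept), parts.fragment)
    )


cleaned = strip_session_ids("http://example.com/page.php?id=7&PHPSESSID=abc123")
```

A crawler that applies this before comparing URLs sees one page rather than a new page per visit.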

September 9, 2005

Ultraseek - Enterprise Search Engine report updated

Since its purchase by Verity in late 2002, Ultraseek has been significantly improved. Not only has the company developed a major upgrade, it has added valuable features in interim point releases. New pricing as of June 2005, including free one-year trials, perpetual licenses, and incorporation of previously separate modules, makes this an even more competitive product. New features in versions 5.x include SOAP and web services support, continuous improvement in Acrobat PDF handling (including Japanese), automated tools for generating page titles, excluding navigation text from indexes, hit-level authentication, a layout manager for designing the results page interface, and additional Search Reports, including clickthrough tracking, as well as raw logs. comment on Ultraseek

September 8, 2005

SiteSearch - Windows/JavaScript Search Engine report updated

A Windows program generates an index file; the search engine is JavaScript that reads the index. Can run on hosted servers with no system access, CD-ROMs, DVDs. comment on this tool

SiteSurfer - Java Search Applet report updated

Java indexing and search applet provides a GUI for search administration, indexes Word and WordPerfect documents, offers a customizable applet for end-user searching, and can search fields such as author and description. Works on CDs and DVDs. comment on this tool

SpyServer - Java Search Servlet report updated

Java servlet provides local server and robot crawler indexing, scheduling, simple HTML password access, European and Asian languages and character sets, templating system for results pages. Can be run from CD-ROMs, DVDs. comment on this tool

July 25, 2005

Robot Indexing Tests Updated

I added a test for following JavaScript links to popup menus. I also fixed up the server-side redirect test. I set up the password test: you can now add the user name "robot" and password "allow" into your robot spider and it should be able to access the page. On the other hand, you should never be able to get to this page at all. If you can, please let me know! comment on robot indexing tests
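If you are writing your own robot, the Python sketch below shows how those credentials could be packaged for a request. It assumes the password test uses HTTP Basic authentication, which the entry above does not actually state; the helper name `basic_auth_header` is also hypothetical.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (assumes Basic auth is in use)."""
    # Basic auth joins user and password with a colon, then base64-encodes the pair.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# Credentials taken from the test description above.
header = basic_auth_header("robot", "allow")
request_headers = {"Authorization": header}
```

The resulting `request_headers` dictionary can be passed to whatever HTTP client the robot uses. Keep in mind that Basic authentication only encodes the password and does not encrypt it.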

Obsolete and Discontinued Search Engines

The following search engines seem to be dead or obsolete:

Please leave a comment or contact SearchTools if you know anything about their status.

July 22, 2005

Blossom report added

Blossom search is a hosted (remote) search service, which indexes one or more web servers using a spider, and stores the results on its own servers. When a user types a query, the form goes to the Blossom service, which does the matching and relevance ranking, and returns the results with links to the original pages on the original servers. It has modern search features such as stemming, spelling suggestions, match terms highlighted in context, and proximity-based relevance. Comment on Blossom
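As a rough picture of what "proximity-based relevance" can mean, the Python sketch below scores a document higher when two query terms sit closer together. It is a generic illustration, not Blossom's ranking formula; the function name `proximity_score` and the 1/distance scoring are assumptions.

```python
from typing import List, Optional


def proximity_score(tokens: List[str], term_a: str, term_b: str) -> float:
    """Return 1 / (smallest word distance between the terms), or 0.0 if either is absent."""
    last_a: Optional[int] = None
    last_b: Optional[int] = None
    best: Optional[int] = None
    for position, token in enumerate(tokens):
        if token == term_a:
            last_a = position
        if token == term_b:
            last_b = position
        # Once both terms have been seen, track the smallest gap so far.
        if last_a is not None and last_b is not None and last_a != last_b:
            gap = abs(last_a - last_b)
            best = gap if best is None else min(best, gap)
    return 0.0 if best is None else 1.0 / best


tokens = "hosted search service indexes web servers".split()
adjacent = proximity_score(tokens, "search", "service")
distant = proximity_score(tokens, "hosted", "servers")
```

Adjacent terms get the top score of 1.0, and the score falls as the terms move apart. A real engine would combine this with term weighting and other signals.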

ZyIndex report updated

ZyIndex is not quite a search engine -- it's a research tool for complete recall, appropriate in situations where any missing data could be catastrophic. It's part of ZyLAB's content, records, and knowledge management systems, specializing in compliance, legal firms, intelligence and law enforcement, financial back office and related fields. Comment on ZyIndex

July 19, 2005

SWISH++ report updated

Open source search engine written in C++ by Paul Lucas, based on the old swish search engine. Some of the newer features include options to exclude indexing of document sections such as headers and footers, handling ID3 tags of MP3 files, extensible indexing and filtering architecture and stemming options. Comment on SWISH++

July 18, 2005

Swish-e report updated

This stalwart open-source engine continues to be active, with improvements in incremental indexing, Unicode support, improvements in config files and indexing of very long files. Please note that the official orthography is finally set: "Swish-e". Comment on Swish-e

Arexera report updated

Formerly known as TEC-IMS, this search engine has European language detection, scalable architecture, document topic analysis, and indexes hundreds of file formats. Comment on Arexera

June 24, 2005

t.find (Eidetica) report updated

This remote search service is part of a suite that includes filtering and text mining. It uses an intelligent spider to ignore navigation and copyright text, and works with existing metadata and taxonomies. Combines known item and subject searching. Based in Amsterdam. comment on this

May 11, 2005

Indexing and Date Problems: Search Tools Report

When servers report incorrect page modification dates, it wastes indexer time, server cycles, bandwidth and everything else. This analysis describes several common kinds of date errors, and their implications, as well as some approaches for solving these problems. comment on this
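The Python sketch below illustrates the kind of sanity check an indexer might run on a `Last-Modified` header before trusting it. The categories it returns (missing, unparsable, future, epoch, plausible) are assumptions for illustration and are not the report's own list of error types; the function name `classify_last_modified` is likewise hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def classify_last_modified(header: Optional[str], now: datetime) -> str:
    """Label a Last-Modified value as missing, unparsable, future, epoch or plausible."""
    if not header:
        return "missing"
    try:
        stamp = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return "unparsable"
    if stamp.tzinfo is None:
        # Treat dates without a usable zone as UTC so they can be compared.
        stamp = stamp.replace(tzinfo=timezone.utc)
    if stamp > now:
        return "future"
    if stamp.year <= 1970:
        # A date at the Unix epoch usually means the server had no real value.
        return "epoch"
    return "plausible"


now = datetime(2005, 5, 11, tzinfo=timezone.utc)
label = classify_last_modified("Mon, 02 May 2005 08:30:00 GMT", now)
```

An indexer could fall back to comparing page content, for example with a checksum, whenever the label is anything other than "plausible".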

Search Indexing Date Test Suite

A set of pages with known date errors and metadata overrides, for testing search engines' capacity to handle date problems. comment on this

May 4, 2005

Search Mailing Lists and Usenet Discussion Links Updated

Links for discussion groups in general, and specific products: dtSearch, FAST, Google Appliances, ht://Dig, Swish-e and Webinator mailing lists.

April 22, 2005

Enterprise Search Summit Coming Soon

The US Enterprise Search Summit is coming up, May 17 - 18 in New York City. Speakers include Lou Rosenfeld, Joseph Busch, Ron Daniel, Tom Reamy and Peter Morville, and it should be a fascinating meeting. Search Tools' Avi Rappoport will be offering an enterprise search workshop on May 16, speaking about search reports, metrics and analytics on the 18th, and moderating several panels on search topics. I hope to see many of you there!

The European Enterprise Search Summit has been cancelled.

Search Product Reports Updated

AJ is very productive. She's updated the Open Source Xapian code library page with impressive examples, the SimpleSearch page to point at the secure NMS Perl code, the WizDoc page with new articles and an example, the Windex page with price and platforms, and the WebSONAR page with features and an example.

Search Products Marked Obsolete or Discontinued

WideSource Peer Search, WebSTAR Search, Web Server 4D Site-Search. There's been no development on Websearch Perl Script or Webrom, but they're still sold and supported.

April 15, 2005

Coveo Enterprise Search - New SearchTools Report

Coveo, formerly Copernic Enterprise Search, is designed for intranets and departmental servers. It uses an HTTP robot crawler and indexes most office productivity documents as well as HTML and XML. Extensions handle SharePoint and Lotus WebAccess, and offer APIs for custom document converters. It runs on Windows Servers, and integrates strongly with IIS document level security and Windows file access permissions. Includes advanced summarization and concept extraction, parametric search, query corrections and suggestions, configurable user preferences, and extensive reports.

April 4, 2005

Search Product Reports Updated

AJ Summers is updating search product reports very quickly. She's done minor updates and link confirmations to the reports on Zoom and Zebra, with more to come soon. Because we are both slightly odd, she's starting from the end of the alphabet. comment on Zoom, Zebra and general product report issues

YourAmigo Adds SpiderLinker

The new YourAmigo SpiderLinker tool makes database entries and dynamically generated content available to both internal search engine indexers and external search indexers such as Google and Yahoo. comment on YourAmigo

XML Query Engine

XML Query Engine (aka XQEngine), written in Java, is now free under the GPL license, and has a continuing SourceForge project. comment on XML Query Engine

March 29, 2005

Atomz Acquired by WebSideStory

Atomz is being bought by site analytics vendor WebSideStory. They're combining the services to create a complete set of web publishing and marketing services, the "Active Marketing Suite". The remote search service will be renamed "WebSideStory Search" and be available stand-alone. The company will likely offer improved search analytics. comment on Atomz

Search Suggestions Article Updated

New information, articles and discussion of using human judgment to supplement search results for popular queries. I formerly called this "manual recommendations" but that was too awkward. comment on search suggestions


SearchTools News, like a Blog. RSS feed

Site change details

For earlier news, see the 2004, 2003, 2002, 2001, 2000, 1999 and 1998 news archive pages

Last Update: 2005-12-16


Copyright © 2005 Search Tools Consulting