Best Practices and Future Visions for Search User Interfaces
A workshop at the Computer-Human Interface conference will provide a forum for presenting research on user interfaces for search engines and discussing current issues in search interfaces. The conference is in Fort Lauderdale, Florida, April 5-10, 2003, and proposals are due January 17, 2003.
Search Analysis from Gilbane Reports
Two reports look at the use and limitations of search and analyze the enterprise search business. Summarizes techniques and describes some of the leading products. Refers to the "principle of least surprise" to recommend predictable software over products that may be spectacular but unreliable. Recommends that purchasers test search engines with their real data to understand the strengths and weaknesses of the choices.
Looking, Finding, Searching... How Users Do It
Presentation by Whitney Quesenbery, based on a study of users and health information sites, found that a surprisingly large number of the users were adamant about needing to use search to find information. The study discovered that they had a hard time putting questions into words and worried about language and technical terms, even on second tries. They tended towards broad short queries with some complete phrases, had trouble with spelling and typing, were confused about operators, and had a hard time even finding the search field. When results pages have too much navigation at the top, a search box, multiple headers and links that don't look like links, users can't even find the results listing. When confronted with a result list, they read it as a page, assume the most important items are first, rely on titles as "headlines" and look for additional information. No one could tell the researchers what the stars meant (more relevant). Very few ever tried to refine a search: when a results list did not provide relevant articles, these users gave up on the whole site. Quesenbery recommends testing search and results page designs, blending search and browse, and leveraging metadata such as page title, description, and general document category. She finds faceted metadata search a good solution for many sites, while visualization techniques can answer high-level questions at a glance. (STC Telephone Seminar, November 19 2002 by Whitney Quesenbery)
Infonortics Search Engine Meeting, Boston, Massachusetts, April 7-8, 2003
Presentations on the state of search engines, on the Web and in enterprise intranets. Also covers information filtering, categorization, metadata extraction and related topics.
dtSearch Web: SearchTools Report Update
Version 6.1 of this Windows-based site search offers improvements to the robot spider, tags for excluding text from indexing, Unicode support, standard search form query options, display of web pages and office documents in browser with match words marked, and customizable logging of search requests. Can integrate with CD-ROM or distributed search versions, now supports .NET.
SearchKey Pro: SearchTools Report Update
Java search engine now indexes Excel and PowerPoint documents, performs phrase and paragraph searches, can distribute and collate searches among multiple indexes. HTML Pager tool provides full customization of results pages, and the system will display results documents with the match words highlighted. Search engine is also available for CD-ROM and as an API.
Engenium Semetric: New SearchTools Report
Concept-matching engine uses advanced mathematical algorithms to create associations for words within large sets of documents, so it can increase recall and find matches even when the search term does not appear. It has been deployed for resume searching, patents, portals and government knowledge management systems.
Universal Knowledge Processor: New SearchTools Report
Faceted metadata browse and search engine, based on research by an Italian professor. Runs on Windows, scales up to millions of documents, includes a rules-based classification engine and an optional search engine.
SiteFerret: New SearchTools Report
Low-cost Windows search engine uses a browser admin control panel to include and exclude files from indexing, define search zones, configure results pages. Query interface has radio buttons for All Words, Any Word, Phrase, and a menu for zones; results page can show meta descriptions or match words in context, search report lists queries by popularity.
linkSearch: New SearchTools Report
Java J2SE search engine from a Swedish company, Datalink, scales to tens of thousands of documents. Indexes both local and remote sites; the Plus version can handle office documents. Very configurable results pages, with templates and ASP/JSP examples.
Intelliseek: SearchTools Report Update
Enterprise metasearch engine performs a federated search on its own indexes of local and public web information, local repositories including Lotus Notes and CMS systems, and subscription feeds such as Factiva. Now supports Arabic, Chinese, English, Indonesian, Japanese, Korean, Persian, Russian and most European languages. It recently received funding from the US government venture capital group, acquired some technology from WhizBang labs, and started an Applied Research Center for information retrieval technologies.
IMP Database Search: New SearchTools Report
Replaces Microsoft SQL Server 2000's full-text search (Oracle and SQL 7 to come) with a fast text search engine. Includes synonyms and spellchecking, fast indexing and relevance priority weighting.
Nav4 Search Engine: New SearchTools Report
Standalone search engine, "patch kit" for installed search engines, or SDK, this software offers automated recommendations and related documents for search results. It uses Natural Language Processing to identify document themes and concepts. Many of the developers at the company Think Tank 23 used to work at FizzyLab, a startup that didn't make it. The engine powers the WayPath web log and news site.
TYPENGO N300 Search: New SearchTools Report
Search engine concentrates on increasing recall using phonetics, concept matching, synonyms and custom dictionaries. Local server or remote hosting, many online store features, analytics engine for search log analysis.
Unstructured Data Management Report from the451
This report describes the huge amounts of unstructured data in enterprise computer networks, the time wasted re-creating this information, and the lack of tools equivalent to data mining, business intelligence and OLAP. Identifies four application sectors: content, document and knowledge management; search and retrieval; categorization, taxonomy and data visualization; XML databases. The analysts evaluate these sectors from a business perspective, defining strengths, trends, pricing and leading vendors. They point out that the Web expanded the size of the search market but did not sustain it, while the categorization market is volatile, with many small and recently-acquired companies. Analysts believe that the leading relational database vendors (IBM, Oracle and Microsoft) may be able to lead the unstructured data market as well. Describes Verity K2 as an integrated search and taxonomy system, preferred over Autonomy, which is a "Rolls Royce" company in knowledge management and collaboration, and over Ultraseek (Inktomi Enterprise Search), a lower-value search engine even with the Quiver taxonomy engine. In Categorization and Visualization, they feel that InXight is the best among a field that includes Antarctica, Applied Semantics, ClearForest, IBM Lotus Discovery, Mohomine, Entrieva (Semio), Stratify and The Brain. (Price unknown, probably expensive.)
On-the-Job Research: How Usable are Corporate Research Intranets?
Report on usability testing of seven representative research intranets - those providing textual content for all employees, rather than services such as ordering, billing or collaboration, or a departmental site. Finds that most intranets are underutilized because they are badly designed and organized, and difficult to use. Common information needs include employee telephone numbers, offices and email addresses, current company news, media coverage of their company and competitive intelligence research. However, only 44% of participants (managers, administrative assistants, and researchers alike) were able to complete the test research tasks in these areas. Search engines had significant problems, such as requiring complex query operators and failing to provide context or filters for large search results. Study includes many details on testing and intranet content, design, information architecture and search issues. Alison J. Head & Associates, April 2002 (non-SLA members: $135 print, $185 PDF).
Intranet Design Annual 2002: The Ten Best Intranets of the Year
Analysis of ten high-quality corporate intranets, with significant emphasis on both information search and employee locator search, along with home page quality, simplicity of overall look and navigation, consistency, and text. Includes heuristic usability analysis, screenshots and case studies of the intranet design process. Several notes cover user response to search, such as unhappiness with search defaulting to the head office directory rather than the whole site. One site reported that an improved search engine indexing the whole intranet was highly popular, with users calling to say how well it was working. NNGroup Report, September 2002 by Kara Pernice Coyne, Candice Goodwin and Jakob Nielsen, $45
E-Commerce Search Engines Improve Sales
Reports from an electronic retail conference indicate that improving the search engine helps online stores increase search use and average order size significantly. Favored search engines for web stores include EasyAsk, iPhrase and Endeca.
Blue Angel MetaStar: New SearchTools Report
MetaStar is a distributed and local search engine written as a Java Servlet. It supports the Z39.50 distributed search protocol, mainly used by library catalogs. The Harvester module can crawl web sites and gather pages for local indexing.
Discontinued and Obsolete Search Engines
The SearchEngine.com site search service from the UK no longer works, and a number of Perl scripts have been removed or their web pages are inaccessible: FluffySearch, ODARS4Search, Site Search 2, Super Site Search, Xavatoria Indexed Search.
Recommind Search Engine: SearchTools Report Update
Recommind MindServer claims better-than-human accuracy in categorization for browsing and search results. For example, at the public MEDLINE site, Recommind's MindServer searches a very large repository of health information and classifies the results into folders based on the content of each document.
RiSearch Engines: New SearchTools Report
Perl search engines in several flavors, including a free simple script, $40 Pro version with relevance ranking and external file format parsing, PHP, SQL back end, and more. All handle multiple languages, allow simple and Boolean searches, and provide templates for results page customization.
FusionBot Search Service: SearchTools Report Update
Remote search hosting service offers many new options, mainly for its higher-level services. New features include indexing PDF, Flash, Word and Excel; crawling sites using HTTPS, passwords, HTML forms, cookies and sessions; stemming and stopwords; multiple search forms and results templates; XML search results. All versions help users find their place in long pages by jumping to the anchor nearest the first instance of matched words.
Doclinx Search Engines: New SearchTools Report
Search engine code available in three ways: as the TeraXML code library for text mining in terabytes of enterprise documents, as part of the Docsan content sales E-commerce server, and as part of the Docsan CD Publisher. Indexes over 200 file formats and European and Asian languages at up to 4 GB per hour, and includes security and access control information. Search can be balanced among multiple servers, and supports text and Boolean queries, XQuery and XPath.
Microsearch WebSearch: New SearchTools Report
Designed to provide a searchable document library for sites. Converts PDF files to HTML for online viewing.
CGISRCH Search: New SearchTools Report
A free search engine for Windows web sites with an installation wizard.
Analysis of the Verity Purchase of Ultraseek (Inktomi Enterprise Search)
Avi Rappoport of SearchTools.com analyzes the purchase of the Inktomi enterprise search engine (formerly Ultraseek) by Verity. This Information Today article describes the business relationships and plans of the companies, features of the Inktomi search engine, and the search engine marketplace.
Inxight Search and Categorization Report Update
Inxight has released a new unified retrieval system for unstructured data, SmartDiscovery. It includes their MetaText engine and new WhizBang acquisition technology for entity extraction and metadata application, a guided retrieval engine, Categorizer taxonomy and categorization engine, and VizServer visualization tools.
Atomz Search Service Report Update
Atomz has implemented a number of attractive new features recently, including Korean language support, indexing Microsoft Word, Excel and PowerPoint, handling forms and cookie-based authentication for indexing, and returning XML search results for local control of layout. In response to customer requests, the default search now finds only pages matching all words in the query, rather than finding any word. The new approach improves precision at the expense of recall in retrieval.
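The trade-off between matching all words and matching any word can be seen in a minimal sketch. This is not Atomz code: the page names, page text and the `search` function are made up for illustration, and tokenizing is reduced to splitting on whitespace.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a real engine does far more."""
    return set(text.lower().split())


def search(query, pages, match_all=True):
    """Return names of pages containing all query words (AND) or any (OR)."""
    terms = tokenize(query)
    combine = all if match_all else any
    return [
        name
        for name, body in pages.items()
        if combine(term in tokenize(body) for term in terms)
    ]


# Hypothetical three-page site used only for this illustration.
PAGES = {
    "pricing": "search service pricing plans",
    "features": "korean language support and xml search results",
    "contact": "contact our support team",
}

if __name__ == "__main__":
    # All-words matching returns fewer, more precise results.
    print(search("xml support", PAGES, match_all=True))
    # Any-word matching returns more results, some only loosely related.
    print(search("xml support", PAGES, match_all=False))
```

With the sample data, the all-words search returns only the page that mentions both terms, while the any-word search also returns a page that mentions just one of them.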
Plumtree Incorporates Ripfire Search
The Plumtree Portal Server version 4.5 now offers a search engine based on the Ripfire Ignite code base. Ripfire, which provided search software for several dot-coms and intranets, was purchased by Plumtree last year. It now indexes Content Server and Collaboration server documents as well as static HTML, XML and text files and database records.
Thunderstone Webinator Report Update
MondoSearch Report Update
MondoSearch and IDC present an analysis of over 57 million search sessions, exposing common problems and suggesting solutions. The MondoSearch search engine (versions 4.4 and previous) had a security vulnerability allowing access to other files in the cgi directory. A security update is available, along with instructions for blocking access.
New listings on SearchTools.com for conferences in 2003 related to search, including HICSS, SAC, IA Summit, CHI, Infonortics, SIGIR and ASIST.
Ultraseek (Inktomi Enterprise Search) v.5 Released October 30
A significant update to Inktomi Enterprise Search (now Ultraseek). The "passage-based summaries" feature will show results items with the matched search terms marked within context extracted from the document (like the Google snippets). The indexer supports form-based authentication using server cookies. A query spell-checker supplements the automatic stemmer by offering suggested corrections based on the text within the site, and some automated retries for searches with no matching results. Improvements for complex searches in multimillion-document collections, compatibility with Microsoft Exchange 2000 Public Folders and with hypermail or MHonArc mailing list archives. The browser-based administration interface has a new look, refresh during indexing, and access control improvements.
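A spell-checker that draws its suggestions from the site's own text can be sketched roughly as below. This is an illustration only, not the Ultraseek implementation: it uses Python's standard `difflib` for approximate matching, and the sample text, function names and similarity cutoff are assumptions.

```python
import difflib
import re


def build_vocabulary(pages):
    """Collect the distinct lowercase words that appear in the site text."""
    words = set()
    for text in pages:
        words.update(re.findall(r"[a-z]+", text.lower()))
    return sorted(words)


def suggest(query, vocabulary, cutoff=0.8):
    """Return a corrected query built from site words, or None if unchanged."""
    known = set(vocabulary)
    corrected = []
    for word in query.lower().split():
        if word in known:
            corrected.append(word)
            continue
        # Closest site word above the (assumed) similarity cutoff, if any.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    suggestion = " ".join(corrected)
    return suggestion if suggestion != " ".join(query.lower().split()) else None


# Hypothetical site text for the illustration.
SITE_PAGES = ["Passage-based summaries show matched terms in context."]

if __name__ == "__main__":
    vocab = build_vocabulary(SITE_PAGES)
    print(suggest("pasage sumaries", vocab))
```

Because the vocabulary comes from the indexed pages, suggestions are limited to words that can actually produce matches on the site.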
Verity Buys Inktomi Enterprise Search
In an unexpected move, Verity has announced that it is purchasing Inktomi's enterprise search software business (formerly known as "Ultraseek") for $25 million in cash. Verity, which has recently concentrated on knowledge management and social networking, says that it is doing this to "meet the needs of our enterprise customers" who can later migrate to the "advanced products portfolio". Verity says that they will continue to develop the search engine, XML toolkit and Quiver classification software.
InQuira for Search: New SearchTools Report
Enterprise search engine uses natural-language processing and other linguistic tools and rules to suggest answers to user questions. Integrated with customer self-service and call center products.
Faceted Metadata And Search: New SearchTools Report
Full-text search is a wonderful thing, but some kinds of information have extensive structure and meaningful attributes for each record. Traditional interfaces for structured data have required users to type into forms or choose from popup menus, but these are often confusing and don't provide enough feedback -- there's no way to tell if the choice is useful. A new way to search and browse using attributes, "faceted metadata", is providing dynamic and interactive access to complex information structures.
i411 Faceted Metadata Search: New SearchTools Report
Faceted metadata engine integrates search and browse by calculating relationships in real time. Designed for online catalogs and directories, publishing and other structured content stores. Indexing function handles popular file formats, ODBC, Lotus Notes and integrates with existing taxonomies. Search engine includes spellchecking and synonyms, XML queries and responses, COM and Java APIs. Scales to millions of records running on distributed clusters of inexpensive servers.
Endeca Faceted Metadata Search: New SearchTools Report
Faceted metadata search and "guided navigation" browsing engine has InFront for online catalogs, ProFind for enterprise intranets. Data Foundry transforms content from web pages, XML and databases, regularizes it, and pre-computes some of the multidimensional relationships. System has a search engine with spellchecking and synonyms, merchandising rules for adjusting search results, APIs in C/C++, COM, ASP and Perl. Designed for scaling to millions of records, distributing work across multiple servers.
SearchTools at Intranets and Internet Librarian Conferences
Avi Rappoport of SearchTools.com will be speaking on "Search Engine Essentials" at the Intranets/KMWorld 2002 conference in Santa Clara, California, October 29 at 3:15 PM. She'll also be at the Internet Librarian conference, Palm Springs, California, November 6 at 1:45 PM, talking about Distributed Search, integration with CMS, and security. Please come and introduce yourself if you are at either of these conferences.
Google Search Appliance Update
The Google Search Appliance, a hardware-software combination search engine for enterprise, is now available in a 5-unit version ($230,000) which can index and search up to 3 million documents, complementing the original 1 and 8 unit versions. New search engine software features include secure search with Basic Authentication and NTLM support, a User Interface Wizard for customizing query and results pages and an option to index and include in searches an "incremental" collection for quickly-changing data.
SWISH-E 2.2 Released
A major update of this free open source search engine adds such features as external indexing module Sapporo, XML parsers, faster indexing and searching, merging and ranking results from multiple indexes, and results pages with match words highlighted in context extracted from the original documents. Written in C and Perl, available for Unix, Linux, Mac OS X and Windows.
Search Tools Survey Results, July-August, 2002
We now have 1627 survey results as of July 31, 2002, covering the topics of why site managers have or have not installed search engines, correlations of the sizes of sites and the installation of search engines, frequency of updates, file formats served, languages, and number of languages used on sites.
For web administrator ratings of the search tools they've used, see the Survey Ratings page. This includes evaluations of the most popular search engines (with six or more responses), other products, and custom development.
Robotcop enforces robots.txt
The Robots.txt file is a cooperative way to request that crawlers and spiders avoid certain parts of web sites. This free server module watches for spiders which read pages disallowed in robots.txt, and blocks all further requests from that IP address. It is particularly useful for blocking email address harvesters, while still allowing legitimate search engine spiders. Be sure to double-check your robots.txt file (use one or more of the robots.txt checkers), before implementing it, and to watch your server logs carefully. The August 2002 version (0.6) works with Apache 1.3 on FreeBSD and Linux.
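The blocking idea can be sketched in a few lines. This is not Robotcop itself, which is an Apache server module: the `SpiderBlocker` class, the sample robots.txt rules and the IP addresses are hypothetical, and the robots.txt parsing uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /trap/
"""


class SpiderBlocker:
    """Block any IP address that requests a path disallowed in robots.txt."""

    def __init__(self, robots_txt):
        self.parser = RobotFileParser()
        self.parser.parse(robots_txt.splitlines())
        self.blocked = set()

    def allow_request(self, ip, user_agent, path):
        """Return True if the request may proceed, False if it is refused."""
        if ip in self.blocked:
            return False
        if not self.parser.can_fetch(user_agent, path):
            # The client ignored robots.txt: refuse this and later requests.
            self.blocked.add(ip)
            return False
        return True


if __name__ == "__main__":
    blocker = SpiderBlocker(ROBOTS_TXT)
    print(blocker.allow_request("10.0.0.1", "PoliteBot", "/index.html"))
    print(blocker.allow_request("10.0.0.2", "Harvester", "/trap/emails.html"))
    print(blocker.allow_request("10.0.0.2", "Harvester", "/index.html"))
```

The sketch also shows why checking robots.txt first matters: any mistake in the Disallow rules would block legitimate spiders the same way.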
Convera Visual RetrievalWare SDK version 5.0
New version provides improvements in the video clip editing and fuzzy matching, modules for color, shape, texture indexing, automated shot-boundary detection, more image and video formats, additional OS support includes FreeBSD, OpenBSD and Darwin (Mac OS X).
Google Search Appliance Capacity, Prices Rise
The new price for the GB-1001 (1u rackmountable box, indexes 300,000 documents) is $28,000 -- up from $20,000; for the GB-8008 (an 8u server rack with additional load balancing features, and capacity for millions of documents) the price is now $450,000 -- up from $250,000.
Inktomi to Concentrate on Search, Buy Quiver
Inktomi has announced that it's going to direct resources to Web and enterprise search, reducing the content networking part of its business. It is also buying Quiver, which has developed a content categorization tool. Quiver has a mixed manual and automated classification workflow system, allowing editorial staff to adjust taxonomies and document categorization for best results.
After the Dot-Bomb: Getting Web Information Retrieval Right This Time
Marcia Bates, an academic expert on usable information retrieval, suggests that if web entrepreneurs and VCs had known about the history of IR and library experiences, they would not have wasted investments in problematic approaches such as "push" technology. She offers seven suggestions to improve web retrieval: use faceted rather than hierarchical classification; don't try for a single "true" classification (and avoid the term 'ontology'); use subject and domain information retrieval vocabulary; remember the Bradford distribution; plan for explosive growth; provide tools for "human content processing"; learn from the history of information retrieval.
iPlanet Search Security Flaw
The Sun ONE Web Server search function (formerly iPlanet search engine / Netscape Compass) is vulnerable to a buffer overflow attack, which then gives access to the server and the ability to run code as the administrator account. Sun has released patches for iPlanet 4.1 (SP 10) and 6 (SP 3). This was reported by Next Generation Security Software.
Things you might not know about how real people search
Results from studies of people using search engines provide some clues about improving search interfaces. Marc Resnick and Rebeca Lergier of Florida International University contrast what users want and what search engines want to give them. They discuss the "Pre-Click Confidence" level of users in search results based on the user's conceptual model of the current task, the fields they want to see, and propose different approaches to results display.
Oracle Text, Ultra Search, interMedia Search: SearchTools Update
With Oracle 9i, the text search engine is renamed to Oracle Text, Ultra Search provides a web and multisource interface, and interMedia performs multimedia management and search functions. The search engine, deeply integrated with the Oracle database, has a huge number of information retrieval functions, from phonetic matching to multilingual indexing to alerting services.
Teapot Metadata Search: New SearchTools Report
Teapot is the code name for a metadata search engine that recognizes multiple facets of searchable records, and shows context in search results. A recipe site might display categories in search results for cuisine, course, main ingredient, cooking method, and so on. This approach is particularly appropriate for e-commerce search but also works for journal articles, newsfeeds and other structured content.
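The recipe example can be sketched as facet counts computed over the current result set. This is an illustration under stated assumptions, not Teapot's code: the records, field names and both functions are invented for the sketch.

```python
from collections import Counter

# Hypothetical recipe records with a few facets each.
RECIPES = [
    {"title": "Pad Thai", "cuisine": "Thai", "course": "main", "method": "stir-fry"},
    {"title": "Green Curry", "cuisine": "Thai", "course": "main", "method": "simmer"},
    {"title": "Tiramisu", "cuisine": "Italian", "course": "dessert", "method": "no-cook"},
    {"title": "Risotto", "cuisine": "Italian", "course": "main", "method": "simmer"},
]


def filter_records(records, **selected):
    """Keep only records whose facet values match every selected value."""
    return [
        record
        for record in records
        if all(record.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(records, facets):
    """Count how many of the given records fall under each facet value."""
    return {
        facet: Counter(record[facet] for record in records if facet in record)
        for facet in facets
    }


if __name__ == "__main__":
    hits = filter_records(RECIPES, course="main")
    for facet, counts in facet_counts(hits, ["cuisine", "method"]).items():
        print(facet, dict(counts))
```

Showing the counts next to each category tells the user in advance whether a further choice will narrow the results usefully or lead to nothing.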
Atomz Search and Promote Service: Update Report
Atomz Promote service integrates site search and Web content management, allowing search admins and marketing people to view site search requests and add relevant content to search results. Provides a powerful framework for search recommendations.
InteractiveTools Search Engine: New SearchTools Report
Powerful Perl script includes an admin interface, local file indexer, rules for excluding directories, optional stopwords, diacritical character conversion, search zones, Boolean and Internet Query Operators, search results in customizable templates and search reports on successful and unsuccessful queries.
t.find Remote Search from Eidetica: New SearchTools Report
New ASP search service from the Netherlands provides sophisticated spider controls and compatibility with NewsML for article archives. Search is based on neural-net-like training, recognizes user expectations and offers standard query options. Available as either a turnkey search service or a programmatic feed allowing server-side formatting.
Knowledge Access - MIT Project Oxygen
Research projects at the MIT Artificial Intelligence lab present innovative approaches to personalized, collaborative and communal knowledge access. Starts from the individual and works out, rather than the traditional KM approach of storing en masse. Aspects include storing and representing data as objects and arcs of related objects, automated ways of acquiring and accessing data, human annotation, user pattern recognition and other interactions. Current projects include a natural-language question answering system, a personal information manager and work with the W3C's Semantic Web project.
Phantom Returns to ACTIV Software
The Phantom Search Engine for Windows and Mac OS 9 will be sold and supported by the original developer, ACTIV Software of Victoria, Canada, returning from Maxum Development. ACTIV will support all current users and offer an upgrade path. It has developed a Windows NT/2000 Service version and is working on a Mac OS X version.
ActiveSearch SiteSearch SDK - New SearchTools Report
Extends Windows-based search engines by converting natural-language questions to keywords, sending queries to SQL databases, searching for patterns, and showing mouse-over document summaries in search results.
Taxonomy of Web Search Forms
This analysis proposes a taxonomy of search forms for web sites. The categories identified are standard search forms (search field, button, perhaps a link to advanced search); surfacing: forms with search zones or filters based on site taxonomy; and qualifying: with other filters such as date, or local vs. global. Passive search interfaces are just links to search forms, rather than live fields.
As of January 2002, Ben Carlson is updating this open-source Java search engine. He has changed the GUI over to Swing from AWT, is adding features, and is working on making it more user friendly.
ZNOW Acquired by Endymion
The search engine and automatic taxonomy and classification software will continue to be developed and marketed as part of the business solutions firm.
IndexMySite Remote Search Service Now Fee-Based
The ASP search service IndexMySite is now charging fees instead of relying on advertising revenue. The first month is still free.
Search-It Remote Service Discontinued
NetMind is no longer offering the site search service.
Generating Simple URLs for Search Engine Robots
Dynamic URLs, with question marks and other punctuation, tend to put off search engine indexing robots, as well as humans looking at URLs. URL rewriting is a way to convert dynamic URLs to simple ones, but there are problems, mainly with relative links to graphics and other pages. This article offers a checklist for changing URLs on a site, links to detailed instructions for Apache and PHP, and links to rewrite filters for IIS/ASP sites.
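A minimal sketch of the mapping, and of the relative-link problem, follows. It is not taken from the article or from any Apache or IIS rewrite filter: the script name `/catalog.php`, the path layout and both function names are assumptions chosen for illustration.

```python
from urllib.parse import parse_qsl, quote, unquote, urlencode, urljoin, urlsplit

SCRIPT = "/catalog.php"  # hypothetical dynamic script


def to_simple_url(dynamic_url):
    """Turn /catalog.php?cat=5&id=42 into /catalog/cat/5/id/42."""
    parts = urlsplit(dynamic_url)
    base = parts.path.rsplit(".", 1)[0]
    segments = [
        quote(piece, safe="")
        for pair in parse_qsl(parts.query)
        for piece in pair
    ]
    return "/".join([base] + segments)


def to_dynamic_url(simple_url, script=SCRIPT):
    """Reverse the mapping, as a server-side rewrite rule would."""
    base = script.rsplit(".", 1)[0]
    if not simple_url.startswith(base + "/"):
        return simple_url  # not one of our rewritten URLs
    pieces = [unquote(p) for p in simple_url[len(base) + 1:].split("/")]
    pairs = list(zip(pieces[0::2], pieces[1::2]))
    return script + "?" + urlencode(pairs)


if __name__ == "__main__":
    simple = to_simple_url("/catalog.php?cat=5&id=42")
    print(simple)
    print(to_dynamic_url(simple))
    # The same relative image link resolves differently under each URL form.
    print(urljoin("http://example.com/catalog.php?cat=5&id=42", "images/logo.gif"))
    print(urljoin("http://example.com" + simple, "images/logo.gif"))
```

The last two lines show why relative links break: the browser resolves them against the deeper simple path, so rewritten pages generally need root-relative or absolute links.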
Google Search Appliance Updated
New features include faster searching while crawling, marking some archives for crawling less often, NTLM security integration, session IDs, additional search admin interface improvements, and an increase in the capacity of the small unit.
JeevesOne for Enterprise
New version integrates content from backend databases as well as web pages.
Inktomi Enterprise Search Update - v. 4.5
Version 4.5 allows wildcard characters at the beginning, middle or end of any word. Also updates the Japanese lexer and offers a new Japanese localized search administration interface. Inktomi is also offering consulting services for search engine security and connecting to Documentum, Livelink and Interwoven.
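Wildcards at the beginning, middle or end of a word can be illustrated with Python's standard `fnmatch` module. This is a sketch only, not Inktomi's query syntax: the use of `*` as the wildcard character, the word list and the function name are assumptions.

```python
from fnmatch import fnmatchcase


def wildcard_search(pattern, vocabulary):
    """Return the indexed words matching a pattern where * stands for any run of characters."""
    pattern = pattern.lower()
    return sorted(
        word for word in vocabulary if fnmatchcase(word.lower(), pattern)
    )


# Hypothetical set of indexed words.
VOCABULARY = {
    "index", "indexer", "indexing", "reindex",
    "search", "searching", "research",
}

if __name__ == "__main__":
    print(wildcard_search("index*", VOCABULARY))   # wildcard at the end
    print(wildcard_search("*search", VOCABULARY))  # wildcard at the beginning
    print(wildcard_search("s*ing", VOCABULARY))    # wildcard in the middle
```

Note that `fnmatch` also gives `?` and `[...]` special meaning, which a real query parser might not.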
DrawingSearcher for AutoCAD files
A new form of multimedia search, DrawingSearcher indexes text in AutoCAD files. It's built on top of dtSearch.
Delphi Report on Search and Taxonomy
The Delphi survey finds that many managers spend at least 2 hours every day searching for information, are frustrated by inadequate tools and dynamic data, and hope to address it with classification and metadata tagging systems. It goes into some detail on the need for better information tools and the search and categorization approaches available. Mainly concentrates on taxonomy and content classification tools.
New Articles on Commerce Search and Search Innovations
Articles from Internet Retailer and Forrester describe creative approaches to the problems of relevance, long lists of results, user error in searching, desire for context, product availability and even pricing.
Intelliseek Enterprise Search Server
Designed as a distributed federated search system, this engine can now access content in Oracle and SQL Server databases, bridge to Lotus Domino servers, handle incoming data feeds, and index static or dynamic documents. Acting as a broker, this search engine handles security and access control, sending queries to the most likely authorized sources and collating the results.
Inxight MetaText and VizServer: SearchTools Report
Inxight (pronounced "insight") has released MetaText, a metadata extraction tool focused on summaries, related documents, people, places and things. The other new product, VizServer, offers visualization technologies for relational databases.
YourAmigo Enterprise Search: New SearchTools Report
YourAmigo, from Australia, offers an innovative approach to gathering data for indexing. Indexing agents on the source servers combine information from both web servers and file systems or databases. This allows the indexer to imitate dynamic forms and extract data that's otherwise invisible to robot spiders. Other features include metadata field tagging, flexible querying, date range searching, XML query and search interfaces, and browser admin for index control and search results formatting.
Netrics Renames and Updates Search Engine
Netrics has renamed the LikeIt search engine to "Netrics Search", and added new platforms and APIs. The main focus of the product continues to be on search expansion, using fuzzy matching and other tools to locate as many items as possible that might correspond to the original query.
IXE: Ideare indeXing Engine: New SearchTools Report
An advanced indexing and search engine toolkit, with a C++ API. Developed in Italy, this is used in several large European portals for web page, MP3, image, video, news and shopping comparison search.
Spiderline Search Service: new features
The Spiderline remote search ASP has added a number of new features, including compatibility with all recent versions of Microsoft Word, indexing with passwords, cookies, session IDs and SSL, Remote Crawl Requests (update triggering via HTTP), search zones, and search results in XML. They also show snippets of matching text with hit highlighting in search results and have extensive indexing and search reports.
Verity K2 Developer
Verity has released a new version of the K2 Developer toolkit, designed to allow software developers to integrate the search engine and other features with their products.
ISYS:web.asp Version Released
ISYS/Odyssey has released a new version of their web site and intranet search engine, as an ASP (Active Server Page) module for the Microsoft IIS web server. This version has all the file format, NT security, and query features of the standalone search engine. ISYS products are commonly used on law sites, in legal offices and other intranets.
Inktomi Search Toolkit Released
New Toolkit for developers and OEMs offers one interface to both text search on unstructured and semi-structured documents such as web pages and queries on structured data such as relational databases and XML documents, returning links to documents or fragments from them. It uses XML and Unicode internally to index and cache documents, supports XQuery, wildcards, parametric search, sorting by field as well as relevance, and over 250 file formats. It has access to most of the other features of the Inktomi Enterprise Search, including server independence, scalability and multithreading. The API is in Java, using XML sockets or HTTP, and runs initially on Windows NT/2000 and Sun Solaris.
Inktomi Enterprise Search version 4.4 Update
Now recognizes more languages, allows deactivation of stemming, and provides improvements in XML indexing administration.
PicoSearch: Search Service Update
New features include stemming, relevance rank controls, date display options with timezone calculations, date sorting, and dynamic search reports.
XYZFind to be acquired by Interwoven
Interwoven, a leading content management application developer, has announced that it's buying XYZFind to improve access to relational database content and XML.
KSearch: New Perl Search Engine
KSearch indexes local files and directories and provides both simple and advanced search with highlighting of matched terms. Written in Perl, it has been tested on Unix and Windows and can handle sites with thousands of pages.
AltaVista Enterprise Search 2.0: Search Engine Update
New version includes document-level security checking so the search results only show authorized documents, and an open architecture with modular access to the processes and XML interfaces. AutoCategorizer module combines rules and linguistic analysis for classification of documents into enterprise taxonomies.
Recommendations for Search Results: SearchTools Report Update
Search engines use statistical and lexical analysis to match query terms to indexed text, but sometimes human judgment is more effective. Most of the top search engines now allow administrators to designate recommended pages or links to respond to common queries -- this is sometimes called "Best Bets".
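As a rough illustration of how recommended pages can sit on top of ordinary ranked results, here is a minimal Python sketch. The table contents, URLs, and function names are hypothetical and do not reflect any particular vendor's implementation.

```python
# Hypothetical table maintained by a site administrator:
# normalized query text -> pages to recommend for it.
BEST_BETS = {
    "vacation policy": ["/hr/vacation.html"],
    "expense report": ["/finance/expenses.html"],
}

def normalize(query):
    """Lowercase and collapse whitespace so near-identical queries match."""
    return " ".join(query.lower().split())

def search_with_best_bets(query, engine_results):
    """Put administrator-chosen pages first, then the engine's ranked results.

    `engine_results` is the ordered list of URLs returned by the search
    engine; pages already recommended are not repeated further down.
    """
    recommended = BEST_BETS.get(normalize(query), [])
    rest = [url for url in engine_results if url not in recommended]
    return recommended + rest
```

Queries with no entry in the table fall through to the engine's normal relevance ranking unchanged, so the table only needs to cover the most common searches.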
Lycos InSite Pro: new search service report
Remote search hosting from Lycos uses the FAST Search engine and checks for updates every two days. Provides simple customization, reports on indexing and search, and automatic inclusion in the Lycos public search engine.
WizDoc: new search engine report
Concept-oriented search engine for wider recall also has a simple Boolean query option. It can segment documents by topic and show the relevant portions in search results.
Webinator Version 4: search engine update
New version of this powerful search engine brings a browser admin console, no longer limited to command lines and config files. Now supports cookies, meta robots tags, iframe tags, searching by date and within categories. A unique feature "Remove common" tells it not to index text that's the same on two or more documents, so that the index does not store navigation text before or after the content text.
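The effect of a "Remove common" style option can be sketched in a few lines of Python. How Webinator actually detects shared text is not described here, so this example assumes a simple line-by-line comparison; the function name and the `min_docs` parameter are invented for illustration.

```python
from collections import Counter

def remove_common(docs, min_docs=2):
    """Drop lines that appear in `min_docs` or more documents.

    `docs` maps a document name to its text. Comparing whole lines is an
    assumption made for this sketch, not a documented Webinator behavior.
    """
    # Split each document into non-empty, stripped lines.
    doc_lines = {
        name: [ln.strip() for ln in text.splitlines() if ln.strip()]
        for name, text in docs.items()
    }
    # Count how many documents contain each line (once per document).
    counts = Counter()
    for lines in doc_lines.values():
        counts.update(set(lines))
    # Keep only the lines that fall below the sharing threshold.
    return {
        name: "\n".join(ln for ln in lines if counts[ln] < min_docs)
        for name, lines in doc_lines.items()
    }
```

Given two pages that share a navigation bar and a copyright line, only the distinct body text of each page would be passed on to the indexer.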
RetrievalWare Version 7
New version of this enterprise search engine includes video search integration, while image and audio search are available through consulting. Cross-lingual searches based on an internal semantic network for English, French, German, Spanish, Italian and Dutch allow queries in one language and results in another. Categorization tools with multi-level taxonomies can use explicit queries, fuzzy logic (good for ignoring OCR errors) and overriding Boolean control; the same functionality is implemented for alerting and filtering. This version adds security and authentication interfaces for third-party proxies and multiple repositories, and APIs for COM, ASP, and XML queries.
MondoSearch Version 4.4 Released
MondoSearch, a Windows search engine and remote search service, has shipped version 4.4, including Unicode text storage, many more languages, User Authentication with Microsoft Content Manager, additional relevance rank tools, and graphic results options.
Microsoft to Address Windows Desktop Search Problems
A report from CNET indicates that Microsoft is planning to convert all Windows data storage to an internal database running on technology from SQL Server. This would include Microsoft Office applications, databases, Outlook email, images and video. Theoretically, it would allow one search to access data in all formats. Apple has a lightweight version of this in the Sherlock Find-By-Content utility, which indexes text in many binary file formats on Mac OS 9 and Mac OS X.
Boxes and Arrows
A wonderful online magazine and discussion forum for information architecture, designed for practitioners interested in the vital issues of the craft.
New Search Developers Mailing List
Search Tools editor Avi Rappoport is moderating a new mailing list for search engine developers. Topics for discussion include open source code options, robot spidering and web crawling, index compression, file format conversion, metadata indexing and searching, Boolean and Intranet search operators, index speed and size, stopwords, relevance ranking algorithms, stemming, categorized search results, search form and results page user interfaces, search log analysis, security, peer-to-peer search, and anything else that interests the participants.
AltaVista Desktop (Windows): New Search Engine
AltaVista is now shipping a desktop version of their corporate search engine, which indexes over 200 file formats and email. This allows employees to locate local content much as they find web pages or Intranet documents. Features include control over which files to index, scheduling and formatting of results. Desktop search competitors include dtSearch and Enfish.
Google Search Appliance: New Search Engine
Google has announced a search engine hardware-software combination product, providing search functionality within corporate networks and firewalls. This complements their ASP Search Service, which is available to any site with Internet access. The Search Appliance provides many of the features of their web site, including the robot spider, PageRank indexing and searching, caching, spellchecker, language identification, and hit highlights within search results snippets. The search engine browser admin has many options for specifying sites and subdirectories to crawl, extensive indexing reports, passwords, server-side XSLT formatting of search results and search log reporting. Pricing is $20,000 for a 1U system (up to 150,000 pages) and $250,000 for an 8U system with additional load balancing features that can index millions of pages. This is a good solid offering that gives other high-end search engines a run for the money, especially at larger scales.
Copernic Indexer: New Search Engine
Copernic, a long-time expert in metasearch, has announced the Copernic Indexer corporate search engine. It can index local servers, web and intranet sites and Exchange servers. Advanced search options include zones and file information, and the engine is scalable to multiple servers.
Northern Light Acquired by divine
The Northern Light search engine company was acquired by divine, which will integrate the software and services with its content management system. Northern Light recently closed its public web portal and partnered with Yahoo! to provide premium research information. Divine plans to target the technology mainly at the enterprise market.
FusionBot: Search Service Updates
New features include interface adjustments, date sorting and manual adjustments of relevance ranking for specific searches.
S.L.I. Search: New Search Engine and Remote Search Service
The developers of GlobalBrain search technology have acquired it back from NBCi and are now supporting it directly, as both software and search services. Features include learning from user behavior, vocabulary suggestions, metasearch combining results and customizability via XML search results.
Frequent updates add Japanese and Romanian language support, indexing control, displaying web sites as results categories, and Mac OS X compatibility.
Now deployed by Washington State for its government portal, purchase of the Octopus engine should integrate with order-tracking, CRM and ERP databases.
Bravo Classification Tool
Gathers feedback from end-users to update taxonomies and create topic maps.
Analyst Reports on Site Searching
New reports from Forrester and UIE on problems and solutions for site search, mainly on e-commerce sites. The Forrester report clusters search engines by function, separating products, fulltext information and service (customer support).
Usability Test on the Best Number of Search Results Per Page
Academic tests compared search results pages with 10, 50 and 100 links per page. They found that short and mid-sized pages performed best, and that very long scrolling pages were harder to use and disliked by users.
Answers vs. Search Results: SearchTools Report
How search engines can incorporate adjustments to the search results relevance rankings to provide more satisfactory answers to the common search questions.
Inktomi Search Update - version 4.3
Update provides "Quick Links" recommended results for popular searches, integration with content management systems and improved database indexing interfaces.
W3C XQuery 1.0 updates
New drafts for XQuery 1.0, use cases (with a nice Full-text Search example section), and integration with XPath 2.0.
Yes, I suppose you might call this a blog. For earlier news, see the 2001, 2000, 1999 and 1998 news archive pages.