As of January 2012, this site is no longer being updated, due to work and health issues.
See also: Site change details
As more sites are dynamically generated from database backends, they may discover problems with database search functions. This report describes the limitations of database searching, explains the functional and interface features of text searching, and lists text search engines with database interfaces.
New survey analysis covers responses through the end of October 2000. We wanted to learn more about the relationship between search engines and web sites, and how web site managers view search engines.
Reasons for Installing or Not Installing Search
Of those who have installed a search engine, most cite better navigation and a professional look for the site. Marketing and Customer Service departments are also encouraging site managers to add search engines. For more information, see the report Why Site Managers Install Search Engines.
Of those who took our survey, 61% had not installed a search engine (yet), mainly citing time and complexity. However, more people recognize a need for searching and plan to add it even before the site has been completed. For more information, see the report on Reasons Why Search Is Not Installed.
Searching and Web Site Characteristics
We've found some interesting data about the web sites surveyed. As we expected, sites with more pages tend to have search engines installed, to allow people better access to their information. For details, see Site Sizes and Search Engines. In addition, sites which are updated hourly or daily are much more likely to have search installed than those which update less often: see Search and Update Frequency for details.
Installing a site search tool is easiest on a local server. Administrators working on co-located and hosted servers have a harder time installing site searching, and are significantly less likely to do so, as shown by our results for Server Location.
As in our previous survey, we found that there are many sites with non-English text, and they are more likely to have search engines installed: see Languages on Sites. We also found that single-language sites are much less likely to have search engines installed than multilingual sites: details are in the report Search Engines & Multiple Languages.
Finally, a large number of sites report serving non-HTML files, including interchange formats such as PDF, PostScript, Flash and XML, and office productivity files such as Word, WordPerfect, Excel and PowerPoint. Even some of the remote services and free software will index these files. For more details, see the File Formats report.
Search engine requirements are more complex and idiosyncratic than they first appear. A web site may have dynamic pages, be missing page descriptions, or change often, so it needs a flexible indexer that can adjust to these conditions. An intranet may have experienced searchers, or complex frames containing binary data recognized by special client modules, while a topical portal may have naive customers who perform many single-word searches. No one search engine is best for everyone, but some have consistently happy customers while others are rated very badly. If you read the comments, you can learn a great deal from the experiences of our survey takers!
Netscape Compass is now the search engine for iPlanet Portal Server.
Our report of the demise of Netscape Compass was mistaken, and we greatly regret the error. In December, 2000, a company spokesperson said:
Netscape Compass Server will no longer be sold as an independent product, although current customers will be supported and upgraded to the latest version, 3.01c. From this point forward, it will be included as part of the iPlanet Portal Server.
The GrapeVINE for Compass product, will be renamed to iPlanet Portal Server: Personalized Knowledge Services. It will work as a module to the iPlanet Portal Server, much the same way GrapeVINE worked with Netscape Compass Server.
Both products should be integrated and available to customers by the first quarter of 2001.
Quadralay WebWorks Search has been discontinued. According to the company, the search engine has not been sold for over a year.
Search Engines: The Hunt Is On
Network Computing Magazine: October 16, 2000, by Avi Rappoport
In-depth discussion of search engines for e-commerce and other web sites covers features and future trends, software vs. services, database vs. text searching, natural-language searching, and open-source search engines. The testing included indexing over 150,000 pages, and covered administration tools, customization, search features, relevance ranking and search logs. The products tested were Inktomi Search, AltaVista Search, and Excalibur RetrievalWare; the services were Atomz Enterprise Search and Searchbutton Corporate. The article also includes an email poll of Network Computing readers.
Report: Why Searches Fail
Even the best search engines can't always find what people ask for. Our search log analysis has disclosed what's going on and what you can do about it.
Top Five Reasons Why Searches Find No Matches:
- Empty searches - someone just clicked the Search button or pressed Return
- Wrong Scope - trying to search the whole Web
- Vocabulary Mismatches - used a synonym, terms too broad or too narrow
- Spelling Mistakes
- Query Requirements Not Met - i.e. a required word was not found
Our report, Why Searches Fail, includes details on these and other common causes of search failures. To address these problems, we recommend choosing flexible search engines that implement features such as synonym lists, high-quality stemming and spellcheckers, that do not use stopword lists, and that provide helpful information about which words were not matched. For suggestions on designing and wording the no-matches interface, see our No-Matches Page Guide.
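To make the five categories concrete, here is a minimal Python sketch of how a site manager might sort zero-result queries from a search log. The function name `classify_no_match`, the scope hints, the synonym table, the `+` convention for required words, and the 0.8 similarity cutoff are all hypothetical assumptions for illustration, not part of any product or of the report's own method; real log analysis still needs a person reading the queries.

```python
import difflib

# Hypothetical hints that a visitor meant to search the whole Web, not this site.
WEB_SCOPE_HINTS = ("http://", "www.", ".com")


def classify_no_match(query, vocabulary, synonyms):
    """Guess which common cause explains a query that found no matches.

    vocabulary: set of lowercase words in the site index (assumed available).
    synonyms:   dict mapping visitors' words to indexed words (hypothetical list).
    """
    q = query.strip().lower()
    if not q:
        return "empty search"
    if any(hint in q for hint in WEB_SCOPE_HINTS):
        return "wrong scope"

    tokens = q.split()
    # Assumption: words marked with a leading "+" are required.
    required = [t[1:] for t in tokens if t.startswith("+")]
    # Drop +/- operators so "+widget" is looked up as "widget".
    terms = [t.lstrip("+-") for t in tokens]
    missing = [t for t in terms if t and t not in vocabulary]
    if not missing:
        return "other"  # every word is indexed; some other condition failed

    if any(t in synonyms for t in missing):
        return "vocabulary mismatch"
    # A near-miss against an indexed word suggests a typo.
    if any(difflib.get_close_matches(t, vocabulary, n=1, cutoff=0.8) for t in missing):
        return "spelling mistake"
    # A required word is absent even though other words were found.
    if any(r in missing for r in required) and len(missing) < len(terms):
        return "query requirements not met"
    return "unknown"
```

For example, with an index containing `widget`, `price`, `support` and `laptop`, and a synonym entry mapping `notebook` to `laptop`, the query `notebook price` is classified as a vocabulary mismatch and `suport` as a spelling mistake. The checks run in a fixed order, so a query is assigned to the first cause that fits.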
XML & Information Retrieval
We have received a Call For Papers for a Special Topic Issue of the Journal of the American Society for Information Science (JASIS) on XML and Information Retrieval. Submissions are due by December 15, 2000. The CFP is not online at the ASIS site, so I have posted a copy here.
CD-ROM integrated searching
A great deal of information is still distributed via CD-ROM or DVD, especially to sites without broadband Internet access. Some innovative companies are combining the archival data on the disc with updates on their Web site to deliver fast and current data. Several Java search engines provide this functionality, including the new Webrom, as well as ASTAWare SearchKey, DocFather, and JHL Search.
Some search tools are abandoned by their developers or are no longer being updated for various reasons. We recommend against installing these because they may not work in new versions of operating systems, on larger hard drives or in future years.
- Magnifi - the company is now doing ASP work
- Cybotics - web site is gone and company can't be reached
- Excerpt - company no longer selling the product
- NetCreations Pinpoint Service - company no longer accepting new customers or providing any support.
- EWS (Excite for Web Servers) - no new versions for several years, not supported.
- Microsoft Site Server - customers report difficulty getting help and we've heard rumors of the entire product being discontinued.
Not many search-oriented conferences in October, but November brings the ASIS meeting and Danny Sullivan's Search Engine Strategies meeting, where I expect to speak about site search and robots.
Inktomi is concluding the integration of the Ultraseek product by renaming it and marketing it along with their Web portal service. Indications are that the technical team is still intact and reasonably content, and is working on the next version.
Lycos announced on June 14 that it is using the Fast Search & Transfer technology to power its search site with over 340 million pages. This is essentially the same technology sold in the FAST search engine.
The Ultraseek search server business unit of Infoseek has announced its forthcoming acquisition by Inktomi. The software was originally designed using many of the algorithms from the Infoseek webwide search engine, but has developed as a standalone product with independent features such as office productivity document format compatibility and a browser-based search administration interface. It is being used in corporate sites, news, e-commerce, portals, universities and Intranets. Sources say that the Ultraseek team will move together to the Inktomi campus, and will continue development of the future versions of the software.
AltaVista has released a new version of their search engine, aimed at large sites, portals, Intranets and E-commerce. Features include a new Java-based search administration interface, a database interface using JDBC, robot gathering for multiple sources on separate schedules, a scalable architecture, and support for 30 languages including Chinese, Japanese, Korean and Arabic. The SDK (code library) provides additional customization of search results structure, sorting on product attributes and other features.
Microsoft has announced a Security Vulnerability with Index Server on Windows NT 4 and Windows 2000. Essentially, any site which runs Index Server can be compromised if the server receives certain erroneous commands. We recommend that everyone who is running Index Server read the FAQ and apply the patch.
The open-source search engine ht://Dig now has ConfigDig: a template-based HTML front end for easy search administration from any browser. This allows remote configuration by search admins who are not expert at Unix command-line interfaces.
Thunderstone's Webinator Remote is a free search service that demonstrates the power of the Webinator search engine. It is moving from thunderstone.com to master.com, which is providing both remote search and additional services based on the Thunderstone Texis database and search engine. All Webinator Remote users should update their sites accordingly.
Atomz and PicoSearch have recently started indexing Adobe PDF (Acrobat) files and tags from MP3 files. However, I'm not sure whether serving and indexing PDF files is really a help to site visitors: see the Searching PDF page for details.
A wonderful set of papers on search results as complex hypertext.
The Fifth Search Engines Meeting, presented by Infonortics in April, covered a wide range of issues, including a lot of information on the webwide search engines, user interface and design studies, scalability, application of theoretical IR techniques to practical search engines, and more. Slides are available at the Infonortics site (though they are readable only in MS IE 5).
I will be speaking on Web Search Usability, covering the principles of search interfaces, examples of successful and unsuccessful interfaces, search log analysis, usability testing, and iterative improvements. The full Web Design World Conference is July 17-21; I'll be speaking on the 21st in the Usability track.
We are continuing to gather and analyze information about indexing robots, including some useful tidbits for those who would like to write or commission their own.
Many new listings for conferences on search, information retrieval, information architecture and related topics throughout 2000. Coming soon:
We've heard rumors of new versions of AltaVista Search, Verity, Excalibur, and Fast Search. We will post details of the final releases as we get them.
For earlier news, see the 1999 and 1998 news archive pages
Avi Rappoport of Search Tools Consulting can help you evaluate your search engine, whether it's on a site, portal, intranet, or enterprise. Please contact SearchTools for more information.
This information copyright © 2000-2011 Avi Rappoport, Search Tools Consulting. Some Rights Reserved, under the Creative Commons Attribution-Share Alike 3.0 United States License. Always attribute copied content to the page's full URL. Permissions beyond the scope of this license are available upon request.