As of January, 2012, this site is no longer being updated, due to work and health issues
See the News page for more recent information
Remote Search Hosting Services
Remote search services crawl your site and store your information in an index on their server. When someone enters a query in the search form on your site, the form submits it to the search engine host. The host receives the query, does the lookup in the index, formats the results, and sends them back as an HTML page with links directly to the pages on your site.
Some services provide simple, straightforward searches, while others offer powerful advanced search functions such as proximity operators (NEAR) and date-range searching. Some are free and supported by advertising, others charge by the page, by the search or monthly.
Remote search services do not require any programming or local access to the web server. They act like standard robot spiders, following links on your site, rather than using the local file system. For hosted sites with limited server access, these services are an excellent option.
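To make the "following links" idea concrete, here is a minimal sketch in Python of the link-gathering step a robot spider performs on each page. It is an illustration only, not the code of any service listed here: the names LinkCollector and same_site_links and the example.com addresses are our own assumptions, and a real spider would also fetch the pages, honor robots.txt and pace its requests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url, html):
    """Return absolute links found in html that stay on page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        # Resolve relative links against the page they were found on.
        absolute = urljoin(page_url, href)
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


# Made-up page content for illustration.
sample = (
    '<a href="/tools/">Tools</a> '
    '<a href="news.html">News</a> '
    '<a href="http://other.example/">Elsewhere</a>'
)
print(same_site_links("http://www.example.com/index.html", sample))
```

A spider would repeat this on every page it finds, adding the new same-site links to its queue, which is why it needs only web access and no access to the server's file system.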
We expect to have a full comparative report on remote site hosting services in February: in the meantime, you can try them out on our search test page.
Source Libraries for Search Code
Many of you may be interested in adding searching and text retrieval to your own applications. We've started a listing page for search source code, beginning with the excellent Findex package from LexTech and the dtSearch source package. You can also look at the open source search applications (listed on the same page), but be sure to check the license terms before including the code in your own programs.
New & Updated Site Search Tools Reports
- Boolean Search - new version of the flexible and powerful Mac web server plug-in.
- JObjects QuestAgent - Pure Java index and search tool. (Also works for cross-platform CD-ROMs)
- Folio siteDirector - provides enterprise web publishing and searching for legacy data
The ACM's Computer-Human Interface group (SIGCHI) in the SF Bay Area will sponsor a presentation on Interface for Web Site Search and Navigation on January 12, 1999 at Xerox PARC. SearchTools' Avi Rappoport will be discussing the most important issues in designing a web site search interface, as well as architectural and site navigation issues. For more information, see BayCHI.org.
Knowledge Management Stocks Go Wild
Verity has reported a third quarter of profits. Open Text is doing a deal with Nortel after failing to take over PC Docs/Fulcrum, which in turn is partnering with ChiliSoft to integrate Chili!ASP and CyberDOCS. And all their stock is up (today at least).
Take the Search Tools survey and help us get a picture of your web site -- and why you are (or are not) installing site search tools. Intranet sites are welcome too. It's a chance to explain what you have found that works and what doesn't. We'll write up a big article in February about the results, and will send you a copy of the report directly if you so choose.
Add search to your site CNET Builder.com, November 17, 1998 by Avi Rappoport
A comprehensive article by our own Avi Rappoport, covering background information, choosing the right site search engine for your site, testing and installing the software, designing the interface, and setting up a maintenance program. Includes many links to sites and products, summaries of some of the most attractive programs and an example of the installation process using the DNA Files web site.
Other new articles include Inter@ctive Week's Perfecting Corporate Search Engines, useful tips from chami.com, and A review of robot based internet search services (1996).
We recently separated the Overviews page into individual pages for Books and Articles, Links, Newsgroups & Mailing Lists, and Training and hope you find the organization useful.
As more sites present non-English text, site search engines must index and search these pages effectively. The work ranges from handling extended characters (such as those in cîté and søk), through non-Roman character sets, to searching data in languages different from that of the original query. For links to resources, see the new Cross-Language Information Retrieval section.
Metasearch is the process of accessing multiple search engines at once, and presenting the results organized in a useful way. In addition to metasearch web sites and client applications, some site search programs can provide this service. See the new Metasearch section in the webwide search page for details.
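As a rough illustration of the "organized in a useful way" step, the Python sketch below merges ranked result lists from several engines into one list. The scoring rule (each URL earns 1/rank in every list where it appears, and the scores are summed) is our own simple assumption, not the method of any product mentioned here, and the engine lists and example addresses are invented.

```python
def merge_results(result_lists):
    """Merge ranked URL lists; a URL scores 1/rank in each list and scores are summed."""
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest combined score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical result lists from two engines, best match first.
engine_a = ["http://a.example/", "http://b.example/", "http://c.example/"]
engine_b = ["http://b.example/", "http://d.example/"]
print(merge_results([engine_a, engine_b]))
```

A page returned by both engines rises above pages returned by only one, and duplicates collapse into a single entry.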
Results of an Intranet Journal survey and an ht://Dig survey are now on the Surveys page.
We will be doing our own survey shortly -- watch for a special announcement!
New Site Search Tools
- FreeFind - another remote indexing service, free (with advertising)
- Selena Sol's Keyword Search - a classic, very basic Perl CGI script, works almost everywhere.
- Intermediate Search and Xavatoria Indexed Search Engine (Perl CGI scripts) - more complex than Selena Sol's: the Xavatoria version even shows meta descriptions if available.
- Alkaline - free to noncommercial sites, handles accented (extended) characters.
- ISearch - free, open source, library-oriented.
- IDKSM - Java CGI/Servlet that can also index CD-ROMs
- The Infonortics Search Engines and Beyond conference for 1999 will concentrate on "Developing efficient knowledge management systems". Industry speakers include Ramana Rao of Inxight, Danny Sullivan of SearchEngineWatch, Rick Kenny of Fulcrum, John Snyder of Muscat, Mark Krellenstein of Northern Light, Dan Miller of Ask Jeeves, and Ellen Voorhees of NIST/TREC. There will also be a number of academic and research speakers.
Y2K and Site Search Tools
- Most search engines will not have any trouble in the year 2000: they will not fail because, in most cases, they do not depend on date comparisons.
The most serious likely problem area is index updating. Some automated indexers compare the last update date and time to the current date and time to decide if they should run again. These may have trouble in the year 2000, if the programmers did not store the date as a four-digit number. In that case, the indexer could get confused every time it encounters a file modification date seemingly in the future (for example, the program thinks it's the year 1900 but the file was modified in 1998). The discrepancy could cause the indexer to re-index rather than updating, which could significantly affect server performance.
- Other possible problem areas include administration and search log code, which may also have difficulty if features depend on two-digit years. In addition, searching on date ranges will not work if the file modification dates are stored in the index with two-digit years.
You should check the documentation, code (if available), the developer's web site, support mailing list or newsgroup first, but if this issue is not covered, conduct your own tests or contact the company before the end of 1999.
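The index-updating problem described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from any search tool: the function names are hypothetical, and the "windowing" repair (treating two-digit years below a pivot of 70 as 20xx) is one common workaround for existing two-digit data. Storing four-digit years, as noted above, avoids the problem entirely.

```python
def expand_two_digit_year(yy, pivot=70):
    """Map a two-digit year to four digits: below pivot becomes 20xx, otherwise 19xx.

    The pivot of 70 is an assumption chosen for this sketch.
    """
    return 2000 + yy if yy < pivot else 1900 + yy


def needs_reindex_naive(last_run_yy, file_modified_yy):
    """Buggy check: compares two-digit years as plain numbers."""
    return file_modified_yy > last_run_yy


def needs_reindex_fixed(last_run_yy, file_modified_yy):
    """Compares the years only after expanding both to four digits."""
    return expand_two_digit_year(file_modified_yy) > expand_two_digit_year(last_run_yy)


# The indexer last ran in 2000 (stored as 0); the file was last modified in 1998 (stored as 98).
print(needs_reindex_naive(0, 98))  # the file looks newer than the last run
print(needs_reindex_fixed(0, 98))  # the file is correctly seen as older
```

With the naive check, every file modified in the 1990s appears to have changed since the last run, so the indexer would process the whole site again instead of updating.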
Ultraseek to Index and Search XML
According to a story in Wired News, version 3.0 of Ultraseek, Infoseek's site search tool, will support searching XML. The product is due for release next Tuesday (November 17, 1998). Tim Bray, co-editor of the W3C XML standard, welcomed the news cautiously, warning that implementing high-volume and high-performance search of structured text is extremely difficult.
Site Search Panel at Builder.Com Live - December 7-9, 1998, New Orleans
The panel will consist of Avi Rappoport of Search Tools Consulting; Louis Rosenfeld, author of Information Architecture for the World Wide Web, an excellent book on site design and information architecture (including searching); and Jakob Nielsen, of the Nielsen Norman Group and the UseIt web site. We'll be covering various aspects of choosing, installing and improving site searching, and hope to see you there.
MondoSearch (new search tools)
New product offers automatic categorization (Yahoo-style directories), frame-handling, indexing of pages generated dynamically from databases and other sources, audio and video search, multilanguage filtering, and more.
PicoSearch and NetCreations PinPoint (new Remote Indexers)
New remote index and search engines index your site and store the results on their server. When a site visitor enters their search word or phrase and presses the search button, the remote server application receives the data, performs the search and returns the results. Try them out on our search page.
New Versions of Search Tools
Thunderstone Webinator version 2.5
New version includes optional metasearch: searching on multiple webwide search engines. Webinator is also available as a free remote indexer -- we have an example running.
Ultraseek - New Version 2.1, User Group Meeting
Ultraseek version 2.1 includes speed improvements, date range searching, indexing of documents on SSL (https) servers, indexing of newsgroups, XML tag support, distributed indexing and robot spider cooperation, more language support (Swedish, Norwegian and Danish added to French, German, Dutch, Spanish, Italian, Portuguese, Japanese and English), among a number of other features and bug fixes.
[The user group meeting in New Orleans has been postponed]
Phantom - New Version Announced
Maxum has recently announced the 2.2 version of the Phantom search tool, which will have PDF indexing, meta tag indexing, more results customization, and several nice administration features.
Quadralay WebWorks Search
WebWorks Search version 2.0.7 is available for download on Windows 95/98/NT. This update release adds support for IIS 4.0.
Site Search Installation Example: US Department of Education, Cross-Site Indexing Project 1997
Another good example of the process of choosing and installing a site search tool, in this case covering several Education Department sites. The group set up a requirements document, and tested Netscape Catalog (later replaced by Compass Server), InQuery, Verity Search '97 and Ultraseek, which they ultimately chose.
New Book: Web Developer.com Guide to Search Engines
- A wide-ranging book covering everything from the beginnings of the robot spiders crawling and indexing the web to analysis of the major webwide search engines to detailed information on installing and configuring six local site search tools. The programs covered are AltaVista Search Intranet, Excite for Web Servers, Harvest, ht://Dig, Phantom and Ultraseek. Also describes BDDBot, an ongoing collaborative project to create a Java web server and search spider, using open source under the GNU public license. Use the following links to buy from Amazon or Computer Literacy and you'll support this site.
New Book: Web Navigation: Designing the User Experience
- New book on designing web site navigation, from the simple to the complex enterprise site. Use these links to buy at Amazon or Computer Literacy and support this site.
dc:DC - The Sixth Dublin Core Metadata Workshop
The Dublin Core is a simple set of metadata for describing web pages, such as the copyright information, subject, language and so on. The workshop will focus on practical implementation and interoperability. The meeting is in Washington DC, November 2-4, 1998.
See the Conferences page for other meeting information.
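For readers wondering what Dublin Core metadata looks like to an indexer, here is a minimal Python sketch that pulls it out of a page. It assumes the common convention of embedding the elements in HTML meta tags whose names start with "DC." (for example DC.Subject). The class and function names and the sample page are hypothetical and ours alone, not part of any workshop document.

```python
from html.parser import HTMLParser


class DublinCoreReader(HTMLParser):
    """Collects <meta name="DC.xxx" content="..."> pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        fields = dict(attrs)
        name = fields.get("name") or ""
        if name.lower().startswith("dc."):
            # Store the element name without the "DC." prefix, in lowercase.
            self.metadata[name[3:].lower()] = fields.get("content") or ""


def read_dublin_core(html):
    """Return a dict of the Dublin Core elements found in html."""
    reader = DublinCoreReader()
    reader.feed(html)
    return reader.metadata


# Made-up page header for illustration.
page = """
<head>
<meta name="DC.Subject" content="site search tools">
<meta name="DC.Language" content="en">
<meta name="DC.Rights" content="Copyright 1998 Example Org">
<meta name="keywords" content="search, index">
</head>
"""
print(read_dublin_core(page))
```

An indexer that understands these tags can offer searching by subject or language instead of relying on the page text alone.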
Search Administrators Information
- Many new Glossary entries for searching and search tools terms
- Types of Site Search Engine applications
- Indexer Information, comparing the Local File and Robot Spider varieties.
Search Tool Product Information
- Updated Reviews page.
- Information Access ITMS - new product report
- Integrated Intelligence I-Search - new product report
- Excite for Web Servers (EWS) is now out on Linux.
SearchTools News - August 29, 1998
InQuizit Product Report
New product promises true natural-language processing for site searching. A test version should be up at their web site shortly.
Domino Extended Search Product Report
Allows Domino servers to provide search access to multiple Notes, ODBC and webwide search databases simultaneously.
Migration Path from PLS to Thunderstone Texis
Thunderstone Software will provide price discounts and support for customers who want to migrate from PLS to the Texis integrated relational database and search system. As reported here, PLS was bought out by America Online and the products are now shareware.
XML Query Language Proposal
The XML-QL, Query Language for XML proposal describes a query language approaching an XML file much like searching a database with SQL, rather than a free-text document. The focus of this proposal is EDI (Electronic Data Interchange) data as opposed to a library or information retrieval approach. It provides examples using specific data and element patterns and constructing new results listings.
RDF Revision Posted
- W3C RDF Model and Syntax Specification
- W3C RDF Schema Specification
- The Dublin Core group is now planning to use RDF as a carrier syntax
Extended listings of information retrieval and related conferences are on a new page.
Thunderstone Webinator Remote Search
An alternate search option for this site, provided for us by Thunderstone. You can compare it with the Phantom and SearchButton search engines, and more options will come soon.
XML Search Tool Announced
- The BUS (Bottom-Up Scheme) search engine indexes XML and SGML text and recognizes document hierarchies and structure.
Search Tools Product Information
- dtSearch - a well-regarded Windows text retrieval engine which also has a web version.
- Dataware BRS/Search - enterprise knowledge management suite
- ZyIndex - text and image indexing engine
- SearchKey Plus is the new name for NetResults: Innotech Multimedia has also changed its corporate name to ASTAware.
New Articles and Links
- Search Engines for Intranets Information Today, July 1998 by Nina Platt
- Search Usability: Search and You May Find Alertbox column, July 16, 1997 by Jakob Nielsen.
Very helpful research on web site usability, which finds that half of all users are "search-dominant" (they use the search field as soon as they can). Recommendations are to put a search button on every page, to index the entire site rather than selected documents, and to avoid requiring Boolean operators in the default search.
- SPEX: Evaluation Kits: Web Search Engines SPEX, 1998
- New information on Distributed Indexing including the TF-CHIC Library of Distributed Indexing-Related Documents and Sites.
- Additional information on searching Adobe Acrobat PDF files.
Web Server 4D Site-Search Product Report
New SearchButton Search
An alternate index and search option for this site, provided for us by New Idea Engineering. You can compare it with the Phantom search engine, and more options will come soon.
Verity Corporate News
- A columnist at The Street.com, Herb Greenberg, had an article about Verity in the premium (paid) section of thestreet.com dated July 15, 1998, which was partially reprinted in the San Francisco Chronicle.
Infoseek's Java Search Engine Project
Infoseek is planning to create a very configurable Java search tool with source code included. The idea is to provide it as an add-on to databases such as Oracle, to browsers, email programs and the desktop. Due for developer release in July, final release in December (page apparently last updated in May, 1998). Other Java Search Tools are already available.
- We've added prices and platform compatibility information for most of the products in our Tools sections. More information to come as our databases come online with the new server in the next couple of weeks.
- Added Tecumseh Scout product page.
- New Perl listings, including Matt's Simple Search.
- Mac servers get more search tools MacWEEK, July 13, 1998 by Avi Rappoport
Description of the benefits and issues of site search tools, including indexers vs. crawlers. Covers the features of iHound, Boolean Search, Phantom and WebSTAR Search.
- "Site Search Tools for Mac Web Servers", Print-only article in the newest issue of Net Professional Magazine, volume 2, number 1, by Avi Rappoport.
Searching PDF (Adobe Acrobat) Files
- Important information on serving PDF files, including weaknesses in online interaction, considerations for searching, and search tools which can index and search the format.
Web Admin's Guide to Site Search Tools Updated
- Updated the Selection section, including an excellent example of a site search requirements, analysis, selection and installation process from the University of Pennsylvania.
- Added information on preparing your site for search.
- Added Maintenance and Search Log Analysis sections
SOIF and RDM
- The Summary Object Interchange Format and Resource Description Message mechanism are designed to allow indexes to work together and update as needed, rather than forcing search indexers to re-crawl each site redundantly. This can improve site searching of multiple very large sites.
Search Tools Product Updates
- Magnifi Enterprise Server 2.0 indexes text, video, audio, images and other formats, provides thumbnail previews as search results.
- Muscat is interesting search technology from the UK. It's the basis for EuroFerret, the European web search service, and is available for free for 1000 pages or fewer.
- Version 8 of Open Text LiveLink Intranet was released this spring. Computer Reseller News reviewed this new version in a large document-management comparative review, and gave it the Editor's Choice Award.
- New info for SWISH, NetResults, WebGlimpse, Harvest, AltaVista, Excerpt, and ht://Dig.
Search Notes from Web Design '98 in San Francisco
- Several tracks on Information Design, Web Site Navigation and Usability, all touching on search issues. One of the speakers was Lou Rosenfeld, author of Information Architecture for the World Wide Web (a favorite book of ours).
- Not too many vendors exhibited at the show -- it was substantially smaller than Software Development, InternetWorld last year or Macworld.
- The only search tools product developers exhibiting were InfoSeek, Microsoft (they didn't even show Index Server), and IBM/Lotus.
- New Web Navigation book by Jennifer Fleming, coming soon from O'Reilly, looks very useful for site organization design.
- Concentric Networks has announced a Virtual Development Environment which will allow hosted sites to run CGIs in a safe "sandbox", without causing performance, stability or security problems for other sites on the same machine. This will include many search tools, although they are currently testing the processor load for indexing. Pricing will be by CPU unit used.
- IBM announced their WebSphere package: an application server, Apache HTTP server (in addition to Domino Go) and NetObjects scripting. They are not shipping a site search tool, though said they are discussing a bundle with several vendors.
- The IBM/Lotus Domino Go web server, version 5.0, will no longer include the Verity search code, according to a Lotus/IBM person at the conference. They are switching to an in-house system created by IBM Palo Alto.
- Much attendee interest in XML: the XMLU sessions in the exhibit hall were very crowded, and it wasn't just because they had chairs.
Disney has just acquired 43% of Infoseek, makers of the Ultraseek local site search tool [bought in June, 2000 by Inktomi -ed.], in a complex deal that involves Paul Allen's Starwave, which produces original interactive programming on the Web (whatever that means).
Informix and Musclefish have created an audio datablade called Musclefish AIR. This datablade extends the functionality of your Dynamic Server and is ideal for users who generate and work with audio files. (from the IntraWare mailing list).
User Interface Engineering, led by Jared Spool, is presenting three courses on usability and design this summer, including Web Sites That Work.
PLWeb and all PLS products are now freeware from AOL, but no custom development, training or support is provided. This is the executable object code, rather than the source code, although the Perl scripts can be modified. The license allows royalty-free use, though you must include a "Powered by PLS" notice with a link to AOL.
This site was the Cool Tool of the Day yesterday, and we're happy to welcome everyone who came from the cooltool site. The review indicates we're doing things right, and warmed the cockles of our hearts.
We have some articles forthcoming in Net Professional Magazine, including one on Web Site Search Tools for Mac WebServers.
Lycos, Inc. has announced that they have received a patent on spider technology. They are claiming exclusive rights to "automated software robots which index the Web and collect targeted information from millions of different Web sites around the world...". The announcement stresses the patent's importance in recognizing Lycos as a Web pioneer, but contains no information on whether Lycos will attempt to enforce exclusive rights or require licensing from the many other webwide search engines which use this technique. Reuters reports that Lycos previously said it would defend patent rights aggressively. Danny Sullivan of SearchEngineWatch thinks that other search crawlers use sufficiently different technology that they would not infringe on the Lycos patent.
Added a Thunderstone Webinator page.
Created product lists by platform: Java, Mac, Unix, and Windows, in addition to the alphabetical listing.
More links for intranets and knowledge management.
New Host: www.searchtools.com is up!
New Search: site is now indexed and searched by the Phantom engine!
Ultraseek wins 1998 Network Computing Well-Connected Award for Intranet Search Engine
The editors were particularly impressed with Ultraseek's natural language interface, administration and search results.
Inktomi is selling their engine as a site search tool, but I can't find any articles or reviews.
New Metadata page.
I've been reading the Information Architecture book written by the Argus folks and it's great -- I wish I'd written it myself. This covers more than just site search tools, it goes back to basics of site design, scalability, navigation, coherence, maintenance and so on. It makes you think about the web site as a whole, rather than as separate pages or even sections. Highly recommended! You can buy it online.
New Idea Engineering
Company provides consulting and training on Verity and other search tools, including the Guerilla Verity Class.
Rearranged the tools pages so all tools with actual information get their own pages, also added platform compatibility icons to the Tools page and added a Related Topics section.
- The Scent of Information
- Jared Spool of User Interface Engineering reported on a study they conducted last year about locating data on web sites. They took a range of people, from non-computer users to experts, and had them look up data on certain web sites, such as C|Net and Car Talk. The subjects had several questions to answer with a limited time for each site, but knew the data was there somewhere. Among other navigation problems, the UIE group discovered some serious difficulties interpreting the results of site searches, especially cryptic results lists such as coin ID numbers, and the desire of users to incrementally improve searches by whittling down the results, rather than reissuing a search.
Cataloging Web Sites
- Netscape's Information Architect, Ira Kleinberg, gave a presentation on his work in setting up a context and meta-structure for their web site, using the Dublin Core tags as a base. He recommends cataloging the sites to improve navigation, version control, ownership rights and reusability of information.
XML, DHTML, DOMs, CSS, and other acronyms.
- XML was a constant topic at the conference, and Brian Travis of XMLU gave a good presentation about it.
RDF (Resource Description Framework) was very big. Based on the MCF work done at Apple, RDF provides metadata: information about information. It will allow better navigation within sites, and let agents exchange data between sites. Netscape/Mozilla is experimenting with automatic site maps and improved bookmarks using this format, and it's a proposed W3C standard.
Navigation and Visualization
- Earl Rennison from Perspecta Systems showed some interesting interfaces providing feedback and context in site navigation/search. For an example, see AllTheNews.
Note: both domains have disappeared by September, 1998
- Brewster Kahle showed his most recent version of the Alexa search toolbar. While it's not a site search tool, it's really cool. There are several hundred thousand users and their data tracks are providing helpful feedback to others. The Mac version is in test.
For more news, see the Current News Page and the 1999 News Archive.