As of January 2012, this site is no longer being updated, due to work and health issues.

SearchTools Survey - July 2002

File Formats and Search Engines

HTML is the basic file format of the Web, but we found that half of the sites in the survey serve some files that are neither HTML nor plain text. Many sites serve cross-platform standard formats such as PDF, PostScript and XML, while others serve office productivity files, including Microsoft Word, PowerPoint, Excel and WordPerfect.

There was some confusion among our survey respondents about file formats: some noted that they serve pages generated by server-side processing (JSP or ColdFusion) or by backend databases. Most site search engines can handle these because they are HTML pages by the time they reach the client, whether that client is a browser or an indexing robot. But most of the other formats listed below are true binary files and cannot be read directly by browsers.
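To illustrate why generated pages are not a separate file format, here is a minimal Python sketch of how an indexing robot might decide what it can read directly. It assumes the robot looks at the HTTP Content-Type header of the response and not at the URL extension. The function names, the example URLs and the set of directly indexable types are hypothetical, chosen only for illustration, and do not describe any particular search engine.

```python
# Hypothetical sketch: classify a response by its Content-Type header.
# The set below is an assumption for illustration, not a product's real list.
DIRECTLY_INDEXABLE = {"text/html", "text/plain"}


def media_type(content_type_header: str) -> str:
    """Return the bare media type, dropping parameters such as charset."""
    return content_type_header.split(";", 1)[0].strip().lower()


def can_index_directly(url: str, content_type_header: str) -> bool:
    """True when the response body is HTML or plain text.

    The URL is accepted only to show that its extension (.jsp, .cfm, ...)
    plays no part in the decision.
    """
    return media_type(content_type_header) in DIRECTLY_INDEXABLE


# A JSP page arrives as ordinary HTML; a PDF would need a converter.
print(can_index_directly("http://example.com/catalog.jsp",
                         "text/html; charset=ISO-8859-1"))
print(can_index_directly("http://example.com/report.pdf",
                         "application/pdf"))
```

Under these assumptions, a JSP or ColdFusion page is treated exactly like a static HTML file, while a PDF or Word file is flagged as needing a converter.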

Some site search engines will index complex file formats. To serve a matching file, they may send it to the client unchanged and let the browser launch the application that created it, or they may attempt to convert it to HTML and serve that version instead.

A few search engines will index image, audio and video file metadata, such as the file name. Virage and Excalibur can index the multimedia data itself, although this requires a significant investment in time and resources.
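As a rough illustration of indexing file-name metadata, the Python sketch below turns the file name in a URL into lowercase search terms. The function name, the example URL and the rule of splitting on non-alphanumeric characters are assumptions made for this example; actual search engines may extract metadata quite differently.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def filename_terms(url: str) -> list[str]:
    """Split the file name in a URL into lowercase index terms.

    Hypothetical helper: the extension is dropped and the rest of the
    name is split on any run of non-alphanumeric characters.
    """
    path = unquote(urlparse(url).path)   # decode %20 and similar escapes
    stem = PurePosixPath(path).stem      # file name without its extension
    return [term.lower() for term in re.split(r"[^A-Za-z0-9]+", stem) if term]


print(filename_terms("http://example.com/media/Annual_Report-2002.mp3"))
# prints ['annual', 'report', '2002']
```

Terms like these let a multimedia file be found by name even when the search engine cannot read the audio or video data itself.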

Format                  without search  with search
HTML                               658          363
PDF                                308          227
text                               679          167
Word                               213          127
PowerPoint                         115          100
Excel                               85          111
XML                                 85           83
PostScript                          31           38
WordPerfect                         12           18
Lotus 1-2-3                          7           14
Zip                                  3            2
Flash                                2            2
multimedia                           1            2
SGML                                 0            3
AVI                                  1            1
QuickTime                            1            1
RTF                                  2            1
Applix                               0            1
Brad                                 0            1
chemical formats                     0            1
compressed files                     0            1
FFT                                  0            1
icl                                  0            1
MODCA-P                              0            1
Quark                                0            1
RealAudio                            0            1
RFT                                  0            1
StarOffice                           0            1
WordPro                              0            1
af3 (ABC Flowchart)                  1            0
audio                                1            0
Domino .nsf                          1            0
dot (GML)                            1            0
downloading EXE files                1            0
email files                          1            0
HKE                                  1            0
MP3                                  1            0
MPEG                                 1            0
PTML                                 1            0
publisher 98                         1            0
VIV (Vivo)                           1            0
WAV                                  1            0
 
This survey is copyright © 1998-2003 by Search Tools Consulting, and all rights are reserved. The survey was designed, analyzed and reported by Avi Rappoport. Personal information in the survey will be kept private at all times. For reprint permissions or survey aggregate data purchase, please contact Search Tools Consulting.

Avi Rappoport of Search Tools Consulting can help you evaluate your search engine, whether it's on a site, portal, intranet, or enterprise. Please contact SearchTools for more information.


This information copyright © 2000-2011 Avi Rappoport, Search Tools Consulting. Some Rights Reserved, under the Creative Commons Attribution-Share Alike 3.0 United States License. Always attribute copied content to the page's full URL. Permissions beyond the scope of this license are available upon request.