Indexing: Organizing Data
-
Plain Text only
-
Not graphics, even if they include text
-
Not Java or JavaScript
-
Recognizing HTML Page Structure
-
Titles (make sure they are descriptive)
-
Meta Descriptions often used as summary
-
Meta Keywords
-
Headers
-
Alt Text (image alt attributes)
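
A minimal sketch of how an indexer might pull these page fields apart, using Python's standard html.parser module. The class name PageFields, the function extract_fields, and the field names are hypothetical, chosen for illustration; a real search tool would also handle character encodings and malformed markup. Script contents are skipped, which matches the "plain text only" point above.

```python
from html.parser import HTMLParser


class PageFields(HTMLParser):
    """Collects fields an indexer might weight separately (hypothetical helper)."""

    HEADERS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.fields = {"title": "", "description": "", "keywords": "",
                       "headers": [], "alt": []}
        self._current = None  # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._current = tag
        elif tag in self.HEADERS:
            self._current = tag
            self.fields["headers"].append("")  # start a new header entry
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.fields[name] = attrs.get("content") or ""
        elif tag == "img" and attrs.get("alt"):
            self.fields["alt"].append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        # Text outside title/header tags (including script bodies) is ignored here.
        if self._current == "title":
            self.fields["title"] += data
        elif self._current in self.HEADERS:
            self.fields["headers"][-1] += data


def extract_fields(html):
    """Return a dict of title, meta description, meta keywords, headers, alt text."""
    parser = PageFields()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Keeping the fields separate lets the search engine rank a match in the title or a header above a match elsewhere, and lets it show the meta description as the result summary.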
-
Other Structures
-
Metadata (Dublin Core et al.)
-
Proprietary Tag Fields
-
XML Tags
-
Database Fields
-
Beyond English
-
Multilingual sites need extra planning & testing
-
Index & Search must handle diacritical characters ("thé" is not the same as "the")
-
Non-Roman character sets are even more complex
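
A small sketch of the diacritics point, using Python's standard unicodedata module. The function names normalize and fold_diacritics are hypothetical. The first keeps accents, so "thé" and "the" stay distinct terms. The second strips them, a trade-off some sites choose so that unaccented queries still match. Which behavior is right depends on the site's languages and is an assumption to test, not a rule.

```python
import unicodedata


def normalize(term):
    """Lowercase and compose the term (NFC), keeping accents intact."""
    return unicodedata.normalize("NFC", term).casefold()


def fold_diacritics(term):
    """Strip combining marks so accented and unaccented forms share one key."""
    decomposed = unicodedata.normalize("NFD", term)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()
```

Whatever choice is made, the same function has to be applied both when indexing and when processing the query, otherwise the two sides will not match.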
-
Stopwords
-
Removing common words (a, an, and, be, not, or, the, to, etc.)
-
Saves a lot of space in the index
-
Some searches become impossible ("to be or not to be")
-
Makes phrase searching trickier ("The Power And The Glory")
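
A minimal sketch of stopword removal, assuming a whitespace tokenizer and the short stopword list from this slide. The name index_terms is hypothetical, and punctuation handling is left out. Each kept term carries its original word position, which shows why phrase search gets harder.

```python
# Stopword list taken from the examples on this slide; real lists are longer.
STOPWORDS = {"a", "an", "and", "be", "not", "or", "the", "to"}


def index_terms(text):
    """Return (position, word) pairs for every word that is not a stopword."""
    words = text.lower().split()
    return [(pos, word) for pos, word in enumerate(words) if word not in STOPWORDS]
```

With this list, index_terms("to be or not to be") returns an empty list, so the query cannot match anything. For "The Power And The Glory" only "power" (position 1) and "glory" (position 4) survive, so a phrase matcher has to allow for the gap left by the removed words.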
Thunderlizard Web Usability 2000
Seattle, July 21
Avi Rappoport: Search Tools Consulting
www.searchtools.com