Search Indexing: storing data
Many file formats
- PDF; MS Office formats (Word, Excel, PowerPoint); XML; others
- Watch out for product updates and new formats

Multiple languages
- Stemming algorithms and synonyms are language-specific
- Non-Roman characters require special processing

Security
- Track security levels and groups in the web page listing
- Consider indexing each security level in a separate collection

Index exhaustively
- Whole document (even if very long)
- All words; don't exclude stopwords
- Word position, for phrase searching
- Field or meta tag, if any
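
The "index exhaustively" points can be illustrated with a small positional index. This is only a sketch under stated assumptions, not how any particular search engine stores its data: the class name PositionalIndex, the tokenize helper, and the sample documents are all hypothetical. It keeps every word (stopwords included), records each word's position, and tags each posting with its field, which is what makes an exact phrase search such as "to be or not" possible.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Lowercase word tokens; nothing is dropped, stopwords included."""
    return [t.lower() for t in TOKEN.findall(text)]


class PositionalIndex:
    """Hypothetical in-memory index: term -> {(doc_id, field): [positions]}."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, fields):
        # Index the whole text of every field, however long it is.
        for field, text in fields.items():
            for pos, term in enumerate(tokenize(text)):
                self.postings[term][(doc_id, field)].append(pos)

    def phrase(self, query, field=None):
        """Return ids of documents containing the exact phrase in one field."""
        terms = tokenize(query)
        if not terms:
            return set()
        hits = set()
        for (doc_id, fld), positions in self.postings.get(terms[0], {}).items():
            if field is not None and fld != field:
                continue
            for start in positions:
                # Every following term must sit at the next position.
                if all(
                    start + i in self.postings.get(t, {}).get((doc_id, fld), ())
                    for i, t in enumerate(terms[1:], 1)
                ):
                    hits.add(doc_id)
                    break
        return hits


# Sample data, invented for illustration.
index = PositionalIndex()
index.add("doc1", {"title": "To be or not to be",
                   "body": "The question of whether to be"})
index.add("doc2", {"title": "Being and time",
                   "body": "Not to be confused with other works"})

print(index.phrase("to be or not"))
print(index.phrase("to be", field="title"))
```

If stopwords were dropped at indexing time, the first query would have almost nothing left to match, since nearly every word in it is a stopword.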
Internet Librarian 2001
November 8, Pasadena, CA
Avi Rappoport, Search Tools Consulting
www.searchtools.com