Dealing with Multiple Languages
- Keyword queries are language-independent
- Natural-language and concept search are not
- Handle many character sets
  - Roman characters with diacritics and special letters: daß, thé, Været
  - Non-Roman scripts, such as those used for Russian, Arabic, Hebrew, Sanskrit, etc.
  - Asian languages: Chinese, Japanese and Korean
- Unicode: universal character encoding system
- Most documents don't declare their encoding
- Search indexer must deduce language from text
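
As a rough illustration of the last two points, the sketch below shows one way an indexer might cope with a document that arrives as raw bytes with no declared encoding. It is a minimal example under stated assumptions, not how any particular search engine works: the function names `decode_guess` and `dominant_script` are hypothetical, the fallback to Latin-1 is an arbitrary choice, and identifying the script is only a first step, since many languages share one script and telling them apart needs statistical models this sketch does not include.

```python
import unicodedata
from collections import Counter
from typing import Tuple


def decode_guess(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding_name) for bytes with no declared encoding.

    Assumption: try strict UTF-8 first, then fall back to Latin-1,
    which accepts any byte sequence (and so may guess wrong).
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("latin-1"), "latin-1"


def dominant_script(text: str) -> str:
    """Return the most common script among the letters in `text`.

    The script is taken from the first word of each character's
    Unicode name, e.g. 'CYRILLIC CAPITAL LETTER PE' -> 'CYRILLIC'.
    """
    counts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    )
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


if __name__ == "__main__":
    samples = [
        "Været".encode("utf-8"),
        "daß".encode("latin-1"),
        "Привет мир".encode("utf-8"),
    ]
    for raw in samples:
        text, encoding = decode_guess(raw)
        print(encoding, dominant_script(text), text)
```

Once the bytes are decoded to Unicode, the rest of the indexing pipeline can treat every script uniformly, which is the practical benefit of a universal encoding.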
Intranets 2002
October 29, 2002
For more information, see SearchTools.com