Search Tools Consulting
Wikipedia, and particularly the related sites running the software released as MediaWiki, have some of the worst site search I have ever seen. The default installation's query processing is absurdly limited, the retrieval is crippled by bad settings, the relevance ranking is unclear, and the results page is not just ugly but contradictory and confusing.
I will be posting more detailed analyses supporting each of these statements, linked from this page.
Default versions of the MediaWiki search engine are very nearly unusable. If you run a MediaWiki site, check the page Special:Version. If there is no mention of a search plugin, then run, do not walk, to replace the site search module. At least use the MWSearch (Lucene) extension, a version of which is used on the main Wikipedia, or, better, the Sphinx search extension (which powers the New World Encyclopedia search). Then analyze the functions and interface, looking at my critique, and adjust accordingly. Your wiki readers will thank you.
What MediaWiki Default Search Does Wrong (Let Me Count The Ways)
- Ignores all search words shorter than four letters
- Ignores all words on a ridiculously long stopword list
- Extremely limited search syntax and functionality
- Misleading results information
- Displays markup codes in search results
- Match word highlights are wrong
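To make the first two items concrete, here is a minimal sketch of how a length cutoff plus a stopword list can silently empty out a query. The four-letter minimum comes from the list above; the function name `surviving_terms` and the tiny stopword sample are hypothetical, chosen only for illustration, and this is not the actual MediaWiki or MySQL code.

```python
import re

# Assumption: the four-letter minimum is taken from the list above.
MIN_WORD_LEN = 4
# Assumption: a tiny made-up sample, NOT the real stopword list.
SAMPLE_STOPWORDS = {"about", "after", "first", "never", "however"}

def surviving_terms(query, min_len=MIN_WORD_LEN, stopwords=SAMPLE_STOPWORDS):
    """Return the query words that would still be searched after filtering."""
    words = re.findall(r"\w+", query.lower())
    return [w for w in words if len(w) >= min_len and w not in stopwords]

print(surviving_terms("The Who"))             # []
print(surviving_terms("first aid kit"))       # []
print(surviving_terms("history of the web"))  # ['history']
```

Under these assumptions, a perfectly reasonable query such as "The Who" or "first aid kit" leaves nothing to search for, which is why the searcher sees no useful results.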
What the MediaWiki Default Search Does Right
I like to be fair and see whatever good is in a search engine. The main advantage of MediaWiki using the MySQL internal full text search engine is that every change in the articles is immediately sent to the search indexer. Here are the other things it does right:
- Searches the articles by default, with advanced options to search discussion and help pages
- Is case-insensitive, so search terms match mixed-case text (usually)
- Defaults to retrieving only articles matching all search terms, rather than any article matching any search term, which seems a reasonable choice for a wiki site.
- Recognizes that quote marks around a set of search terms indicate that it should match the exact phrase
- Matches search terms with diacritics and non-Roman characters, for example ç (c cedilla) and π (pi), unless they are expressed as HTML character entities such as &ccedil; and &pi;, in which case they aren't matched. It would be better to convert everything to Unicode and make it searchable.
- Sorts the results in two parts: a set of title matches, and a set of body text matches in a reasonable relevance order (usually)
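The entity problem in the list above could be handled by decoding HTML character entities to Unicode before text reaches the indexer. The following is a rough sketch of that idea using Python's standard `html.unescape`; the helper name `normalize_for_index` is hypothetical, and where such a step would hook into a real MediaWiki installation is not covered here.

```python
import html

def normalize_for_index(text):
    """Decode HTML character entities so the indexer sees plain Unicode."""
    return html.unescape(text)

# Named and numeric entities both become the characters they stand for.
print(normalize_for_index("gar&ccedil;on"))      # garçon
print(normalize_for_index("&pi; and &#960;"))    # π and π
```

With the text normalized this way, a search for "garçon" could match the page whether the author typed the character directly or used an entity.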
[With acknowledgement to Jared Spool for the term "Site Search Stinks".]
Arguments? Questions? Comments? I'm interested in other people's experiences, and may be able to deconstruct problems with the search.
Page updated: 2008-10-24
Search Tools Consulting's principal analyst, Avi Rappoport, may be available to help you with selection, analysis, user experience, and functional search engine work. Please contact us with your questions, comments, or possible consulting discussions.
SearchTools.com - Copyright © 2008-2009 Search Tools Consulting.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.