Huge Stopword List: What Were They Thinking?
By default, MediaWiki search excludes 547 words as stopwords, even though they're perfectly good words (you can see them below). The list is the MySQL full-text search default, and the MediaWiki people have never changed it. Exactly like the short words on the previous page, these words are not indexed at all, so the search engine can never retrieve them. Stopwords include: able, about, above, according, across, actually, after... So a site search containing only one or more of those words returns "No page text matches", even when there are pages with those words.
Example at knoppix.net: a search for the seven stopwords above returns not one match.
This message is not just unhelpful, it's misleading. It doesn't even say which of the search terms are stopwords, so there's no way to tell except by trial and error (or by looking at the list). But, contrary to the message, a search that combines an allowed word with a stopword or two, such as "surprise from behind", will match all articles containing the word "surprise", without checking that the article also includes "from" and "behind". Whoops.
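The behavior described above can be illustrated with a toy model. This is only a sketch, not MySQL's or MediaWiki's actual code: the `STOPWORDS` set is a small subset of the list on this page, the `PAGES` data and function names are made up for illustration, and the minimum word length rule from the previous page is left out.

```python
# Toy model of a full-text index that silently drops stopwords.
# STOPWORDS is a small illustrative subset of the list on this page.
STOPWORDS = {"able", "about", "above", "according", "across",
             "actually", "after", "from", "behind"}


def index_terms(text):
    """Return the words that would be indexed: lowercased, stopwords dropped."""
    return {word for word in text.lower().split() if word not in STOPWORDS}


def search(query, pages):
    """Return titles of pages that contain every indexable query term.

    Stopwords vanish from the query without any notice to the user.
    If nothing is left after that, nothing can match.
    """
    terms = index_terms(query)
    if not terms:
        return []
    return [title for title, text in pages.items()
            if terms <= index_terms(text)]


# Hypothetical page texts, invented for this example.
PAGES = {
    "Party": "a surprise party for everyone",
    "Ambush": "a surprise attack from behind",
    "Overview": "read about the layout above",
}

# Only stopwords: no match, although "Overview" contains both words.
print(search("about above", PAGES))           # []
# "from" and "behind" are ignored, so every page with "surprise" matches.
print(search("surprise from behind", PAGES))  # ['Party', 'Ambush']
```

In this model the second query returns "Party" even though that page contains neither "from" nor "behind", which is the same kind of over-matching described above.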
There's a Wikimedia Meta help page with the awkward title "Common words, searching for which is not possible". I find this all pretty user-hostile, and I think it stinks.
The main Wikipedia removed stopwords from search in February 2006. They don't say exactly why, though I find it blindingly obvious. But the MediaWiki installation still uses the giant stopword list. To fix it, reconfigure MySQL, or try the procedures some nice user has posted. Reduce the stopword list to a reasonable minimum (the, a, an, and, or, not), or leave it out altogether. Or switch to Sphinx or MWSearch (Lucene), which have fewer stopwords and can be set to use the six above.
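As a rough sketch of the "reconfigure MySQL" route: MySQL reads a custom stopword list from the file named by its `ft_stopword_file` system variable, with words separated by newlines, spaces, or commas. The helper names and the file path below are hypothetical, and the snippet only prints the text you would put in the two files. It does not touch a real server.

```python
# Sketch: generate a minimal stopword file and the matching my.cnf fragment.
# The path passed in below is a made-up example; adjust it for your system.
MINIMAL_STOPWORDS = ["the", "a", "an", "and", "or", "not"]


def stopword_file_text(words):
    """Return the file contents: one stopword per line."""
    return "\n".join(words) + "\n"


def my_cnf_fragment(stopword_path):
    """Return a [mysqld] fragment pointing MySQL at the stopword file."""
    return "[mysqld]\nft_stopword_file = " + stopword_path + "\n"


print(stopword_file_text(MINIMAL_STOPWORDS), end="")
print(my_cnf_fragment("/etc/mysql/mediawiki_stopwords.txt"), end="")
```

After changing the setting, MySQL has to be restarted and the full-text index rebuilt before the new list takes effect, for example with `REPAIR TABLE searchindex QUICK`. That table name assumes a MediaWiki install with no table prefix. Also note that all six words are shorter than four letters, so with the default minimum word length they are not indexed anyway.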
Arguments? Questions? Comments? Have you tried to search for a word that should be findable? I'm curious about how this has affected people: please leave a comment on my blog.
Next: Extremely limited search syntax and functionality
For Reference: The MySQL Full Text 5.x default stopwords, as of October 13, 2008.
a's able about above according accordingly across actually after afterwards again against ain't all allow allows almost alone along already also although always am among amongst an and another any anybody anyhow anyone anything anyway anyways anywhere apart appear appreciate appropriate are aren't around as aside ask asking associated at available away awfully be became because become becomes becoming been before beforehand behind being believe below beside besides best better between beyond both brief but by c'mon c's came can can't cannot cant cause causes certain certainly changes clearly co com come comes concerning consequently consider considering contain containing contains corresponding could couldn't course currently definitely described despite did didn't different do does doesn't doing don't done down downwards during each edu eg eight either else elsewhere enough entirely especially et etc even ever every everybody everyone everything everywhere ex exactly example except far few fifth first five followed following follows for former formerly forth four from further furthermore get gets getting given gives go goes going gone got gotten greetings had hadn't happens hardly has hasn't have haven't having he he's hello help hence her here here's hereafter hereby herein hereupon hers herself hi him himself his hither hopefully how howbeit however i'd i'll i'm i've ie if ignored immediate in inasmuch inc indeed indicate indicated indicates inner insofar instead into inward is isn't it it'd it'll it's its itself just keep keeps kept know knows known last lately later latter latterly least less lest let let's like liked likely little look looking looks ltd mainly many may maybe me mean meanwhile merely might more moreover most mostly much must my myself name namely nd near nearly necessary need needs neither never nevertheless new next nine no nobody non none noone nor normally not nothing novel now nowhere obviously of off often oh ok okay old on once one ones 
only onto or other others otherwise ought our ours ourselves out outside over overall own particular particularly per perhaps placed please plus possible presumably probably provides que quite qv rather rd re really reasonably regarding regardless regards relatively respectively right said same saw say saying says second secondly see seeing seem seemed seeming seems seen self selves sensible sent serious seriously seven several shall she should shouldn't since six so some somebody somehow someone something sometime sometimes somewhat somewhere soon sorry specified specify specifying still sub such sup sure t's take taken tell tends th than thank thanks thanx that that's thats the their theirs them themselves then thence there there's thereafter thereby therefore therein theres thereupon these they they'd they'll they're they've think third this thorough thoroughly those though three through throughout thru thus to together too took toward towards tried tries truly try trying twice two un under unfortunately unless unlikely until unto up upon us use used useful uses using usually value various very via viz vs want wants was wasn't way we we'd we'll we're we've welcome well went were weren't what what's whatever when whence whenever where where's whereafter whereas whereby wherein whereupon wherever whether which while whither who who's whoever whole whom whose why will willing wish with within without won't wonder would would wouldn't yes yet you you'd you'll you're you've your yours yourself yourselves zero
Page updated: 2008-10-16
Search Tools Consulting's principal analyst, Avi Rappoport, may be available to help you with selection, analysis, user experience, and functional search engine work. Please contact us with your questions, comments, or possible consulting discussions.
SearchTools.com - Copyright © 2008-2009 Search Tools Consulting.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.