
Search Tools Analysis

Stopword Hell

Why MediaWiki's Site Search Stinks, Reason #2

Avi Rappoport
Search Tools Consulting

Huge Stopword List: What Were They Thinking?

By default, MediaWiki search excludes 547 words as stopwords, yet they're perfectly good words (you can see them below). The list is the MySQL full-text search default, and the MediaWiki people have never changed it. Exactly like the short words described on the previous page, these words are not indexed at all, so the search engine can never retrieve them. The stopwords include: able, about, above, according, across, actually, after... So a site search containing only those words returns "No page text matches", even when there are pages that contain them.

[Screenshot: the error message shown when trying to find stopwords. Example at knoppix.net: tried the seven stopwords above, not one match.]

This message is not just unhelpful, it's misleading. It doesn't even say which of the search terms are stopwords, so there's no way to tell except by trial and error (or by looking at the list). And, contrary to the message, a search that combines an allowed word with a stopword or two, such as "surprise from behind", will match every article containing the word "surprise", without checking that the article also includes "from" and "behind". Whoops.
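To make the two failure modes concrete, here is a toy model in Python of the behavior described above. It is only a sketch, not MySQL's or MediaWiki's actual code: the function names, the sample pages, and the nine-word stopword sample are all made up for illustration, and it ignores the separate short-word limit covered on the previous page.

```python
import re

# A small sample of the default list; the full list is at the end of this page.
STOPWORDS = {"able", "about", "above", "according", "across",
             "actually", "after", "from", "behind"}

WORD_RE = re.compile(r"[a-z0-9_']+")


def split_query(query):
    """Split a query into (terms that get searched, stopwords that get dropped)."""
    words = WORD_RE.findall(query.lower())
    kept = [w for w in words if w not in STOPWORDS]
    dropped = [w for w in words if w in STOPWORDS]
    return kept, dropped


def search(query, pages):
    """Return titles of pages containing every non-stopword term of the query."""
    kept, _ = split_query(query)
    if not kept:
        # Every term was a stopword, so there is nothing to look up.
        return []
    hits = []
    for title, text in pages.items():
        page_words = set(WORD_RE.findall(text.lower()))
        if all(term in page_words for term in kept):
            hits.append(title)
    return hits


# Hypothetical sample pages, for illustration only.
PAGES = {
    "Ambush": "A surprise attack from behind the lines",
    "Party": "The surprise party went well",
    "Hiking": "We walked across the ridge after lunch",
}
```

In this model, search("surprise from behind", PAGES) returns both "Ambush" and "Party", because only "surprise" is ever checked, while search("across after", PAGES) returns nothing even though the "Hiking" page contains both words. The second value returned by split_query is exactly the information a more helpful error message could show the searcher.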

There's a Wikimedia Meta help page with the awkward title "Common words, searching for which is not possible". I find this all pretty user-hostile, and I think it stinks.

The main Wikipedia removed stopwords from search in February 2006. They don't say exactly why, though I find it blindingly obvious. But the standard MediaWiki installation still uses the giant stopword list. To fix it, reconfigure MySQL, or try the procedures some nice user has posted. Reduce the stopword list to a reasonable minimum (the, a, an, and, or, not), or leave it out altogether. Or switch to Sphinx or MWSearch (Lucene), which have fewer stopwords and can be set to use just the six above.
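As a rough sketch of the MySQL route, the Python helper below writes the six-word list to a file and produces the two pieces of text an administrator would need: a my.cnf fragment and a rebuild statement. The helper names and the file path are hypothetical. What it relies on from MySQL itself is the ft_stopword_file server option for MyISAM full-text indexes and REPAIR TABLE ... QUICK to rebuild an index after the server restarts. The table name searchindex is MediaWiki's default and may carry a prefix on your wiki, so treat it as an assumption and check your own installation first.

```python
from pathlib import Path

# The "reasonable minimum" list suggested above.
MINIMAL_STOPWORDS = ("the", "a", "an", "and", "or", "not")


def write_stopword_file(path, words=MINIMAL_STOPWORDS):
    """Write one stopword per line and return the path that was written."""
    target = Path(path)
    target.write_text("\n".join(words) + "\n", encoding="utf-8")
    return target


def my_cnf_fragment(stopword_file):
    """Return [mysqld] lines pointing MySQL at the custom stopword file."""
    return "[mysqld]\nft_stopword_file = {}\n".format(stopword_file)


def rebuild_statement(table="searchindex"):
    """Return SQL that rebuilds the full-text index of a MyISAM table.

    The default table name is an assumption; adjust it for a table prefix.
    """
    return "REPAIR TABLE {} QUICK;".format(table)


if __name__ == "__main__":
    # Hypothetical file name; the MySQL server must be able to read the file.
    stopword_path = write_stopword_file("mediawiki_stopwords.txt")
    print(my_cnf_fragment(stopword_path.resolve()))
    print(rebuild_statement())
```

The server has to be restarted for the option to take effect, and the old index keeps its old contents until it is rebuilt. To leave stopwords out altogether, MySQL also accepts an empty value for ft_stopword_file.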

Arguments? Questions? Comments? Have you tried to search for a word that should be findable? I'm curious about how this has affected people: please leave a comment on my blog.

Next: Extremely limited search syntax and functionality


For Reference: The MySQL Full Text 5.x default stopwords, as of October 13, 2008.

a's able about above according
accordingly across actually after afterwards
again against ain't all allow
allows almost alone along already
also although always am among
amongst an and another any
anybody anyhow anyone anything anyway
anyways anywhere apart appear appreciate
appropriate are aren't around as
aside ask asking associated at
available away awfully be became
because become becomes becoming been
before beforehand behind being believe
below beside besides best better
between beyond both brief but
by c'mon c's came can
can't cannot cant cause causes
certain certainly changes clearly co
com come comes concerning consequently
consider considering contain containing contains
corresponding could couldn't course currently
definitely described despite did didn't
different do does doesn't doing
don't done down downwards during
each edu eg eight either
else elsewhere enough entirely especially
et etc even ever every
everybody everyone everything everywhere ex
exactly example except far few
fifth first five followed following
follows for former formerly forth
four from further furthermore get
gets getting given gives go
goes going gone got gotten
greetings had hadn't happens hardly
has hasn't have haven't having
he he's hello help hence
her here here's hereafter hereby
herein hereupon hers herself hi
him himself his hither hopefully
how howbeit however i'd i'll
i'm i've ie if ignored
immediate in inasmuch inc indeed
indicate indicated indicates inner insofar
instead into inward is isn't
it it'd it'll it's its
itself just keep keeps kept
know knows known last lately
later latter latterly least less
lest let let's like liked
likely little look looking looks
ltd mainly many may maybe
me mean meanwhile merely might
more moreover most mostly much
must my myself name namely
nd near nearly necessary need
needs neither never nevertheless new
next nine no nobody non
none noone nor normally not
nothing novel now nowhere obviously
of off often oh ok
okay old on once one
ones only onto or other
others otherwise ought our ours
ourselves out outside over overall
own particular particularly per perhaps
placed please plus possible presumably
probably provides que quite qv
rather rd re really reasonably
regarding regardless regards relatively respectively
right said same saw say
saying says second secondly see
seeing seem seemed seeming seems
seen self selves sensible sent
serious seriously seven several shall
she should shouldn't since six
so some somebody somehow someone
something sometime sometimes somewhat somewhere
soon sorry specified specify specifying
still sub such sup sure
t's take taken tell tends
th than thank thanks thanx
that that's thats the their
theirs them themselves then thence
there there's thereafter thereby therefore
therein theres thereupon these they
they'd they'll they're they've think
third this thorough thoroughly those
though three through throughout thru
thus to together too took
toward towards tried tries truly
try trying twice two un
under unfortunately unless unlikely until
unto up upon us use
used useful uses using usually
value various very via viz
vs want wants was wasn't
way we we'd we'll we're
we've welcome well went were
weren't what what's whatever when
whence whenever where where's whereafter
whereas whereby wherein whereupon wherever
whether which while whither who
who's whoever whole whom whose
why will willing wish with
within without won't wonder would
would wouldn't yes yet you
you'd you'll you're you've your
yours yourself yourselves zero  

 



Page updated: 2008-10-16


Search Tools Consulting's principal analyst, Avi Rappoport, may be available to help you with selection, analysis, user experience, and functional search engine work. Please contact us with your questions, comments, or possible consulting discussions.


SearchTools.com - Copyright © 2008-2009 Search Tools Consulting.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.