As of January, 2012, this site is no longer being updated, due to work and health issues
The Long Tail and Short Head of Search
Search Tools Report
The term "Long Tail" was popularized by Chris Anderson in his book and web site of the same name to describe the successful business model of Amazon and Netflix: offering huge selections of books and DVDs, significantly more than brick-and-mortar stores could ever hope to keep in inventory. Customer demand is expressed by searching, and the Long Tail describes the existence of a surprisingly large percentage of unique search terms.
"Search term" here means the words that people type before they click the Search button, whether it's one word or a phrase. There are very few popular terms, and these are repeated thousands of times a day, even in the search logs of a medium-sized site. These terms tend to be short and general, such as those listed in Google Hot Trends: iphone, webkinz, heroes, club penguin.
Rich Wiggins of the MSU Library was possibly the first to recognize this pattern in search log analysis, described in his paper The Accidental Thesaurus. Counting the number of times each term is used per day (week, month), and graphing them, he showed that there are a few very popular terms (the Short Head), another set of terms that are repeated quite often (the middle) and a huge number of unique terms (the Long Tail).
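The counting step described above can be sketched in a few lines of Python. This is only an illustration, not the method used in The Accidental Thesaurus: the function names, the `head_size` cutoff, and the rule that the Long Tail is every term seen exactly once are all assumptions made for the example.

```python
from collections import Counter


def normalize(term):
    """Lowercase and collapse whitespace so 'iPhone ' and 'iphone' count together."""
    return " ".join(term.lower().split())


def rank_terms(queries):
    """Return (term, count) pairs sorted from most to least frequent."""
    counts = Counter(normalize(q) for q in queries if q.strip())
    return counts.most_common()


def split_curve(ranked, head_size=10):
    """Split ranked terms into Short Head, middle, and Long Tail.

    Assumption: the head is the top `head_size` terms and the tail is every
    remaining term seen exactly once. Both cutoffs are arbitrary choices.
    """
    head = ranked[:head_size]
    rest = ranked[head_size:]
    middle = [(term, count) for term, count in rest if count > 1]
    tail = [(term, count) for term, count in rest if count == 1]
    return head, middle, tail
```

Running `rank_terms` over one day, week, or month of logged queries and plotting the counts by rank should produce the curve described here.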
This curve is a familiar one to mathematicians and statisticians. It fits the Pareto principle, also known as the 80-20 rule, which observes that in many systems, 80% of the effects come from only 20% of the causes. Zipf's law, which came from counting words used in an archive of text, is even more appropriate. Zipf found that the second-most frequent word occurs about half as many times as the first, the third one occurs 1/3 as many times, and the fourth 1/4 as many times as the first: a word's frequency is inversely proportional to its rank.
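As a rough check, the Zipf relationship can be written as expected count = top count / rank and compared against real counts. The sketch below assumes the counts are already sorted from most to least frequent. The function names are invented for the example, and real search logs will only follow the curve approximately.

```python
def zipf_expected(top_count, rank):
    """Expected count at a given rank if frequency is proportional to 1/rank."""
    if rank < 1:
        raise ValueError("rank starts at 1")
    return top_count / rank


def compare_to_zipf(counts):
    """Pair each observed count with its Zipf estimate.

    `counts` must be sorted in descending order. Returns a list of
    (rank, observed, expected) tuples.
    """
    if not counts:
        return []
    top = counts[0]
    return [
        (rank, observed, zipf_expected(top, rank))
        for rank, observed in enumerate(counts, start=1)
    ]
```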
Implications for Search
Every chart of top search terms I've seen has this kind of curve. On sites where people know what they're looking for, such as bookstores and DVD rental sites, the middle of the curve may be shallower than on informational web sites. But it's always notably steep. Often the top searches are incongruous and seemingly trivial, such as "holidays" in an intranet and "whois" on a domain registration site. They may be navigational, such as "fedex" where the fedex form link is buried in a miscellaneous category or "students" for student homepages: both of these examples were fixed by changing the navigation links on the site.
The Long Tail itself consists of longer and often more open-ended research queries: unusual combinations of terms, personal, product and place names, complex concepts, and misspelled / mis-typed words.
As Rich Wiggins pointed out in the Accidental Thesaurus, knowing that the Short Head terms are so very frequent allows search administrators to focus efforts to improve the search on the most popular search terms, rather than guessing what might be most important.
Using the Short Head
To make best use of search terms sorted by popularity, analyze them and improve them iteratively:
- Perform the searches for the top queries and analyze the results to see whether they look reasonably useful, and repeat the test weekly or monthly. I recommend saving the results for the top ten queries, at least, in a dated folder. This allows comparison of new and old results.
- Insert Best Bets search suggestions to direct people to the most likely relevant locations that are not in the top ten search results.
- Implement automated synonym expansion for the most common acronyms.
- Encourage content creators and publishers to improve their document titles.
- Consider changing database or other structured data indexing to include a more descriptive title, but be sure these are unique.
- Work on search usability so that the results display the information most likely to be valuable, without overwhelming the page.
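Two of the steps above, Best Bets and synonym expansion, can be prototyped as simple lookup tables before committing to a search engine's own features. The following is a minimal sketch: the table contents, the URL paths, and the function names are all hypothetical, and a real search engine would normally provide its own configuration for both.

```python
# Hypothetical Best Bets table: normalized query -> hand-picked locations.
BEST_BETS = {
    "holidays": ["/hr/holiday-schedule"],
    "fedex": ["/shipping/fedex-form"],
}

# Hypothetical acronym table: acronym -> expansions to search for as well.
SYNONYMS = {
    "hr": ["human resources"],
    "pto": ["paid time off"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return the query words, each followed by any known expansions."""
    terms = []
    for word in query.lower().split():
        terms.append(word)
        terms.extend(synonyms.get(word, []))
    return terms


def best_bets(query, table=BEST_BETS):
    """Return the hand-picked locations for a query, or an empty list."""
    key = " ".join(query.lower().split())
    return table.get(key, [])
```

Keeping both tables small and limited to Short Head terms makes them practical to maintain by hand.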
In addition, the Short Head phenomenon shows that search engines can cache frequent queries without doing a disservice to users, unless the search is near-real-time, such as auctions or stock.
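One way to picture such a cache is a small wrapper that keeps results for a limited time. This is a sketch under stated assumptions: `search_fn` stands in for whatever function actually queries the index, the five-minute default lifetime is arbitrary, and the store is unbounded, so a production version would also need an eviction policy.

```python
import time


class QueryCache:
    """Cache search results per normalized query for a limited time."""

    def __init__(self, search_fn, ttl_seconds=300.0, clock=time.monotonic):
        self._search_fn = search_fn  # hypothetical function: query -> results
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # normalized query -> (time stored, results)

    def search(self, query):
        key = " ".join(query.lower().split())
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the search engine
        results = self._search_fn(key)
        self._store[key] = (now, results)
        return results
```

For near-real-time content, the lifetime would have to be very short, or the cache skipped altogether.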
Analysis of the middle and the Long Tail is more complex. One approach is to group phrases by common words (whole or stemmed); another is to use clustering to find search topics. Humans can often recognize types of searches, such as users searching by product number. Comparing the search terms with a dictionary or the alphabetical inverted index on the site should locate misspelled and mis-typed words, which could be caught with a spellchecker as part of the search query parsing, Best Bets or metadata.
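Both ideas, grouping phrases by shared words and checking terms against the site's vocabulary, can be tried out with the Python standard library. In this sketch the stopword list, the `cutoff` value, and the function names are assumptions. The vocabulary would come from a dictionary or from the index's term list, and no stemming is applied.

```python
import difflib
from collections import defaultdict

STOPWORDS = frozenset({"the", "a", "of", "for", "and"})


def group_by_word(queries, stopwords=STOPWORDS):
    """Map each non-stopword to the set of queries that contain it."""
    groups = defaultdict(set)
    for query in queries:
        for word in query.lower().split():
            if word not in stopwords:
                groups[word].add(query)
    return groups


def suggest_spelling(word, vocabulary, cutoff=0.8):
    """Return the closest vocabulary word for a likely misspelling.

    Returns None if the word is already in the vocabulary or if nothing
    is similar enough. `vocabulary` is a list of known words.
    """
    word = word.lower()
    if word in vocabulary:
        return None
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Words that appear in many Long Tail queries are candidates for topics, and words with a close vocabulary match are candidates for spelling suggestions.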
Searches which have no matches in the index are a significant problem, a form of search failure. The steps above can be used productively with the most frequent no-match search terms, although the actual traffic is (usually) much less. For more information, please see the Search Tools Guide pages, Why Searches Fail and Search No-Match Pages.
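If the search log records how many results each query returned, the most frequent no-match terms can be ranked the same way as the top queries. The log format here, a list of (query, result count) pairs, is an assumption made for the example. Real logs vary by search engine.

```python
from collections import Counter


def top_no_match_terms(log, limit=10):
    """Rank the queries that returned zero results.

    `log` is an iterable of (query, result_count) pairs, which is an
    assumed format. Returns up to `limit` (term, count) pairs, most
    frequent first.
    """
    misses = Counter(
        " ".join(query.lower().split())
        for query, result_count in log
        if result_count == 0
    )
    return misses.most_common(limit)
```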
If you have questions on this or other topics, or would like to discuss search consulting, please comment on the SearchTools Blog or use the contact form, and we will get back to you as soon as possible.
Page Created: 2008-06-26
Search Tools Consulting's principal analyst, Avi Rappoport, may be available to help you with selection, analysis, user experience, and functional search engine work. Please contact us with your questions, comments, or possible consulting discussions.
SearchTools.com - Copyright © 2008-2009 Search Tools Consulting.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.