As of January, 2012, this site is no longer being updated, due to work and health issues

Guide to Search Tools

Show Matching Search Terms in Context

 

Concordances, KWIC and KWOC

The idea of showing the words with their surrounding words comes from hand-created concordances, leading to the Information Science concept of KWIC (Key Word In Context), permuting all content in a text so that each word appears in a central column, alphabetically. This was developed in the late 1950s by Hans Peter Luhn of IBM and used for projects including automating concordances of Shakespeare, chemical listings, and library catalogs.

Here is an example, showing lines from the English and Scottish ballads collected by Francis James Child:

lime (14)
79[C.10] 4 /Which was builded of lime and sand;/Until they came to
247A.6 4 /That was well biggit with lime and stane.
303A.1 2 bower,/Well built wi lime and stane,/And Willie came
247A.9 2 /That was well biggit wi lime and stane,/Nor has he stoln
305A.2 1 a castell biggit with lime and stane,/O gin it stands not
305A.71 2 is my awin,/I biggit it wi lime and stane;/The Tinnies and
79[C.10] 6 /Which was builded with lime and stone.
305A.30 1 a prittie castell of lime and stone,/O gif it stands not
108.15 2 /Which was made both of lime and stone,/Shee tooke him by
175A.33 2 castle then,/Was made of lime and stone;/The vttermost
178[H.2] 2 near by,/Well built with lime and stone;/There is a lady
178F.18 2 built with stone and lime!/But far mair pittie on Lady
178G.35 2 was biggit wi stane and lime!/But far mair pity o Lady
2D.16 1 big a cart o stane and lime,/Gar Robin Redbreast trail it
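The permutation behind a listing like this can be sketched in a few lines of Python. This is only an illustration of the general KWIC idea, not the method used to build the Child ballad concordance; the function names (kwic_index, format_kwic), the 30-character context width, and the sample lines are all assumptions made for the example.

```python
import re
from collections import defaultdict

def kwic_index(lines, width=30):
    """Map each word (lowercased) to its (left, word, right) contexts."""
    index = defaultdict(list)
    for line in lines:
        for match in re.finditer(r"[A-Za-z']+", line):
            # Keep at most `width` characters of context on each side.
            left = line[:match.start()][-width:]
            right = line[match.end():][:width]
            index[match.group().lower()].append((left, match.group(), right))
    return index

def format_kwic(index, keyword, width=30):
    """Right-align the left context so the key word forms one column."""
    rows = index.get(keyword.lower(), [])
    return [f"{left:>{width}}{word}{right}" for left, word, right in rows]

# Hypothetical sample lines, loosely modeled on the listing above.
sample = [
    "Well built with lime and stone",
    "was biggit wi stane and lime",
]
index = kwic_index(sample)
for row in format_kwic(index, "lime"):
    print(row)
```

Sorting the keys of the index alphabetically and printing each group in turn would give a full concordance-style listing, with every key word lined up in the same column.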

There were also KWOC (Key Word Out of Context) systems, which put the key word at the start of each line and were found useful for automatically creating left-aligned alphabetical listings.
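A KWOC listing can be sketched the same way: pull each significant word out of a title, place it at the start of the line, and sort. The function names, the small stopword list, and the sample titles below are assumptions for illustration, not part of any historical KWOC system.

```python
import re

# Assumed stopword list; real systems used their own, longer lists.
STOPWORDS = frozenset({"a", "an", "and", "of", "the", "in", "to"})

def kwoc_index(titles, stopwords=STOPWORDS):
    """Return sorted (keyword, title) pairs, one per distinct key word."""
    entries = []
    for title in titles:
        seen = set()
        for word in re.findall(r"[A-Za-z']+", title):
            key = word.lower()
            if key in stopwords or key in seen:
                continue
            seen.add(key)
            entries.append((key, title))
    return sorted(entries)

def format_kwoc(entries):
    """Put the key word first, left-aligned, followed by the full title."""
    width = max((len(key) for key, _ in entries), default=0)
    return [f"{key.upper():<{width}}  {title}" for key, title in entries]

# Hypothetical titles for the example.
titles = ["A castle of lime and stone", "The lady of the castle"]
for row in format_kwoc(kwoc_index(titles)):
    print(row)
```

Unlike KWIC, the title is printed whole and unshifted, so the key word column on the left serves as the alphabetical index.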

Showing the word in context in search results

From its beginning, Google has displayed result items with search terms in context, which it calls "snippets", but it has never given much credit to KWIC or its inventor. Earlier web search engines, such as AltaVista and HotBot, had used the contents of the HTML page "description" meta tag, auto-summarized the text, and/or extracted content from the beginning of a page, attempting to describe the page as a whole. Google went in a different direction, with what it called a "sneak preview" of the found documents, bolding all the matched search terms, as far back as 2001:

[Image: example of Google snippets]

There's even a video from 2007, describing the snippet extraction process.

 

Other major web search engines, such as Yahoo and Microsoft Live search, and many enterprise search engines use the match term in context as well -- it's not just a Google thing. Any snippet of text taken from the page is also much less likely to include search-spam (words designed to be found and ranked in results, but not terribly meaningful to page readers). So nearly all webwide search engines use match words in context, in simple self-defense.
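As a rough illustration of match-in-context display, the sketch below takes a window of text around the first matching term and marks every matched term in it. It is not how Google or any other engine actually builds snippets; the function name make_snippet, the 60-character window, the bold tags, and the fall-back to the start of the page are assumptions, and the input is assumed to be plain text rather than HTML.

```python
import re

def make_snippet(text, query, window=60, open_tag="<b>", close_tag="</b>"):
    """Return an excerpt around the first match, with matched terms marked."""
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return text[:window * 2]
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    first = pattern.search(text)
    if first is None:
        # No term found: fall back to the beginning of the page.
        return text[:window * 2]
    start = max(0, first.start() - window)
    end = min(len(text), first.end() + window)
    # Widen the window to the nearest spaces so no word is cut in half.
    start = text.rfind(" ", 0, start) + 1
    next_space = text.find(" ", end)
    end = len(text) if next_space == -1 else next_space
    excerpt = pattern.sub(
        lambda m: f"{open_tag}{m.group(0)}{close_tag}", text[start:end]
    )
    prefix = "... " if start > 0 else ""
    suffix = " ..." if end < len(text) else ""
    return prefix + excerpt + suffix

# Hypothetical page text for the example.
page = ("The castle was well built with lime and stone, "
        "and Willie came to the bower.")
print(make_snippet(page, "lime stone", window=10))
```

A production engine would typically score several candidate windows and pick the best one, rather than always using the first match as this sketch does.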

Each search engine has its own algorithm to define the best text to display. Factors in the decision about which context phrases to show may include:

Benefits of showing match terms in context

Displaying the search terms in the source text context is a useful way to leverage human/search engine interaction. It makes the result items much more transparent: showing why the search engine matched those terms within that document. This process accords with the concepts in Information Foraging Theory, which describes the psychological processes of making choices when faced with large chunks of information.

Page Created: 2008-10-21

SearchTools.com - Copyright © 2001-2009 Search Tools Consulting.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.