As of January 2012, this site is no longer being updated because of work and health issues.
Evaluating relevance in results and ranking is one of the hardest tasks in Information Retrieval. The basic problem is that it's easiest to search on words, but words often fail to express the "aboutness" of a search query or document. A number of approaches add "weight" to result rankings: they use the data and the document structure as much as possible to decide which of several otherwise equivalent pages should be sorted higher.
This section tests these algorithms. For more information on the test suite, see the Search Indexing Test List.
For our tests, we have a set of pages that contain some miscellaneous text and some unique terms. These terms are not real words, so they will not bring up any false matches (unless the search engine is performing phonetic or Soundex searching). The special terms are:
- ztycl
- dlrowcam
- hcraessloot
To test, run both AND and OR searches on these terms and see which pages the search engine considers most relevant.
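As a rough illustration of the difference between the two search modes, here is a minimal sketch over a tiny in-memory set of pages. The page names, the page text, and the `search` helper are all hypothetical and are not part of the test suite; a real search engine would tokenize and match in its own way.

```python
# Hypothetical mini-corpus: page names and contents are illustrative only.
TERMS = ["ztycl", "dlrowcam", "hcraessloot"]

PAGES = {
    "one-term": "some miscellaneous text ztycl and more text",
    "two-terms": "text with ztycl and dlrowcam inside",
    "three-terms": "ztycl dlrowcam hcraessloot all appear here",
    "no-terms": "only ordinary words on this page",
}


def search(pages, terms, mode="AND"):
    """Return names of pages matching all terms (AND) or any term (OR)."""
    hits = []
    for name, text in pages.items():
        words = set(text.lower().split())
        found = [term in words for term in terms]
        if (mode == "AND" and all(found)) or (mode == "OR" and any(found)):
            hits.append(name)
    return hits


and_hits = search(PAGES, TERMS, "AND")  # only pages containing every term
or_hits = search(PAGES, TERMS, "OR")    # pages containing at least one term
print(and_hits)
print(or_hits)
```

The sketch only decides which pages match. The order in which matching pages are returned is what the tests below are meant to expose.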
Text Weighting
- One term match
- Two terms match
- Three terms match
- Three matches, adjusted term frequency
  - ztycl and dlrowcam once, hcraessloot 3 times
  - ztycl and hcraessloot once, dlrowcam 3 times
  - hcraessloot and dlrowcam once, ztycl 3 times
  - ztycl once, hcraessloot and dlrowcam 3 times
  - dlrowcam once, ztycl and hcraessloot 3 times
  - hcraessloot once, ztycl and dlrowcam 3 times
- Terms as phrases
- Page length, with one instance each of all three terms
  - long page (4,000 characters)
  - medium page (2,000 characters)
  - short page (1,000 characters)
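The term frequency and page length cases above can be thought of in terms of a simple density measure. The sketch below is one possible way to compute it, not the formula any particular search engine uses. The `make_page` helper, the filler text, and the page sizes are assumptions for illustration, and length is measured in words here rather than in characters as in the list above.

```python
from collections import Counter

TERMS = ["ztycl", "dlrowcam", "hcraessloot"]


def term_counts(text, terms=TERMS):
    """Count how often each special term occurs in the text."""
    counts = Counter(text.lower().split())
    return {term: counts[term] for term in terms}


def density_score(text, terms=TERMS):
    """Matches divided by total words: a simple length-normalized score."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(term_counts(text, terms).values()) / len(words)


def make_page(filler_words, repeats):
    """Build a hypothetical test page: filler words followed by the terms."""
    body = ["filler"] * filler_words
    for term, n in repeats.items():
        body.extend([term] * n)
    return " ".join(body)


# One instance of each term on a short and a long page (sizes are assumed).
short_page = make_page(97, {"ztycl": 1, "dlrowcam": 1, "hcraessloot": 1})
long_page = make_page(397, {"ztycl": 1, "dlrowcam": 1, "hcraessloot": 1})
# Same length as the short page, but with hcraessloot repeated 3 times.
boosted = make_page(95, {"ztycl": 1, "dlrowcam": 1, "hcraessloot": 3})

print(density_score(short_page), density_score(long_page), density_score(boosted))
```

Under this measure the short page outranks the long one, and the page with a repeated term outranks both. Whether a given search engine behaves this way is exactly what the test pages are designed to show.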
Page Structure Weighting
- Words in Title tag
- Words in META Keywords tag
- Words in META Description tag
- Words in H1 tag
- Words in H2 tag
- Words in H3 tag
- Words at the top of the page
- Words at the bottom of the page
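To make the idea of structure weighting concrete, the following sketch scores a term by where it appears in an HTML page, using Python's standard `html.parser` module. The weight values in `FIELD_WEIGHTS`, the `FieldCollector` class, and the sample pages are all assumptions chosen for illustration. Actual search engines choose their own weights, and position on the page (top versus bottom) is not modeled here.

```python
from html.parser import HTMLParser

# Hypothetical weights, chosen only for illustration.
FIELD_WEIGHTS = {"title": 5.0, "keywords": 3.0, "description": 3.0,
                 "h1": 4.0, "h2": 3.0, "h3": 2.0, "body": 1.0}


class FieldCollector(HTMLParser):
    """Collect the words found in each structural field of an HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = {name: [] for name in FIELD_WEIGHTS}
        self.stack = []  # currently open title/h1/h2/h3 tags

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in ("keywords", "description"):
                content = (attr.get("content") or "").lower()
                self.fields[name].extend(content.replace(",", " ").split())
        elif tag in ("title", "h1", "h2", "h3"):
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Text outside title and heading tags counts as plain body text.
        field = self.stack[-1] if self.stack else "body"
        self.fields[field].extend(data.lower().split())


def structure_score(html, term):
    """Sum the weight of every field occurrence of the term."""
    parser = FieldCollector()
    parser.feed(html)
    parser.close()
    return sum(FIELD_WEIGHTS[field] * words.count(term)
               for field, words in parser.fields.items())


in_title = "<html><head><title>ztycl</title></head><body><p>plain text</p></body></html>"
in_body = "<html><head><title>test page</title></head><body><p>ztycl</p></body></html>"
in_meta = ('<html><head><title>test page</title>'
           '<meta name="keywords" content="ztycl, other"></head>'
           '<body><p>plain</p></body></html>')

for page in (in_title, in_meta, in_body):
    print(structure_score(page, "ztycl"))
```

With these assumed weights, a match in the Title tag outranks a match in the META Keywords tag, which in turn outranks a match in ordinary body text.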