As of January 2012, this site is no longer being updated, due to work and health issues.

Search Tools Listings

Tools for Taxonomies, Browsable Directories,
and Classifying Documents into Categories



For Definitions, Articles and Resources, see the Taxonomies and Classifiers page which discusses the entire concept of automated classification, categories, taxonomies, clustering, hierarchies, and browsable listings. See also the Visualization Tools report.

Classification Tools & Services

Applied Semantics Auto-Categorizer
Combines automated categorization with editorial tools for human judgment in building taxonomies, with no training documents or rules required. Categorization works in real time. Processing based on a linguistically created ontology combining millions of words, meanings and conceptual relationships. Automatically categorizes given content and then allows administrators to create unique taxonomies with a Windows client console, mapping categories and subcategories to the main ontology. Includes a test tool to view content in ontology. Works across many languages, scales quickly, returns results immediately, integrates using XML, APIs for C, Java, Perl and Visual Basic.
Report: Unstructured Data Management: the elephant in the corner (guest or customer access required) the451 Report, November 2002 by Nick Patience and Rachel Chalmers
Discusses the business aspects of the company, pricing, markets (domain names, paid placement for search terms, and publishing), competition and the technology. Finds the company profitable, successful in its niches, but not clear about scaling.
Report: Searching for Value in Search Technology (subscribers only) Gilbane Report Vol 10, Num 7, September 2002 by Sebastian Holt
Highlights solution providers, praises Applied Semantics for starting with their proprietary ontology and applying technology to solve real problems in the target markets.
Article: Three Paths to Sorting Content: AutoCategorizer 1.1 eWeek, July 15, 2002 by Jim Rapoza
Description of the approach to categorization, praises ease of control, support for publishing industry. Approx. $150,000.
Article: Finding the right piece of information, Structure and meaning improve search results KMworld Magazine, September 2001 by Judith Lamont
Describes several intranet and commerce search solutions which attempt to use structure and language to improve search results. Certified General Accountants of Ontario, a professional association for accountants, used Hummingbird's KnowledgeServer to find content, reducing call center inquiries. Coldwater Creek used EasyAsk for its online store, to improve responsiveness to customers. Applied Semantics categorizer and summarizer can augment search engine results.
 
AskJeeves JeevesOne (link to SearchTools Listing Report)
Categories created to answer questions rather than provide a simple listing of page matches.
Examples: AskJeeves, Dell Online Support: Ask Dudley

Autonomy Categorizer (see also SearchTools Listing for Autonomy Search)
Creates concept maps and topic clusters without human intervention by using Bayesian probability and pattern recognition. Users have complained that the system can be slow and the automation difficult to understand.
Report: Taxonomy & Content Classification: Delphi Group Report (guest or customer access required) Delphi, April 11, 2002
Vendor-submitted information about automatic cross-lingual clustering and classification, tagging with metadata, and generating hierarchical trees.
 
Bravo engine from Global Wisdom
Gathers feedback from end-users to update taxonomies and create topic maps. 
 
BrightPlanet Deep Web Directory
An automatic portal indexer and classifier service that places high-priority documents into a portal directory structure. Also performs metasearch on local and external sources such as webwide search engines and web-searchable databases.
 
Carrot2
Carrot2 is a research framework for experimenting with automated querying of various data sources (such as search engines), processing search results and their visualization. Unfortunately, the system does not work with Mac OS X browsers. Thanks to Gary Price of Resource Shelf for the link.
 
Convera RetrievalWare (link to SearchTools Listing Report)
Categorization using semantic models, handles multimedia content.
Article: Taxonomy & Content Classification: Delphi Group Report (guest or customer access required) Delphi, April 11, 2002
Vendor-submitted information about categorization using synchronized taxonomies, semantic networks in many languages.
 
DarWin Set (link to SearchTools Listing Report)
Dynamic Categorization with categories created on the fly. There are no predefined or established categories.

EasyAsk (link to SearchTools Listing Report)
Works with database structures to generate categories, and puts items from multiple search engines into topical categories.

80-20 Discovery (link to SearchTools Listing Report)
Uses neural net algorithms to create categories on the fly.
 
Entrieva (formerly Semio)
Data mining software uses linguistic analysis and rules to extract concepts from textual information and displays the concepts and relationships in a 3-D map. Creates taxonomies based on fitting content into existing categories in the fields of defense, drugs, health care and technology.
Report: Unstructured Data Management: the elephant in the corner (guest or customer access required) the451 Report, November 2002 by Nick Patience and Rachel Chalmers
Discusses the business aspects of the company, pricing, markets, competition and the technology. The recent acquisition is a cause for concern, but also an opportunity to create more standard APIs and support Web Services.
Report: Taxonomy & Content Classification: Delphi Group Report (guest or customer access required) Delphi, April 11, 2002
Vendor-submitted information covers linguistic analysis, concept extraction and statistical clustering techniques. Works with lexicons and existing taxonomies. Comes with vertical market taxonomies and thesauri.
Article: Semio Brings Concepts To Web Search System WebWEEK, 1997/02/03 by Jeremy Carl
Describes Semio's attempt to improve text searching by viewing conceptual relationships in a 3-D model.

GammaSite
Uses Machine Learning with a small training set to create taxonomies and add documents to categories. Allows human oversight of structure and easy maintenance.
Article: Get ready for the digital librarian Jerusalem Post; July 1, 2001 by Gwen Ackerman
Describes features of the software, quotes the president on the advantages of machine learning. Mentions a successful installation at the UK Daily Telegraph and winning a test run by the Encyclopedia Britannica.
Article: Content Taxonomy Talk Content Wire; September 6, 2001
Interview with the company officers describes the value of categorization specialization, winning the Britannica test, minimum manual labor for customers and taxonomy flexibility.
   
GuideBeam
Reformulates queries and post-processes search results to cluster them by category. Example uses public search engines, but could easily be applied to site and Intranet search. 
 
H5 Technologies Content Categorization
Automatic organization based on "aboutness" using a proprietary algorithm that does not rely on linguistic analysis.
 
Hummingbird Knowledge Server (link to SearchTools Listing Report)
Automated clustering, categorization and visualization tools, along with a search engine, in a knowledge management suite.
Article: Hummingbird Fulcrum KnowledgeServer 3.5 CRN ChannelWEB Test Center April 6, 2001
Summary of test results indicates that automatic clustering "is not an exact science" and requires manual processing. Also describes features of crawling, hierarchical display, distributed searching, custom weighting and search query options.

HyperSeek (InteractiveWeb)
Database-oriented link catalog application includes customizable HTML for categories, control of search results listings, admin tools.
Examples: Custom version at SearchKing

IBM Intelligent Miner for Text (link to SearchTools Listing Report)
Includes linguistic analysis, vector-based automatic clustering and/or classifying documents into predetermined categories.
 
IBM Lotus Discovery Server
Automated tools include statistical analysis, evaluation of relationships among people and documents known as social networks, and clustering of content to create a taxonomy. Can access Notes databases, Domino.doc management system, web sites and intranets, Microsoft Exchange, etc. Creates a Knowledge Map to display categories and hierarchy for browsing, also integrates with Lotus search engines. Allows for human editorial oversight.
Article: Discovery Server Helps Take the Taxing Out of Taxonomies ePro magazine, November 13, 2002 by Jim O'Donnell
Description of Lotus Discovery Server which makes a taxonomy for navigating portals or intranets. Analyst Andrew Warzecha of Meta Group points out that taxonomies are just difficult to create, and that IT and corporate librarians must work together to do a good job. However, companies are looking for information systems and there is increasing awareness of the value of taxonomies.
Report: Unstructured Data Management: the elephant in the corner (guest or customer access required) the451 Report, November 2002 by Nick Patience and Rachel Chalmers
Discusses the business aspects of the product, sales channels, markets (mainly manufacturing and financial services), competition and the technology. Integrated tightly with Notes and Exchange, does not have any pre-built taxonomies, has a low profile.
 
IBM Text Analyzer Business Component
Performs high-volume text document categorization quickly, based on training, rules and natural-language parsing.
 
ic-classify
Linguistic analysis for sorting documents into categories based on content similarity to training set. Uses natural-language processing, lexical knowledge and semantic categories to create information hierarchies and taxonomies.
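
Sorting by "content similarity to training set" can be illustrated with a nearest-profile classifier over word counts. This is a minimal sketch under stated assumptions, not ic-classify's actual method: the function names, the cosine measure and the sample training data are all hypothetical, and no linguistic or lexical analysis is attempted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(training):
    """Sum the term counts of each category's training documents."""
    profiles = {}
    for category, docs in training.items():
        profile = Counter()
        for doc in docs:
            profile.update(tokenize(doc))
        profiles[category] = profile
    return profiles


def classify(text, profiles):
    """Return the category whose profile is most similar to the text."""
    vector = Counter(tokenize(text))
    return max(profiles, key=lambda c: cosine(vector, profiles[c]))


# Hypothetical training set: a few short documents per category.
TRAINING = {
    "finance": ["bank loan interest rate", "stock market shares bank"],
    "health": ["doctor hospital patient care", "drug treatment patient"],
}
PROFILES = build_profiles(TRAINING)
```

Calling classify("the patient saw a doctor", PROFILES) picks the "health" profile, because only that profile shares terms with the text.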
 
Inxight Categorizer (link to SearchTools Listing Report)
Using linguistic and statistical technology from parent company Xerox PARC, Inxight's categorizer automatically classifies content and organizes by subject. Identifies entities such as people, places, companies and products. Can integrate with personalization tools to build individual categories. Can display results in a Star Tree visualization or more traditional text list. Taxonomy manager application provides interactive control for editors; training sets and specific rules also apply.
Report: Unstructured Data Management: the elephant in the corner (guest or customer access required) the451 Report, November 2002 by Nick Patience and Rachel Chalmers
Discusses the business aspects of the product, sales channels, markets (mainly enterprise, publishers, government, OEMs), competition and the technology. Considers this to be the best of the categorization and visualization tools.
 
Klarity
Analyzes large collections of documents and generates metadata conceptual terms based on seed documents about a topic.

LexiQuest Mine (see also LexiQuest SearchTools Report)
Uses linguistic analysis tools to categorize and classify unstructured data.
 
LexTek Profiler Engine and RouteX (see also LexTek Onix SearchTools Report)
Toolkits for automatic classification of document stores and routing of incoming documents, newsfeeds, email. C++ on many platforms.

Links (Gossamer Threads, Inc.)
A set of Perl scripts with an excellent browser administration interface with automated submissions and approvals. Free for personal use.
 
MAI (Machine-Aided Indexer)
Assists catalogers and classification experts by extracting concepts from documents and suggesting appropriate index terms.
 
MetaTagger for Interwoven
Generates a taxonomy or uses an existing one, categorizes documents into one or more taxonomies, extracts summaries, keywords and custom data such as dates and currency. Integrates with multimedia file formats. Provides an interactive interface for editorial control.
Article: Three Paths to Sorting Content: MetaTagger 3.0 eWeek, July 15, 2002 by Jim Rapoza
Detailed description of the approach to categorization, including advantages and disadvantages of tight integration with TeamSite. Tests show effective categorization during publishing workflow, easy corrections, generation of categories on the fly. Cost is $85,000 to $110,000 per server in addition to TeamSite.
 
Mohomine MohoClassifier
Uses pattern recognition and statistical algorithms such as support vector machines and feature selection to distinguish among categories, offers APIs for integration with other software, and scales to millions of documents. Claims to need very small example sets.
Report: Unstructured Data Management: the elephant in the corner (guest or customer access required) the451 Report, November 2002 by Nick Patience and Rachel Chalmers
Discusses the business aspects of the product, sales channels, pricing, markets (mainly resume processing for HR and various defense uses), competition and the technology. Foresees an acquisition of this company in the future.
Article: Taxonomy & Content Classification: Delphi Group Report (guest or customer access required) Delphi, April 11, 2002
Vendor-submitted information describes neural network approach based on pattern recognition and machine learning. Populates customer-defined taxonomies, small example sets, multiple categories per document.
Article: Knowledge Management: Mohomine FinancialWeb; January 25, 2001 by Kristen Rosen.
Interview with the CEO, Neil Centuria, about the source and background of the company: it was started as a search engine and classification tool to support the Source Bank code snippet archive site and extended to other structured data. It has special features to support multiple terms for the same meaning (such as B and BK for black), populate vertical portals with directed crawling and fast updating. Claims the classification software can eliminate the editors needed to create a directory.
Article: Create User Loyalty by Improving Search Capability ClickZ Today October 18, 2000 by Paul Bruemmer
Inspired by the Forrester search report, this article includes marketing information from several search engine developers, including AltaVista, Mohomine, and Twirlix.
Article: In Search Of... If you want good search on your site, commit to doing it right unless you want to alienate visitors Industry Standard, October 9, 2000 by G. Patrick Pawling
Describes problems with site search, such as searches which expand names too far (Seger to Segarra for example) and summarizes the Forrester Report on Search. Reports that one company put "buy" buttons on search results pages and found 30% of its orders came from there. Mentions Mercado options to adjust search results for e-commerce purposes. Describes Mohomine automated summarization and categorization tools. Quotes Jupiter Communications analyst Lydia Loizides as estimating the cost at between $50,000 and $2 million. Describes alternate approaches, such as a conversational or interview-driven search, and choosing an area, such as multimedia, to reduce the number of inappropriate matches.
 
MondoSearch (link to SearchTools Listing Report)
Creates classes based on the server file path, which can be adjusted and renamed by administrators.
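
Deriving a class from the server file path, with administrator renaming, might look like the sketch below. It is an illustration only, not MondoSearch's implementation: the function name, the rename table and the example URLs are assumptions, and only the first directory level is used.

```python
from urllib.parse import urlparse

# Hypothetical administrator overrides: raw directory name -> display name.
RENAMES = {"prod": "Products", "support": "Customer Support"}


def category_for(url, renames=RENAMES, default="Home"):
    """Use the first directory in the URL path as the page's class."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    # Without a trailing slash, the last segment is treated as a file name.
    if not path.endswith("/"):
        segments = segments[:-1]
    if not segments:
        return default
    directory = segments[0]
    # Administrators can rename a class; otherwise capitalize the directory.
    return renames.get(directory, directory.capitalize())
```

With these assumptions, a page under /prod/ is classed as "Products", and a file at the site root falls into the default class.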
 
Muscat Structure (link to SearchTools Listing Report)
Automatic realtime categorisation using a rulebuilder tool to specify documents which do or do not fall into any specific category in a taxonomy. Rules can be built automatically, manually or both, can be general or specific. See also Muscat Discovery search engine page.
 
Netscape Compass Server now part of iPlanet Portal (link to SearchTools Listing Report)
Automatically creates a customized category tree to show the hierarchical organization of the data. Users have complained that the categorization can be erratic.
  
Endymion OpenBridge (Formerly ZNOW, see also SearchTools Listing for OpenBridge)
Automated classification puts pages into topics based on common words among a set of search results. Hierarchical clustering lets users choose more and more specific topics. Linguistic problems sometimes appear, such as using the term "booking" instead of "books".
 
PortalAuthor
Java-based organizational tool for Intranets and corporate portals.
 
 
 
Readware
Sophisticated classification system with a ConceptBase filled with fundamental structures of knowledge, then correlated to queries and documents.
 
Recommind Categorizer
Using an existing classification or taxonomy, this software can automatically classify documents based on semantics and probabilistic analysis.
Customer response to MindServer Recommind Press Release, June 25 2002
Quotes the head of IT Development at ZDF, Europe's largest television station, saying that MindServer expands capacity and improves quality and accessibility. Other materials quote a program manager at the Department of Energy’s Office of Scientific and Technical Information (OSTI), saying that the categorization tool "significantly outperformed our human experts in terms of accuracy and consistency."

Roads Project Software
Free academic classification system which also connects distributed cooperative databases.
Example: Biz/Ed - Business Education online in the UK.
 
Sageware
Allows catalogers to set up sophisticated topic definitions, inserts documents matching the rules automatically.
Article: Sageware: Creating the Categories for Information Retrieval Patricia Seybold Group Snapshot, December 1997 by Geoffrey Bock.
 
Saqqara ContentWorks
Includes automatic classification and a taxonomy manager, designed mainly for b2b databases of products.
Article: Product Content Management From E-Catalogs Supplier Content Wire, December 10 2001
 
Stratify (formerly PurpleYogi)
Unstructured data management system includes a Classification server. Can crawl or integrate with a robot crawler from a search engine. Taxonomies can be imported, created from the top down, clustered from the bottom up or interactively designed. Uses rules, statistics and pattern matching to classify documents into taxonomy categories, subject to editorial control. Can categorize search engine results. Had partnered with Inktomi before that company acquired Quiver.
 
Report: Unstructured Data Management: the elephant in the corner (guest or customer access required) the451 Report, November 2002 by Nick Patience and Rachel Chalmers
Discusses the business aspects of the product, sales channels, pricing, markets (mainly manufacturing, government, financial services, etc.), competition and the technology.
 
Super Site Server (link to SearchTools Listing Report)
Search engine and classification tool combined.
Example: Thunderseek.com - a selective portal for the 'best of the best'
Example: E Fetch - animal-oriented directory and portal
 
Thinkmap
A tool for displaying complex information using an animated multidimensional display designed for user interaction using Java.
Example: Plumb Design Visual Thesaurus

Thunderstone Automated Categorization Engine (see also Thunderstone Webinator search engine)
This application works with the Webinator indexer to add pages to categories automatically. Based on training sets for each high-level category, with additional adjustment options available. Uses the Texis SQL database and Vortex scripting language for configuration, browser admin for maintenance. Runs on Unix, Linux and Windows.
Article: Three Paths to Sorting Content: Texis Categorizer 4.1 eWeek, July 15, 2002 by Jim Rapoza
Describes product as providing good categorization, easy to install for many Web developers, flexible, interoperable, and inexpensive ($10,000 for the Texis engine and $10,000 for Categorizer).
Example: Thunderstone Web Site Catalog
 
 
TopicalNet
Provides automated systems to classify pages with a customer taxonomy. Uses the web to fit 40 million pages into 70,000 subtopics within 15,000 categories. Runs on Linux or Windows 2000, can classify 1.5 million pages per day from text, HTML or XML sources, conversions available for MS Office, PDF and other formats.
Press Release: Inmagic Gatherer and Classifier Released June 10, 2002
InMagic announces a partnership with TopicalNet to integrate classification with the content management and search engine.
Article: Taxonomy & Content Classification: Delphi Group Report (guest or customer access required) Delphi, April 11, 2002
Vendor-submitted information describes the value of the pre-built taxonomy and integration with existing taxonomies. Uses both semantic and syntactic knowledge.

Verity Knowledge Organizer
Creates knowledge trees and category hierarchies using rules.
Article: Taxonomy & Content Classification: Delphi Group Report (guest or customer access required) Delphi, April 11, 2002
Vendor-submitted information: creates new taxonomies or works with existing systems, can create an automatic or hybrid model and populate it with documents.
Article: Verity add-on makes portals easier to build, navigate Infoworld, March 22 1999 by Emily Fitzloff
Product announcement for Verity Knowledge Organizer, which lets catalogers classify data and add pages to categories automatically. 

Verity Ultraseek Advanced Classifier (formerly Quiver)
Provides a taxonomy management system, designed to integrate automatic classification with editorial control and workflow.
Article: Taxonomy & Content Classification: Delphi Group Report (guest or customer access required) Delphi, April 11, 2002
Vendor-submitted information describes hybrid automatic and human classification, based on Naïve Bayes algorithms, machine cleaning and tokenization.
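
For readers unfamiliar with naive Bayes classification, the family of algorithm the Delphi summary appears to refer to, here is a minimal multinomial version with add-one smoothing. It is a textbook sketch, not the vendor's implementation; the class name, the tokenizer and the training sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # training documents per category
        self.term_counts = defaultdict(Counter)  # term counts per category
        self.vocabulary = set()

    def train(self, category, text):
        tokens = tokenize(text)
        self.doc_counts[category] += 1
        self.term_counts[category].update(tokens)
        self.vocabulary.update(tokens)

    def log_score(self, category, tokens):
        # Log prior: share of training documents in this category.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[category] / total_docs)
        # Smoothed log likelihood of each token given the category.
        total_terms = sum(self.term_counts[category].values())
        denominator = total_terms + len(self.vocabulary)
        for token in tokens:
            count = self.term_counts[category][token]
            score += math.log((count + 1) / denominator)
        return score

    def classify(self, text):
        tokens = tokenize(text)
        return max(self.doc_counts, key=lambda c: self.log_score(c, tokens))


def demo_model():
    """Train on four made-up sentences, two per category."""
    model = NaiveBayes()
    model.train("sports", "the team won the match")
    model.train("sports", "a late goal decided the game")
    model.train("politics", "the senate passed the budget bill")
    model.train("politics", "voters elected a new senate")
    return model
```

In a hybrid setup like the one described, a low winning score could be used to route a document to a human editor instead of filing it automatically; that threshold logic is left out here.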
 
Verity Ultraseek CCE (Content Classification Engine) (see also Ultraseek Search Report)
Extends the Ultraseek (previously Inktomi Search Software) engine by allowing administrators to import a site map and specify toplevel and subcategories, then create rules using wildcards and regular expression searching to match pages to those categories.
Example: Northstar - State of Minnesota Government Information
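
Matching pages to categories with wildcard and regular-expression rules can be sketched as follows. The rule table, category names and URLs are hypothetical and this is not the CCE rule syntax; it only shows the general first-match-wins approach using Python's standard fnmatch and re modules.

```python
import fnmatch
import re

# Hypothetical rules: (category, kind, pattern). The first matching rule wins.
RULES = [
    ("Government/Taxes", "wildcard", "*/revenue/*"),
    ("Government/Parks", "wildcard", "*/dnr/parks/*"),
    ("News/Press Releases", "regex", r"/press/\d{4}/"),
]


def categorize(url, rules=RULES, default="Uncategorized"):
    """Return the category of the first rule whose pattern matches the URL."""
    for category, kind, pattern in rules:
        # Wildcard rules must match the whole URL, case-sensitively.
        if kind == "wildcard" and fnmatch.fnmatchcase(url, pattern):
            return category
        # Regex rules may match anywhere in the URL.
        if kind == "regex" and re.search(pattern, url):
            return category
    return default
```

Rule order matters in this sketch: more specific patterns should be listed before general ones.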
 
Vivisimo
Post-processing clustering organizes search results from another engine into folders by topic. This is very dependent on the quality of search results, and can miss topics entirely or group them into generic categories such as "products".
Article: Vivisimo Meta Search Engine FreePint, September 29, 2000 by Simon Collery
Web search expert evaluates the clustering results, gives qualified approval to the process.
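
Post-processing clustering can be illustrated by grouping result snippets under the term they most widely share. This is a deliberately naive sketch, not Vivisimo's algorithm; the stopword list, function names and sample results are assumptions. It also shows the weakness noted above: results that share no term with the others land in a generic folder.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def terms(text):
    """Return the set of lowercase words in the text, minus stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def cluster(results, query, min_size=2):
    """Group result snippets into folders labelled by a shared term."""
    ignore = terms(query)  # query words appear everywhere, so skip them
    doc_terms = [terms(r) - ignore for r in results]
    # Document frequency: in how many results does each term occur?
    frequency = Counter(t for ts in doc_terms for t in ts)
    folders = defaultdict(list)
    for result, ts in zip(results, doc_terms):
        shared = [t for t in ts if frequency[t] >= min_size]
        # Pick the most widely shared term; break ties alphabetically.
        if shared:
            label = min(shared, key=lambda t: (-frequency[t], t))
        else:
            label = "other"
        folders[label].append(result)
    return dict(folders)


# Hypothetical snippets returned by another engine for the query "jaguar".
SAMPLE = [
    "Jaguar car dealers",
    "Jaguar car reviews and prices",
    "Jaguar habitat in the rainforest",
    "Jaguar rainforest conservation",
    "Jaguar operating system",
]
```

Running cluster(SAMPLE, "jaguar") yields a "car" folder, a "rainforest" folder and a catch-all "other" folder for the last snippet.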

WordMap
Enterprise taxonomy management program, available as software or remote service (ASP). Interactive tool presents the user with a list of all possible meanings, then reformulates the query to improve both recall and precision. Options allow the user to send queries to multiple search engines. Company also develops subject taxonomies and offers consulting.
Example: WordMap web search
This demo shows how the query disambiguation interface works. Try "bank" or "lotus".
Article: Wordmap Launches New Taxonomy-Building Service Information Today, January 2002
Describes taxonomy services, including multilingual versions, links among subject areas and metasearching.
Announcement: Wordmap-Software used by DaimlerChrysler for enterprise taxonomy February 6, 2002
Describes the plans for using Wordmap within the automobile company's content architecture. 
 
XFML - eXchangeable Faceted Metadata Language
An XML format for interchange of faceted metadata, mainly within hierarchical taxonomies. It allows people to tag a set of topics and associated URLs so that other applications supporting this format can recognize the relationships.
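
The idea of exchanging facets, topics and tagged URLs as XML can be sketched with Python's standard xml.etree.ElementTree module. The element and attribute names below ("facetmap", "tagged", and so on) are illustrative placeholders and are not taken from the XFML specification; consult the specification for the real schema.

```python
import xml.etree.ElementTree as ET


def build_map(facets, topics, pages):
    """Build an illustrative faceted-metadata document (not real XFML)."""
    root = ET.Element("facetmap")
    for facet_id, name in facets.items():
        ET.SubElement(root, "facet", id=facet_id).text = name
    for topic_id, (facet_id, name, parent_id) in topics.items():
        attrs = {"id": topic_id, "facet": facet_id}
        if parent_id:
            attrs["parent"] = parent_id  # hierarchy within a facet
        ET.SubElement(root, "topic", attrs).text = name
    for url, topic_ids in pages.items():
        page = ET.SubElement(root, "page", url=url)
        for topic_id in topic_ids:
            ET.SubElement(page, "tagged", topic=topic_id)
    return root


def pages_for(root, topic_id):
    """List the URLs tagged with the given topic id."""
    return [
        page.get("url")
        for page in root.iter("page")
        if any(tag.get("topic") == topic_id for tag in page.iter("tagged"))
    ]


# Hypothetical sample data: one facet, a two-level topic tree, two pages.
FACETS = {"region": "Region"}
TOPICS = {
    "europe": ("region", "Europe", None),
    "france": ("region", "France", "europe"),
}
PAGES = {
    "http://example.com/paris.html": ["france"],
    "http://example.com/eu.html": ["europe"],
}
```

A receiving application would parse the serialized text, for example with ET.fromstring, and recover the same topic-to-URL relationships.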

See also:

CGI Resources Site - Link Indexing Scripts


Page Updated 2003-07-07