
Improving your ability to find information online

Conceptual Indexing
for Precision Content Retrieval


Can't find what you want?

How often have you failed to find what you wanted in an online search because the words you used failed to match words in the material that you needed? Concept-based retrieval systems attempt to reach beyond the standard keyword approach of simply counting the words from your request that occur in a document. The Conceptual Indexing Project is developing techniques that use knowledge of concepts and their interrelationships to find correspondences between the concepts in your request and those that occur in text passages. Our goal is to improve the convenience and effectiveness of online information access.

The Paraphrase Problem

The central focus of this project is the "paraphrase problem," in which the words used in a query are different from, but conceptually related to, those in the material that you need. For example, in a collection of articles by James Fallows, then Washington editor of the Atlantic Monthly, the query "change in the deficit" returns several relevant passages, including "Last year's reductions in tax rates are part of the reason for the deficits, as are the administration's plans for a sustained military buildup." Based on this passage, a user can then decide whether to read the rest of the article. The query and the passage convey similar ideas, but the wording in each is different: a typical case of the paraphrase problem.

In addressing the paraphrase problem, three challenges must be met:

  • What information is required to connect the terms in a query to those in a relevant passage?

  • How can this information be organized and used efficiently?

  • To what extent can descriptions of the content of a document be automatically extracted from the document itself?

Our approach to the paraphrase problem is to identify and extract concepts (meaningful words and phrases) and to relate similar concepts to one another. This information is organized in a structured conceptual taxonomy. Presented with a query, the technology searches through the taxonomy for similar, but not necessarily identical, ideas. With special retrieval algorithms, the taxonomy can be scanned efficiently.
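
As a rough illustration of the idea, the sketch below stores a tiny hand-built taxonomy in which each concept points to the more specific concepts beneath it, then collects everything a query concept covers. The taxonomy entries, the TAXONOMY name, and the subsumed_concepts function are assumptions made for this example only; the project's actual taxonomy construction and retrieval algorithms are not described on this page.

```python
from collections import deque

# Hypothetical toy taxonomy: each concept maps to the more specific
# concepts it covers. A real taxonomy would be built automatically
# from the indexed material rather than written by hand.
TAXONOMY = {
    "change": ["reduction", "increase"],
    "reduction": ["tax cut"],
    "increase": ["buildup"],
    "buildup": ["military buildup"],
}


def subsumed_concepts(concept, taxonomy=TAXONOMY):
    """Return the concept plus every more specific concept beneath it."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for child in taxonomy.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

With this toy data, a query term such as "change" would also reach "reduction" and "buildup", which is how a query can connect to a passage that never uses the query's own words.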

Elements of the Technology

The technology, which is called "Precision Content Retrieval," is composed of two parts:
  • Conceptual Indexing
    Builds a structured conceptual taxonomy of words and phrases extracted from the indexed material

  • Specific Passage Retrieval
    Finds specific passages and ranks them according to relevance to the query
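
To make the second part concrete, here is a minimal sketch of passage ranking under simplifying assumptions: passages are already split out, related terms come from a small hand-written table (RELATED, hypothetical), and a passage's score is simply the number of query concepts for which some related term appears in it. This is not the project's ranking method, only one simple way such a ranking could work.

```python
# Hypothetical table of related terms for each query concept. In the
# described technology this information would come from the conceptual
# taxonomy rather than a hand-written dictionary.
RELATED = {
    "change": ["change", "reduction", "increase", "buildup"],
    "deficit": ["deficit"],
}


def score(passage, query_concepts, related=RELATED):
    """Count the query concepts that have a related term in the passage."""
    text = passage.lower()
    return sum(
        any(term in text for term in related.get(concept, [concept]))
        for concept in query_concepts
    )


def rank_passages(passages, query_concepts):
    """Return (score, passage) pairs, best first, dropping non-matches."""
    scored = [(score(p, query_concepts), p) for p in passages]
    matches = [pair for pair in scored if pair[0] > 0]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


PASSAGES = [
    "The Atlantic Monthly publishes essays and fiction.",
    "The deficit was discussed at length in Congress.",
    "Last year's reductions in tax rates are part of the reason for the "
    "deficits, as are the administration's plans for a sustained military "
    "buildup.",
]

ranked = rank_passages(PASSAGES, ["change", "deficit"])
```

Here the passage quoted in the paraphrase example ranks first because it matches both query concepts, even though the word "change" never appears in it. The substring matching is deliberately crude; it stands in for proper handling of word forms such as "reductions" and "deficits".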



For more information about the Conceptual Indexing Project, contact indexing-info@east.sun.com.

Sun Microsystems Laboratories (Burlington, MA)

Knowledge Technology Group, Sun Microsystems Laboratories

Copyright 1994-2008 Sun Microsystems, Inc.