Fiscal 1994 Project Portfolio Report

Knowledge Technology

William Woods, Principal Investigator
william.woods@East.Sun.COM

Overall Objective

To develop and exploit technology for dealing with knowledge: acquiring, organizing, disseminating, retrieving, and browsing it. To develop a body of technology that addresses the needs of people whose jobs require efficient access to on-line information. Our efforts toward these goals are currently concentrated on conceptual information access, that is, access to information based on a conceptual match between a stated information need and material that may meet that need.

Conceptual Indexing

Objective for FY94

To develop techniques for indexing and organizing information in structured taxonomic representations that will facilitate browsing and retrieval of specific information in response to specific information needs.

Description

Conceptual Indexing refers to a technology for organizing facts, ideas, words, phrases, and descriptions into a structured taxonomy that serves both as an organizing structure for information retrieval and as a framework for human browsing. We have implemented a conceptual indexer that extracts words and phrases from text files and organizes them into a taxonomy that can be browsed and used to access information. Such a taxonomy can index text at the level of individual sentences and phrases, making it possible to locate specific answers to specific questions.
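The following minimal Python sketch gives a rough illustration of sentence-level indexing. It records which sentences each extracted phrase occurs in, and it links each multiword phrase to a more general phrase formed by dropping its leftmost modifier. As a result, a query for a general phrase also retrieves sentences that contain its more specific variants. This is not the indexer described here. Phrase extraction is a crude stopword split rather than a parser. The stopword list, the head-final "drop the leftmost word" generalization rule, and the names `build_index` and `lookup` are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; the real indexer uses a parser and a dictionary.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "or", "for", "in", "on",
             "is", "are", "be", "can", "that", "this", "with", "by", "it"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ? or ! followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def extract_phrases(sentence):
    """Treat each maximal run of non-stopwords as a candidate phrase (toy heuristic)."""
    phrases, run = [], []
    for word in re.findall(r"[a-z0-9]+", sentence.lower()):
        if word in STOPWORDS:
            if run:
                phrases.append(tuple(run))
            run = []
        else:
            run.append(word)
    if run:
        phrases.append(tuple(run))
    return phrases


def build_index(text):
    """Map each phrase to the sentence numbers it occurs in, and link each
    multiword phrase to the more general phrase obtained by dropping its
    leftmost modifier, e.g. ('unix', 'man', 'pages') -> ('man', 'pages')."""
    occurrences = defaultdict(set)
    parent = {}
    for i, sentence in enumerate(split_sentences(text)):
        for phrase in extract_phrases(sentence):
            occurrences[phrase].add(i)
            p = phrase
            while len(p) > 1:  # every right-hand suffix generalizes the phrase
                parent[p] = p[1:]
                p = p[1:]
    return occurrences, parent


def lookup(occurrences, parent, query):
    """Return sentence numbers for the query phrase and every phrase it subsumes."""
    q = tuple(query.lower().split())
    hits = set()
    for phrase, sentences in occurrences.items():
        p = phrase
        while True:  # climb the generalization chain looking for the query
            if p == q:
                hits |= sentences
                break
            if p not in parent:
                break
            p = parent[p]
    return sorted(hits)
```

Consider indexing "UNIX man pages are online. Printed pages are cheap. Man pages can be searched." A lookup for "man pages" returns sentences 0 and 2. A lookup for "pages" returns all three, because "unix man pages" and "printed pages" both generalize to "pages".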

The conceptual indexer is being used as a component of an experimental intelligent querying system that we have expanded and experimented with over the past year. We have implemented several versions of this system. They differ in how much analysis is done at indexing time as opposed to retrieval time. We are still experimenting with them to understand their behavior and to develop techniques that extend their capabilities.

The components of the conceptual indexer include:

  • A parser that analyzes phrases extracted from text to determine their conceptual structure for incorporation into the index
  • A core dictionary of words that is used by the parser to determine the structure of phrases
  • A morphological analysis component that analyzes unknown words, which may be inflected or derived forms of words in the dictionary, and guesses their grammatical roles
  • A knowledge base of semantic relationships among words and concepts that is used to judge the relationships among complex concepts
  • A conceptual classifier that takes conceptual descriptions and assimilates them into a taxonomy in such a way that they are directly linked to the most specific concepts that subsume them and to the most general concepts that they in turn subsume
  • A browser for viewing and navigating within a conceptual taxonomy
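The classifier's placement rule can be sketched in Python as follows. The rule links each new description to its most specific subsumers and to its most general subsumees. The sketch rests on a deliberately simplified assumption: a concept is modeled as a set of features, and one concept subsumes another when its features are a subset of the other's. The real classifier works on structured conceptual descriptions. The set model, the hypothetical `Taxonomy` class, and its brute-force comparison against every existing concept are stand-ins chosen for brevity.

```python
class Taxonomy:
    """Toy classifier: a concept is a frozenset of features, and concept A
    subsumes concept B when A's features are a subset of B's (an assumption)."""

    def __init__(self):
        self.parents = {}   # concept -> direct (most specific) subsumers
        self.children = {}  # concept -> direct (most general) subsumees

    @staticmethod
    def subsumes(a, b):
        return a <= b

    def classify(self, concept):
        """Insert concept, linking it to its most specific subsumers and most
        general subsumees, and return it as a frozenset."""
        concept = frozenset(concept)
        if concept in self.parents:
            return concept  # already classified
        others = list(self.parents)
        subsumers = [c for c in others if self.subsumes(c, concept)]
        subsumees = [c for c in others if self.subsumes(concept, c)]
        # Most specific subsumers: not strictly more general than another subsumer.
        mss = {c for c in subsumers
               if not any(d != c and self.subsumes(c, d) for d in subsumers)}
        # Most general subsumees: not strictly more specific than another subsumee.
        mgs = {c for c in subsumees
               if not any(d != c and self.subsumes(d, c) for d in subsumees)}
        # A direct link from a subsumer to a subsumee now passes through the
        # new concept, so it is no longer direct and is removed.
        for p in mss:
            for ch in mgs:
                self.children[p].discard(ch)
                self.parents[ch].discard(p)
        self.parents[concept] = set(mss)
        self.children[concept] = set(mgs)
        for p in mss:
            self.children[p].add(concept)
        for ch in mgs:
            self.parents[ch].add(concept)
        return concept
```

When a new concept lands between an existing parent and child, the old direct link between them is removed, so every link in the taxonomy stays direct. For example, classifying {animal}, then {animal, canine, young}, then {animal, canine} leaves {animal, canine} as the only child of {animal}. A practical classifier would typically avoid comparing against every concept by walking the taxonomy from the top.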

Accomplishments

We have implemented a second-generation algorithm for the intelligent retrieval system and have begun experiments with it. For finding answers to short, specific questions, the system has achieved a significantly higher success rate than traditional retrieval systems.

In addition, we have constructed a persistent version of the conceptual classifier, substantially extended our lexicon and morphological rules, and made considerable progress on a grammar for concept extraction.

Recently, we have constructed conceptual indexes for the complete UNIX man pages collection and for the SunExpress® on-line catalog database. Experiments to tune and evaluate both indexes are in progress.

We acquired the latest version of WordNet from Princeton University and integrated it with an upper-level ontology from the Information Sciences Institute at the University of Southern California.

We are using Brown University's Text Corpus and Princeton University's WordNet to develop technology for automatically selecting the intended sense of an ambiguous word from its context.
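The report does not say which disambiguation technique is being developed. For illustration only, the sketch below implements the simplified Lesk algorithm. This is a well-known baseline that picks the sense whose dictionary gloss shares the most content words with the surrounding sentence. The two-sense inventory for "bank" is a hypothetical stand-in for WordNet glosses. The stopword list and the name `simplified_lesk` are likewise assumptions.

```python
import re

# Hypothetical two-sense inventory standing in for WordNet glosses.
SENSES = {
    "bank": {
        "bank.financial": "an institution that accepts deposits of money and makes loans",
        "bank.river": "sloping land beside a body of water such as a river",
    },
}

# Hypothetical stopword list; function words would otherwise inflate overlaps.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "at", "on", "in",
             "such", "as", "that", "by", "she", "he", "we", "they"}


def content_words(text):
    """Lowercased alphabetic tokens of text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, sentence, senses=SENSES):
    """Return the sense of word whose gloss overlaps most with the sentence."""
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in senses[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:  # strict '>' keeps the first sense on ties
            best_sense, best_overlap = sense, overlap
    return best_sense
```

Here `simplified_lesk("bank", "She deposits money at the bank every week.")` selects "bank.financial" because "deposits" and "money" appear in that gloss. "They fished from the river bank near the water." selects "bank.river" instead. Ties go to the sense listed first, and when no gloss overlaps at all, the first sense is chosen by default. This is one reason raw overlap counts are usually only a starting point.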

We have consulted with several internal Sun groups about potential applications of our technology and about Sun's directions in information technology.

References

Publications
"Beyond Ignorance-Based Systems," W. A. Woods, International Conference on Principles of Knowledge Representation and Reasoning, Bonn, Germany, May 24-27, 1994, SMLI 94-0193.

"How Do Symbols and Networks Fit Together: A Report from the AAAI Workshop on Integrating Neural and Symbolic Processes," R. Sun and L. A. Bookman, AI Magazine, Volume 14, Number 2, Summer 1993, SMLI 92-0278.