Fiscal 1994 Project Portfolio Report
William Woods, Principal Investigator
william.woods@East.Sun.COM
Overall Objective
To develop and exploit technology for dealing with knowledge--acquiring it,
organizing it, disseminating it, retrieving it, and browsing it. To develop a body of technology that will address the needs of people whose jobs require efficient access to on-line information. Our efforts toward these goals are presently concentrated on the problems of conceptual information access--that is, access to information based on a conceptual match between a stated information need and material that may meet that need.
Conceptual Indexing
Objective for FY94
To develop techniques for indexing and organizing information in structured
taxonomic representations that will facilitate browsing and retrieval of specific
information in response to specific information needs.
Description
Conceptual Indexing refers to a technology for organizing facts, ideas, words,
phrases, and descriptions into a structured taxonomy that can be used as an
organizing structure for information retrieval and as a structure to support
human browsing. We have implemented a conceptual indexer that will extract
words and phrases from text files and organize them into a taxonomy that can
be browsed and used to access information. Such a taxonomy can index text
material at the level of individual sentences and phrases to support locating
specific answers to specific questions.
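To make the idea of sentence-level indexing concrete, the following is a minimal
sketch, not the indexer described in this report. It assumes that plain word
n-grams stand in for parsed phrases and that a naive punctuation-based splitter
finds sentence boundaries. The function names and the two sample documents are
hypothetical. It also builds only a flat phrase-to-location map and does not
organize the phrases into a taxonomy.

```python
import re
from collections import defaultdict


def split_sentences(text):
    """Naive splitter: break after '.', '?' or '!' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def extract_phrases(sentence, max_len=2):
    """Return every lowercased word n-gram of up to max_len words."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    phrases = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            phrases.add(" ".join(words[i:i + n]))
    return phrases


def build_index(documents):
    """Map each phrase to the set of (document name, sentence number) locations."""
    index = defaultdict(set)
    for name, text in documents.items():
        for number, sentence in enumerate(split_sentences(text)):
            for phrase in extract_phrases(sentence):
                index[phrase].add((name, number))
    return index


# Hypothetical sample text, loosely in the style of UNIX manual pages.
docs = {
    "ls.txt": "The ls command lists directory contents. Use the -l option for long format.",
    "rm.txt": "The rm command removes files. Use the -r option to remove a directory tree.",
}
index = build_index(docs)
hits = sorted(index["directory contents"])
```

Looking up a phrase returns the individual sentences that contain it rather
than whole documents, which is the granularity needed to point a reader at a
specific answer.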
The conceptual indexer is being used as a component for an experimental
intelligent querying system that we have expanded and experimented with
over the past year. We have implemented several versions of this intelligent
querying system that differ in the amount of analysis done at indexing time as
opposed to retrieval time, and we are still experimenting with
this system to understand its behavior and to develop techniques to extend its
capabilities.
The components of the conceptual indexer include:
- A parser that analyzes phrases extracted from text to determine their
conceptual structure for incorporation into the index
- A core dictionary of words that is used by the parser to determine the
structure of phrases
- A morphological analysis component for analyzing unknown words that
may be inflected or derived forms of words that are in the dictionary and for
guessing grammatical roles of unknown words
- A knowledge base of semantic relationships among words and concepts that
is used to judge the relationships among complex concepts
- A conceptual classifier that takes conceptual descriptions and assimilates
them into a taxonomy in such a way that they are directly linked to the most
specific concepts that subsume them and to the most general concepts that
they in turn subsume
- A browser for viewing and navigating within a conceptual taxonomy
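The classification step can be illustrated with a small sketch. This is an
illustration under stated assumptions, not the classifier described in this
report: each concept is assumed to be a plain set of features, and one concept
is assumed to subsume another when its features are a proper subset of the
other's. The class name, method name, and sample concepts are all hypothetical.
The sketch only reports the links for the new concept; it does not maintain
links among the concepts already stored.

```python
class Taxonomy:
    """Toy taxonomy in which a concept is a set of features (an assumption)."""

    def __init__(self):
        self.features = {}  # concept name -> frozenset of features

    def classify(self, name, features):
        """Add a concept; return (most specific subsumers, most general subsumees)."""
        new = frozenset(features)
        # Concepts whose features are a proper subset of the new ones are more general.
        above = [n for n, f in self.features.items() if f < new]
        # Concepts whose features are a proper superset are more specific.
        below = [n for n, f in self.features.items() if f > new]
        # Drop any subsumer that is itself more general than another subsumer.
        parents = [a for a in above
                   if not any(self.features[a] < self.features[b] for b in above)]
        # Drop any subsumee that is itself more specific than another subsumee.
        children = [c for c in below
                    if not any(self.features[b] < self.features[c] for b in below)]
        self.features[name] = new
        return sorted(parents), sorted(children)


taxonomy = Taxonomy()
taxonomy.classify("thing", [])
taxonomy.classify("animal", ["animal"])
taxonomy.classify("dog", ["animal", "dog"])
taxonomy.classify("brown dog", ["animal", "dog", "brown"])
links = taxonomy.classify("brown animal", ["animal", "brown"])
```

In this example "brown animal" is linked upward only to "animal", since "thing"
is more general than "animal", and downward to "brown dog". It is not linked to
"dog", because neither concept subsumes the other.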
Accomplishments
We have implemented a second-generation algorithm for the intelligent
retrieval system and have begun conducting experiments with it. We have
achieved a significant improvement in success rate over traditional retrieval
systems for finding answers to short, specific questions.
In addition, we have constructed a persistent version of the conceptual
classifier, substantially extended our lexicon and morphological rules, and
made substantial headway on a grammar for concept extraction.
Recently, we have been able to construct conceptual indexes for the complete
UNIX man pages collection and for the SunExpress® on-line catalog database.
Experiments in tuning and evaluation of both of these efforts are in progress.
We acquired the latest version of WordNet from Princeton University and
integrated it with an upper-level ontology from the Information Sciences
Institute, University of Southern California.
We are using Brown University's Text Corpus and Princeton University's
WordNet to develop technology for automatically choosing senses for
ambiguous words as a function of their context.
We have interacted with a number of Sun internal groups regarding potential
applications of our technology and Sun's directions in information technology.
References
Publications
"Beyond Ignorance-Based Systems," W. A. Woods, International Conference on Principles of Knowledge Representation and Reasoning, Bonn, Germany, May 24-27, 1994, SMLI 94-0193.
"How Do Symbols and Networks Fit Together: A Report from the AAAI Workshop on Integrating Neural and Symbolic Processes," R. Sun, L. A. Bookman, AI Magazine, Volume 14, Number 2, Summer 1993, SMLI 92-0278.