Freedom of Speech:

Open Source Speech Recognition Initiative Led By Sun Labs Opens Up New Possibilities for Speech Technology



Additional speech-related projects at Sun Labs:

FreeTTS:

An open source speech synthesizer written entirely in the Java programming language. The synthesizer is available for download. To learn how FreeTTS performs, read the technical report "FreeTTS - A Performance Case Study."

Java Speech API:

The Speech Integration Group led the development of the Java Speech API in cooperation with Sun's Java Software division and external partners, and developed and published the API's specifications and related documents.

April 12, 2004 - It's no secret that speech recognition is a problem not yet solved. While the promise of speech recognition technology seems unlimited, actual products have plenty of limitations. Anyone who has ever used a "talk-to-type" program or a voice-automated telephone reservation system knows what one of the problems is: it simply doesn't understand everything you say. Even if the system's accuracy rate is 95%, which is pretty good, that still means one out of every 20 words will be misinterpreted. And for most people, it takes only one error to send the frustration level through the roof.
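The arithmetic behind that frustration is easy to sketch. The Java example below is a minimal illustration that assumes, as a simplification not taken from the article, that every word is recognized independently with the same probability; real recognizer errors tend to cluster, so the numbers are illustrative only. The class name `AccuracyMath` and its methods are hypothetical.

```java
public class AccuracyMath {
    // Expected number of misrecognized words in an utterance of n words,
    // assuming each word is wrong with probability (1 - accuracy).
    static double expectedErrors(double accuracy, int n) {
        return (1.0 - accuracy) * n;
    }

    // Probability that all n words are recognized correctly, assuming
    // word errors are independent (a simplifying, hypothetical model).
    static double errorFreeProbability(double accuracy, int n) {
        return Math.pow(accuracy, n);
    }

    public static void main(String[] args) {
        double accuracy = 0.95;
        System.out.printf("Expected errors in 20 words: %.2f%n",
                expectedErrors(accuracy, 20));
        for (int n : new int[] {5, 10, 20}) {
            System.out.printf("P(no errors in %d words) = %.3f%n",
                    n, errorFreeProbability(accuracy, n));
        }
    }
}
```

Under this model, 95% accuracy yields one expected error per 20 words, and a 20-word utterance comes through error-free only about 36% of the time (0.95^20 is roughly 0.358).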

Speech recognition technology still holds considerable promise for businesses, for individuals with disabilities, and for many areas of research, if it can be made to work effectively. Yet speech recognition research faces one obstacle that is far more nettlesome than improving accuracy rates. It's a problem that must be addressed before new ideas can be transformed into breakthroughs. And it's a challenge that Sun Microsystems Laboratories is uniquely qualified to help solve.

Clearing the Path to Fertile Ground

Researchers at Sun Microsystems Laboratories have observed that speech recognition research is becoming increasingly proprietary, effectively commandeered by companies that are developing speech recognition products.

"Some of the companies doing speech research today are attempting to control the technology like it's a black box," said Willie Walker, the project lead of speech technology research at Sun Labs. "I think they're doing the world a disservice by keeping the ideas of some of the world's leading speech experts hidden behind patents and other intellectual property issues. We think speech recognition is very fertile ground for ongoing research and we want to do everything we can to facilitate, not stifle, innovation."

The Sun Labs solution is classic Sun. Over the past few years, Sun Labs and its research partners, Carnegie Mellon University, Mitsubishi Electric Research Labs, and Hewlett-Packard, have been quietly building momentum around an Open Source project: an innovative speech recognition system called Sphinx-4. "With the Open Source model, further research and innovation is encouraged and nurtured rather than hidden," said Mr. Walker. "Researchers have free access to the Sphinx-4 design and documentation from the Web (cmusphinx.sourceforge.net), and input and ideas from all interested parties are welcome."

Something to Talk About

Sphinx-4 is a state-of-the-art, speaker-independent, continuous speech recognition engine. It enhances and expands the capabilities of the previous-generation Sphinx-3 and Sphinx-2 speech recognition systems, adding flexibility, modularity, and a framework that can accept a variety of grammars, language models, and acoustic models. But it is far more than an incremental improvement. From an industry perspective, it represents a paradigm shift on a number of fronts:

  • Sphinx-4 is developed entirely in the Java programming language, making it easily portable to multiple operating systems and environments: any system that supports the Java platform can run it. "For those concerned about performance issues of the Java platform," Walker says, "our experience with FreeTTS, our open source speech synthesis engine, shows that Java is definitely up to the task of running the compute- and memory-intensive speech synthesis engine at high performance."
  • As an open source project, Sphinx-4 allows researchers to "get under the hood" to do things they cannot do with today's proprietary systems. Furthermore, Sphinx-4 is available under a BSD-style license, which is far more permissive than the most common open source licenses, the GPL and LGPL.
  • "The modular, extensible, and pluggable architecture of Sphinx-4 allows researchers to divide the recognizer into smaller components and do research on them individually, taking one component out and putting another in its place," said Mr. Walker. In addition, Sphinx-4 provides implementations of a number of common modules that fit into this framework. As a result, in order to innovate in one small area, a researcher need not develop an entire system from the ground up.
  • The cross-organizational team is composed of experts who bring a variety of complementary skills to the table. "For example, Dr. Bhiksha Raj of Mitsubishi Electric Research Labs, our foremost speech expert, was able to drive the design from a speech theory and practices point of view," said Mr. Walker, "whereas my Sun Labs team was able to provide world-class software engineering and development expertise."
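The idea of taking one component out and putting another in its place can be illustrated with an ordinary Java interface. The sketch below is hypothetical: `FeatureExtractor`, `EnergyExtractor`, `ZeroCrossingExtractor`, `Pipeline`, and `PluggableDemo` are illustrative names, not Sphinx-4 classes, and Sphinx-4's own mechanism for wiring its components together is not shown. It only shows how code written against an interface lets a researcher swap one implementation for another without touching the rest of the pipeline.

```java
import java.util.Arrays;

// Hypothetical component contract (NOT the Sphinx-4 API):
// turns raw audio samples into one feature value per frame.
interface FeatureExtractor {
    double[] extract(double[] samples, int frameSize);
}

// Implementation A: mean energy of each frame.
class EnergyExtractor implements FeatureExtractor {
    @Override
    public double[] extract(double[] samples, int frameSize) {
        int frames = samples.length / frameSize; // a trailing partial frame is dropped
        double[] out = new double[frames];
        for (int f = 0; f < frames; f++) {
            double sum = 0;
            for (int i = f * frameSize; i < (f + 1) * frameSize; i++) {
                sum += samples[i] * samples[i];
            }
            out[f] = sum / frameSize;
        }
        return out;
    }
}

// Implementation B: number of sign changes (zero crossings) within each frame.
class ZeroCrossingExtractor implements FeatureExtractor {
    @Override
    public double[] extract(double[] samples, int frameSize) {
        int frames = samples.length / frameSize;
        double[] out = new double[frames];
        for (int f = 0; f < frames; f++) {
            int crossings = 0;
            for (int i = f * frameSize + 1; i < (f + 1) * frameSize; i++) {
                if ((samples[i - 1] >= 0) != (samples[i] >= 0)) {
                    crossings++;
                }
            }
            out[f] = crossings;
        }
        return out;
    }
}

// The pipeline depends only on the interface, so extractors are interchangeable.
class Pipeline {
    private final FeatureExtractor extractor;
    private final int frameSize;

    Pipeline(FeatureExtractor extractor, int frameSize) {
        this.extractor = extractor;
        this.frameSize = frameSize;
    }

    double[] run(double[] samples) {
        return extractor.extract(samples, frameSize);
    }
}

public class PluggableDemo {
    public static void main(String[] args) {
        double[] audio = {1, -1, 1, -1, 2, 2, 2, 2};
        // Same pipeline code, two different plugged-in components.
        Pipeline energy = new Pipeline(new EnergyExtractor(), 4);
        Pipeline zcr = new Pipeline(new ZeroCrossingExtractor(), 4);
        System.out.println(Arrays.toString(energy.run(audio))); // [1.0, 4.0]
        System.out.println(Arrays.toString(zcr.run(audio)));    // [3.0, 0.0]
    }
}
```

Because `Pipeline` sees only the `FeatureExtractor` contract, trying out a new front-end idea in this toy setup means writing one new class rather than a new recognizer.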

Momentum Builds

As Sphinx-4 continues to grow and develop, it is attracting the attention and enthusiasm of a wide range of people. Researchers at major universities worldwide are adopting the platform for ongoing studies. For example, Dr. Rita Singh at Carnegie Mellon University has joined forces with industry luminaries (including Dr. Jim Baker, who brought Hidden Markov Models to speech recognition, and Dr. Raj Reddy of Carnegie Mellon) to create a Web-based course featuring Sphinx-4 to help spur further innovation in speech technology.

"I think we've created a fabulous system," said Mr. Walker. "We can't be sure we'll succeed in changing the world, but we have created an architecture that holds considerable promise for future research. I believe Sphinx-4 will help facilitate the generation of new ideas and new applications for speech recognition technology. It helps prove the viability of the Java platform for speech recognition. And it gives us a technological foundation, a black box that we can get inside, for further research at Sun."

Copyright 1994-2008 Sun Microsystems, Inc.