Freedom of Speech:
For More Information: Additional speech-related projects at Sun Labs:
FreeTTS: An open source speech synthesizer written entirely in the Java programming language. The synthesizer is available here. To learn about how FreeTTS performs, read the technical report "FreeTTS - A Performance Case Study."
Java Speech API: The Speech Integration Group led the development of the Java Speech API in cooperation with Sun's Java Software division and external partners. The Speech Integration Group developed and published the following specifications and documents:
April 12, 2004 - It's no secret that speech recognition technology is a problem not yet solved. While the promise of speech recognition technology seems unlimited, actual products have plenty of limitations. Anyone who has ever used a "talk-to-type" program or a voice-automated telephone reservation system knows what one of the problems is: it simply doesn't understand everything you say. Even if the system's accuracy rate is 95%--which is pretty good--that still means one out of every 20 words will be misinterpreted. And for most people, it only takes one error to send the frustration level through the roof.
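To make the arithmetic behind that 95% figure concrete, here is a minimal sketch. It assumes, purely for illustration, that each word is recognized independently with the same accuracy, which real recognizers do not guarantee. The function names are hypothetical and are not part of any speech toolkit.

```python
def expected_errors(word_accuracy: float, num_words: int) -> float:
    """Average number of misrecognized words in an utterance of num_words words."""
    return (1.0 - word_accuracy) * num_words


def prob_at_least_one_error(word_accuracy: float, num_words: int) -> float:
    """Chance that an utterance contains one or more misrecognized words.

    Assumption: word errors are independent, so the chance that every
    word is correct is word_accuracy raised to the power num_words.
    """
    return 1.0 - word_accuracy ** num_words


if __name__ == "__main__":
    for length in (5, 10, 20):
        print(
            f"{length:2d} words: "
            f"expected errors = {expected_errors(0.95, length):.2f}, "
            f"P(at least one error) = {prob_at_least_one_error(0.95, length):.2f}"
        )
```

Under that independence assumption, a 20-word sentence at 95% word accuracy averages one error, and roughly 64% of such sentences would contain at least one mistake. That may help explain why a seemingly high accuracy rate still frustrates users.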
Speech recognition technology still holds considerable promise for businesses, for individuals with disabilities, and for many areas of research -- if it can be made to work effectively. Yet speech recognition research faces one obstacle that is far more nettlesome than improving accuracy rates. It's a problem that must be addressed before new ideas can be transformed into breakthroughs. And it's a challenge that Sun Microsystems Laboratories is uniquely qualified to help solve.
Researchers at Sun Microsystems Laboratories have observed that speech recognition research is becoming increasingly proprietary -- effectively commandeered by companies that are developing speech recognition products.
"Some of the companies doing speech research today are attempting to control the technology like it's a black box," said Willie Walker, the project lead of speech technology research at Sun Labs. "I think they're doing the world a disservice by keeping the ideas of some of the world's leading speech experts hidden behind patents and other intellectual property issues. We think speech recognition is very fertile ground for ongoing research and we want to do everything we can to facilitate, not stifle, innovation."
The Sun Labs solution is classic Sun. Over the past few years, Sun Labs and its research partners, Carnegie Mellon University, Mitsubishi Electric Research Labs, and Hewlett-Packard, have been quietly building momentum around an Open Source project: an innovative speech recognition system called Sphinx-4. "With the Open Source model, further research and innovation is encouraged and nurtured rather than hidden," said Mr. Walker. "Researchers have free access to the Sphinx-4 design and documentation from the Web (cmusphinx.sourceforge.net), and input and ideas from all interested parties are welcome."
Sphinx-4 is a state-of-the-art continuous-speech, speaker-independent recognition engine. It enhances and expands the capabilities of the previous-generation Sphinx-3 and Sphinx-2 speech recognition systems, adding flexibility, modularity, and a framework that accepts a wide variety of grammars, language models, and acoustic models. But it is far more than an incremental improvement. From an industry perspective, it represents a paradigm shift on a number of fronts:
As Sphinx-4 continues to grow and develop, it is attracting the attention and enthusiasm of a wide range of people. Researchers at major universities worldwide are adopting the platform for ongoing studies. For example, Dr. Rita Singh at Carnegie Mellon University has joined forces with industry luminaries -- including Dr. Jim Baker, who brought Hidden Markov Models to speech recognition, and Dr. Raj Reddy of Carnegie Mellon -- to create a Web-based course featuring Sphinx-4 to help spur further innovation in speech technology.
"I think we've created a fabulous system," said Mr. Walker. "We can't be sure we'll succeed in changing the world, but we have created an architecture that holds considerable promise for future research. I believe Sphinx-4 will help facilitate the generation of new ideas and new applications for speech recognition technology. It helps prove the viability of the Java platform for speech recognition. And it gives us a technological foundation -- a black box that we can get inside -- for further research at Sun."