Cloudgarden Discussion Board
This topic has 2 replies.
Algorithms and structures in JSAPI implementation
Author: Kamil Rogowski | 05/05/10 13:55
Hi.
I can't find any information about the algorithms used in this implementation. Are there any resources on the Internet? Do you know of a contact person who could share this knowledge with me? Or is it a secret? :)
I'm especially interested in the approach to classification and pattern matching. Does it use hidden Markov models, or maybe neural networks? I assume it relies on some statistical data about the language, maybe from a well-known language corpus? What about the recognition process: does it use phonemes? What kinds of parameters are extracted from the speech waveform? Maybe cepstral ones?
Any information would be very helpful. I need it for the theoretical part of my thesis.
Re: Algorithms and structures in JSAPI implementation
Author: Roland | Reply 1 of 2 | 05/11/10 07:15
Hi Kamil,
CloudGarden has implemented the JSAPI interface (see Sun's specs) using SAPI as a bridge to another implementation. SAPI is Microsoft's interface for speech recognition. The most common SAPI implementation is the speech engine provided by Microsoft itself, but other vendors' products, such as Dragon NaturallySpeaking (DNS), can also provide a SAPI-compatible engine (although DNS only supports SAPI version 4, not the latest version 5).
BTW, since you are interested in pattern matching, have you looked at the work of Ray Kurzweil (en.wikipedia.org/wiki/Ray_Kurzweil)?
Re: Algorithms and structures in JSAPI implementation
Author: Kamil Rogowski | Reply 2 of 2 | 05/16/10 09:41
Thank you for the answer.
OK. I'm using SAPI5. Tell me if I'm correct: the whole speech recognition process is implemented in the speech engines, not in the JSAPI implementation? So, for example, the algorithms for determining the endpoints of isolated utterances, phoneme extraction, classification, and so on are provided by the particular vendor (Microsoft, IBM...)? In my case, by the SAPI5 engine?
I found the Microsoft Speech API 5.3 white papers:
"...It describes how engines are registered and initialized; how grammar and lexicon information is communicated to the engine; how engines read data and perform recognition..."
So there should be information about how recognition is performed.
Later we have:
"Grammar designers use a Weight field during each transition to change the likelihood of certain paths being taken. This Weight field is a probability - the range of values is 0.0 to 1.0, and the values of the transitions out of any state sum to 1.0. A value of 0.0 should always be interpreted as making this transition impossible to pass during recognition. Engines may or may not incorporate the other weight values into their recognition search. By default, grammars do not have weights set, so each transition weight will be 1.0 divided by the number of transitions out of the preceding state."
But this could apply to an HMM or NN structure, or to many others. No details.
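To make the quoted weight rule concrete, here is a minimal sketch of just the arithmetic it describes: uniform default weights, the sum-to-1.0 check, and dropping transitions with weight 0.0. This is not SAPI code and says nothing about how any engine searches; the function names, the dictionary layout, and the example grammar are all assumptions made up for illustration.

```python
import math

# Hypothetical representation (not a SAPI data structure):
# each state maps to the list of states reachable from it.

def default_weights(successors):
    """Give every transition out of a state the weight 1.0 / (number of transitions)."""
    return {
        state: [(nxt, 1.0 / len(nexts)) for nxt in nexts]
        for state, nexts in successors.items()
        if nexts  # states with no outgoing transitions get no entry
    }

def check_weights(weighted):
    """Return True if every weight is in [0.0, 1.0] and each state's weights sum to 1.0."""
    for outs in weighted.values():
        weights = [w for _, w in outs]
        if any(w < 0.0 or w > 1.0 for w in weights):
            return False
        if not math.isclose(sum(weights), 1.0, rel_tol=1e-9):
            return False
    return True

def allowed(outs):
    """Drop transitions with weight 0.0, which the quote says are impossible to pass."""
    return [(nxt, w) for nxt, w in outs if w > 0.0]

# Made-up example grammar: three choices after "start", one after "open".
grammar = {"start": ["open", "close", "save"], "open": ["file"]}
weighted = default_weights(grammar)
```

With this example grammar, each of the three transitions out of "start" gets weight 1/3 and the single transition out of "open" gets 1.0. The same numbers would fit the transition probabilities of an HMM or the arc weights of a plain finite-state grammar, which is why the quote alone does not reveal the recognition algorithm.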
Going further:
"The engine starts reading data and doing recognition."
This "doing recognition" step is NOT described later. In the next few steps they write about the recognition results. So I can't find details about the recognition itself even in the specification. I'll email them my questions, but I don't have much hope of getting an answer.