SAPI Pronunciation Training App

  • Thread starter: Oztromboli (Guest)

I want to develop an app for evaluating the pronunciation of non-native English speakers on words and short phrases (up to 5 words). The user would receive feedback on the accuracy of the pronunciation, perhaps with further feedback on which words/phonemes are poorly pronounced.

I have reviewed the following content:

Finding pronunciation correctness

Is it possible to use Windows Speech Recognition Engine in a word pronunciation game?

Speech to Phoneme in .Net

How to get pronunciation phonemes corresponding to a word using C#?

From Eric Brown's comments, a project using System.Speech will need to be in C++ and follow these steps: https://stackoverflow.com/questions...tion-phonemes-corresponding-to-a-word-using-c

1. Generate pronunciations for the target word/phrase as phonemes using ISpEnginePronunciation::GetPronunciations. This will generate one or more variants of how the word/phrase might be pronounced;

2. Generate the recognised phonemes from the user using a dictation grammar;

3. Run a comparison between each of the pronunciation variants of the target word/phrase and the recognised phonemes. This is the nub of the issue.
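As a rough illustration of step 3, the sketch below picks the pronunciation variant closest to what was recognised. It is written in Python for brevity and is not SAPI code: the phoneme lists are hard-coded stand-ins for the output of steps 1 and 2, and `mismatch_count` and `best_variant` are hypothetical names with a deliberately naive placeholder metric.

```python
def mismatch_count(target, spoken):
    """Placeholder metric: position-wise mismatches plus the length difference."""
    mismatches = sum(t != s for t, s in zip(target, spoken))
    return mismatches + abs(len(target) - len(spoken))


def best_variant(variants, recognised, distance=mismatch_count):
    """Return (distance, variant) for the variant closest to the recognised phonemes."""
    if not variants:
        raise ValueError("at least one pronunciation variant is required")
    scored = ((distance(variant, recognised), variant) for variant in variants)
    return min(scored, key=lambda pair: pair[0])


# Assumed inputs: two variants of "either" and one recognised phoneme sequence.
variants = [["iy", "dh", "er"], ["ay", "dh", "er"]]
recognised = ["ay", "dh", "er"]
score, variant = best_variant(variants, recognised)
```

Taking the minimum over variants means the speaker is not penalised for choosing a different but valid pronunciation. The `distance` parameter is where a better metric would be plugged in.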

It has been suggested that the Levenshtein distance might be used to make the comparison. However, this appears to be a general string-matching algorithm that treats every substitution as equally costly, so it doesn't account, for instance, for the similarity between the d and t sounds in English.
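To make the concern concrete, here is a minimal sketch of plain Levenshtein distance applied to phoneme sequences rather than characters. It is in Python for brevity, and the function name and the example phoneme lists are assumptions chosen for illustration.

```python
def levenshtein(target, spoken):
    """Unit-cost edit distance between two sequences of phoneme symbols."""
    prev = list(range(len(spoken) + 1))
    for i, t in enumerate(target, start=1):
        curr = [i]
        for j, s in enumerate(spoken, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a target phoneme
                curr[j - 1] + 1,         # insertion of an extra spoken phoneme
                prev[j - 1] + (t != s),  # substitution (free if identical)
            ))
        prev = curr
    return prev[-1]


bad = ["b", "ae", "d"]
near_miss = levenshtein(bad, ["b", "ae", "t"])  # d replaced by t
far_miss = levenshtein(bad, ["b", "ae", "s"])   # d replaced by s
```

Both substitutions come out with the same distance of 1, even though t is much closer to d than s is. That is the limitation described above.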

I did see one comment from Eric Brown that "The SAPI Phonetic Alphabet Reference can help you here, as it breaks down the consonants & vowels into features".

I think I found that here:

Phonetic Alphabet Reference (Microsoft.Speech) - Microsoft Speech Platform SDK 11 Documentation

It does not seem to provide what I'm looking for: a method of assessing the similarity in pronunciation.
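One way a feature breakdown could be turned into a similarity measure is to make the substitution cost in an edit distance proportional to the number of features that differ. The following Python sketch assumes that approach. The feature table is hand-entered from standard articulatory descriptions for a few consonants and is not copied from the SAPI reference, and all names (`FEATURES`, `substitution_cost`, `weighted_distance`) are hypothetical.

```python
# Illustrative features only: (place, manner, voicing) for a few consonants.
FEATURES = {
    "b": ("bilabial", "stop", "voiced"),
    "p": ("bilabial", "stop", "voiceless"),
    "d": ("alveolar", "stop", "voiced"),
    "t": ("alveolar", "stop", "voiceless"),
    "z": ("alveolar", "fricative", "voiced"),
    "s": ("alveolar", "fricative", "voiceless"),
}


def substitution_cost(a, b):
    """Fraction of differing features; 1.0 when a phoneme is not in the table."""
    if a == b:
        return 0.0
    fa, fb = FEATURES.get(a), FEATURES.get(b)
    if fa is None or fb is None:
        return 1.0
    return sum(x != y for x, y in zip(fa, fb)) / len(fa)


def weighted_distance(target, spoken, indel_cost=1.0):
    """Edit distance whose substitution cost depends on phoneme features."""
    prev = [j * indel_cost for j in range(len(spoken) + 1)]
    for i, t in enumerate(target, start=1):
        curr = [i * indel_cost]
        for j, s in enumerate(spoken, start=1):
            curr.append(min(
                prev[j] + indel_cost,
                curr[j - 1] + indel_cost,
                prev[j - 1] + substitution_cost(t, s),
            ))
        prev = curr
    return prev[-1]


near_miss = weighted_distance(["b", "ae", "d"], ["b", "ae", "t"])  # voicing differs
far_miss = weighted_distance(["b", "ae", "d"], ["b", "ae", "s"])   # manner and voicing differ
```

With this table, replacing d with t costs about 0.33 while replacing d with s costs about 0.67, so the near miss is ranked closer. Equal weighting of the three features is an arbitrary choice here. A real table would need to cover vowels too, and the weights would likely need tuning against judged recordings.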

My questions:

1. Am I on the right track with the description above?

2. If so, how can I use the SAPI Phonetic Alphabet Reference to calculate the "distance" between a target pronunciation and a spoken one?

3. What would be the limitations of this method? Would it be useful for long phrases of more than a few words?

Thanks in advance,

Ozs

PS: Previously posted in the .NET Framework Class Libraries forum: System.Speech Pronunciation Training App
