Audio Fundamentals

EDN Admin · Jun 16, 2011

This video covers the basics of reading audio data from the Kinect microphone array, using a demo adapted from the built-in audio recorder. The video also covers speech recognition using Kinect.

For the built-in example this demo was based on, and for the speech demo in C#, check out your "My Documents\Microsoft Research KinectSDK Samples\Audio" directory. You can http://files.ch9.ms/coding4fun/KinectSDKSamplesVB.zip download the Visual Basic examples here. You may find it easier to follow along by downloading the http://files.ch9.ms/coding4fun/KinectForWindowsSDKQuickstarts.zip Kinect for Windows SDK Quickstarts samples and slides.

[ http://channel9.msdn.com/Series/KinectSDKQuickstarts/Audio-Fundamentals#time=0m35s 00:35 ] Kinect microphone information
[ http://channel9.msdn.com/Series/KinectSDKQuickstarts/Audio-Fundamentals#time=1m10s 01:10 ] Audio data
[ http://channel9.msdn.com/Series/KinectSDKQuickstarts/Audio-Fundamentals#time=2m15s 02:15 ] Speech recognition information
[ http://channel9.msdn.com/Series/KinectSDKQuickstarts/Audio-Fundamentals#time=5m8s 05:08 ] Recording audio
[ http://channel9.msdn.com/Series/KinectSDKQuickstarts/Audio-Fundamentals#time=8m17s 08:17 ] Speech recognition demo

<h3>Setup</h3>
The steps below assume you have set up your development environment as explained in the " http://channel9.msdn.com/Series/KinectSDKQuickstarts/Getting-Started Setting Up Your Development Environment " video.

<h1>Task: Designing Your UI</h1>
We'll add a Slider and two Button controls, and we'll also use some StackPanels to make sure everything lines up nicely:

XAML
<pre class="brush: xml">
<Window x:Class="AudioRecorder.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Audio Recorder Sample" Height="159" Width="525">
    <Grid>
        <StackPanel>
            <StackPanel Orientation="Horizontal">
                <Label Content="Seconds to Record: " />
                <Label Content="{Binding ElementName=RecordForTimeSpan, Path=Value}" />
            </StackPanel>
            <Slider Name="RecordForTimeSpan" Minimum="1" Maximum="25" IsSnapToTickEnabled="True" />
            <StackPanel Orientation="Horizontal" HorizontalAlignment="Center">
                <Button Content="Record" Height="50" Width="100" Name="RecordButton" />
                <Button Content="Play" Height="50" Width="100" Name="PlayButton" />
            </StackPanel>
            <MediaElement Name="audioPlayer" />
        </StackPanel>
    </Grid>
</Window>
</pre>
http://files.channel9.msdn.com/wlwimages/9c00b398b405423b99d19efa016fae96/image%5B4%5D.png <img title="image" src="http://files.channel9.msdn.com/wlwimages/9c00b398b405423b99d19efa016fae96/image_thumb%5B2%5D-1.png" alt="image" width="558" height="190" border="0" />

<h3>Creating Click events</h3>
For each button, we'll want to create a Click event handler. Go to the Properties window (F4), select the RecordButton, select the Events tab, and double-click the Click event to create the RecordButton_Click handler. Do the same for the PlayButton so that the PlayButton_Click handler is wired up as well.

http://files.channel9.msdn.com/wlwimages/9c00b398b405423b99d19efa016fae96/image%5B9%5D.png <img title="image" src="http://files.channel9.msdn.com/wlwimages/9c00b398b405423b99d19efa016fae96/image_thumb%5B5%5D.png" alt="image" width="293" height="162" border="0" />

<h1>Task: Working with the KinectAudioSource</h1>
The first task is to bring in the Kinect audio namespace:

C#
<pre class="brush: csharp">
using Microsoft.Research.Kinect.Audio;
</pre>
Visual Basic
<pre class="brush: vb">
Imports Microsoft.Research.Kinect.Audio
</pre>

<h3>Threading and apartment states</h3>
From this point forward, we'll be dealing with threading, because the microphone array requires a multi-threaded apartment (MTA) state but WPF runs in a single-threaded apartment (STA) state. To find out more about apartment states, check out the MSDN page on it: http://msdn.microsoft.com/en-us/library/system.threading.apartmentstate.aspx .

This is easy to work around; we just have to be careful about how we access different items. We'll accomplish this by creating a new thread that does the actual recording and file saving.

We'll create two variables and an event outside the RecordButton_Click handler to help deal with the cross-threading issue. The FinishedRecording event will allow us to notify the user-interface thread that we're done recording:

C#
<pre class="brush: csharp">
double _amountOfTimeToRecord;
string _lastRecordedFileName;
private event RoutedEventHandler FinishedRecording;
</pre>
Visual Basic
<pre class="brush: vb">
Private _amountOfTimeToRecord As Double
Private _lastRecordedFileName As String
Private Event FinishedRecording As RoutedEventHandler
</pre>
Now that we can keep track of the necessary information, we'll create a new method to do the recording. This is the method we'll tell the new thread to execute:

C#
<pre class="brush: csharp">
private void RecordAudio()
{
}
</pre>
Visual Basic
<pre class="brush: vb">
Private Sub RecordAudio()
End Sub
</pre>
To work with threads, we'll add the System.Threading namespace:

C#
<pre class="brush: csharp">
using System.Threading;
</pre>
Visual Basic
<pre class="brush: vb">
Imports System.Threading
</pre>
Now we'll create the thread and do some simple end-user management in the RecordButton_Click handler. First we'll disable the two buttons, store the number of seconds to record, and create a unique file name. Then we'll create a new Thread and use the SetApartmentState method to give it an MTA state:

C#
<pre class="brush: csharp">
private void RecordButton_Click(object sender, RoutedEventArgs e)
{
    // Block further input until the recording thread reports back
    RecordButton.IsEnabled = false;
    PlayButton.IsEnabled = false;
    _amountOfTimeToRecord = RecordForTimeSpan.Value;
    _lastRecordedFileName = DateTime.Now.ToString("yyyyMMddHHmmss") + "_wav.wav";

    // The Kinect audio source needs an MTA thread; the WPF UI thread is STA
    var t = new Thread(new ThreadStart(RecordAudio));
    t.SetApartmentState(ApartmentState.MTA);
    t.Start();
}
</pre>
Visual Basic
<pre class="brush: vb">
Private Sub RecordButton_Click(ByVal sender As Object, ByVal e As RoutedEventArgs)

    ' Block further input until the recording thread reports back
    RecordButton.IsEnabled = False
    PlayButton.IsEnabled = False
    _amountOfTimeToRecord = RecordForTimeSpan.Value
    _lastRecordedFileName = Date.Now.ToString("yyyyMMddHHmmss") & "_wav.wav"

    ' The Kinect audio source needs an MTA thread; the WPF UI thread is STA
    Dim t = New Thread(New ThreadStart(AddressOf RecordAudio))
    t.SetApartmentState(ApartmentState.MTA)
    t.Start()

End Sub
</pre>

<h1>Task: Capturing Audio Data</h1>
From here, this sample and the built-in sample are pretty much the same. We'll only add three differences: the FinishedRecording event, a dynamic recording length, and the dynamic file name. Note that the WriteWavHeader function is exactly the same as the one in the built-in demo as well. Since we use several types of streams, we'll add the System.IO namespace:

C#
<pre class="brush: csharp">
using System.IO;
</pre>
Visual Basic
<pre class="brush: vb">
Imports System.IO
</pre>
The entire RecordAudio method:

C#
<pre class="brush: csharp">
private void RecordAudio()
{
    using (var source = new KinectAudioSource())
    {
        // Bytes to record: seconds * 2 bytes per sample * 16000 samples per second
        var recordingLength = (int) _amountOfTimeToRecord * 2 * 16000;
        var buffer = new byte[1024];

        source.SystemMode = SystemMode.OptibeamArrayOnly;

        using (var fileStream = new FileStream(_lastRecordedFileName, FileMode.Create))
        {
            WriteWavHeader(fileStream, recordingLength);

            //Start capturing audio
            using (var audioStream = source.Start())
            {
                //Simply copy the data from the stream down to the file
                int count, totalCount = 0;
                while ((count = audioStream.Read(buffer, 0, buffer.Length)) > 0 && totalCount < recordingLength)
                {
                    fileStream.Write(buffer, 0, count);
                    totalCount += count;
                }
            }
        }

        // Tell the UI thread that the file is complete
        if (FinishedRecording != null)
            FinishedRecording(null, null);
    }
}
</pre>
Visual Basic
<pre class="brush: vb">
Private Sub RecordAudio()

    Using source = New KinectAudioSource

        ' Bytes to record: seconds * 2 bytes per sample * 16000 samples per second
        Dim recordingLength = CInt(Fix(_amountOfTimeToRecord)) * 2 * 16000
        Dim buffer = New Byte(1023) {}

        source.SystemMode = SystemMode.OptibeamArrayOnly

        Using fileStream = New FileStream(_lastRecordedFileName, FileMode.Create)

            WriteWavHeader(fileStream, recordingLength)

            ' Start capturing audio
            Using audioStream = source.Start()

                ' Simply copy the data from the stream down to the file
                Dim count As Integer, totalCount As Integer = 0
                count = audioStream.Read(buffer, 0, buffer.Length)
                Do While count > 0 AndAlso totalCount < recordingLength

                    fileStream.Write(buffer, 0, count)
                    totalCount += count

                    count = audioStream.Read(buffer, 0, buffer.Length)
                Loop

            End Using

        End Using

        ' Tell the UI thread that the file is complete
        RaiseEvent FinishedRecording(Nothing, Nothing)

    End Using

End Sub
</pre>

<h1>Task: Playing Back the Audio We Just Captured</h1>
So we've recorded the audio, saved it, and fired off an event that says we're done. Let's hook into it. We'll wire up that event in the MainWindow constructor:

C#
<pre class="brush: csharp">
public MainWindow()
{
    InitializeComponent();

    FinishedRecording += new RoutedEventHandler(MainWindow_FinishedRecording);
}
</pre>
Visual Basic
<pre class="brush: vb">
Public Sub New()
    InitializeComponent()

    AddHandler FinishedRecording, AddressOf MainWindow_FinishedRecording
End Sub
</pre>
Since that event is raised on a non-UI thread, we'll need to use the Dispatcher to get back onto the UI thread so we can re-enable those buttons:

C#
<pre class="brush: csharp">
void MainWindow_FinishedRecording(object sender, RoutedEventArgs e)
{
    // Raised on the recording thread, so marshal back to the UI thread
    Dispatcher.BeginInvoke(new ThreadStart(ReenableButtons));
}

private void ReenableButtons()
{
    RecordButton.IsEnabled = true;
    PlayButton.IsEnabled = true;
}
</pre>
Visual Basic
<pre class="brush: vb">
Private Sub MainWindow_FinishedRecording(sender As Object, e As RoutedEventArgs)
    ' Raised on the recording thread, so marshal back to the UI thread
    Dispatcher.BeginInvoke(New ThreadStart(AddressOf ReenableButtons))
End Sub

Private Sub ReenableButtons()
    RecordButton.IsEnabled = True
    PlayButton.IsEnabled = True
End Sub
</pre>
And finally, we'll make the MediaElement play back the audio we just saved! We'll also verify both that the user has recorded some audio and that the file exists:

C#
<pre class="brush: csharp">
private void PlayButton_Click(object sender, RoutedEventArgs e)
{
    if (!string.IsNullOrEmpty(_lastRecordedFileName) && File.Exists(_lastRecordedFileName))
    {
        audioPlayer.Source = new Uri(_lastRecordedFileName, UriKind.RelativeOrAbsolute);
        audioPlayer.LoadedBehavior = MediaState.Play;
        audioPlayer.UnloadedBehavior = MediaState.Close;
    }
}
</pre>
Visual Basic
<pre class="brush: vb">
Private Sub PlayButton_Click(sender As Object, e As RoutedEventArgs)

    If (Not String.IsNullOrEmpty(_lastRecordedFileName)) AndAlso File.Exists(_lastRecordedFileName) Then

        audioPlayer.Source = New Uri(_lastRecordedFileName, UriKind.RelativeOrAbsolute)
        audioPlayer.LoadedBehavior = MediaState.Play
        audioPlayer.UnloadedBehavior = MediaState.Close

    End If

End Sub
</pre>

<h1>Task: Speech Recognition</h1>
To do speech recognition, we need to bring in the speech recognition namespaces from the speech SDK:

C#
<pre class="brush: csharp">
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;
</pre>
Visual Basic
<pre class="brush: vb">
Imports Microsoft.Speech.AudioFormat
Imports Microsoft.Speech.Recognition
</pre>
In VB, we'll also need to add an MTAThread attribute to Sub Main. C# does not need this.

Visual Basic
<pre class="brush: vb">
<MTAThread()> _
Shared Sub Main(ByVal args() As String)
</pre>
Next, we need to set up the KinectAudioSource in a way that's compatible with speech recognition:

C#
<pre class="brush: csharp">
using (var source = new KinectAudioSource())
{
    source.FeatureMode = true;
    source.AutomaticGainControl = false; //Important to turn this off for speech recognition
    source.SystemMode = SystemMode.OptibeamArrayOnly; //No AEC for this sample
}
</pre>
Visual Basic
<pre class="brush: vb">
Using source = New KinectAudioSource

    source.FeatureMode = True
    source.AutomaticGainControl = False ' Important to turn this off for speech recognition
    source.SystemMode = SystemMode.OptibeamArrayOnly ' No AEC for this sample

End Using
</pre>
With that in place, we can initialize the SpeechRecognitionEngine to use the Kinect recognizer, which was downloaded earlier:

C#
<pre class="brush: csharp">
private const string RecognizerId = "SR_MS_en-US_Kinect_10.0";

// Requires System.Linq; ri is null if the Kinect recognizer is not installed
RecognizerInfo ri = SpeechRecognitionEngine.InstalledRecognizers().Where(r => r.Id == RecognizerId).FirstOrDefault();
</pre>
Visual Basic
<pre class="brush: vb">
Private Const RecognizerId As String = "SR_MS_en-US_Kinect_10.0"

' Requires System.Linq; ri is Nothing if the Kinect recognizer is not installed
Dim ri As RecognizerInfo = SpeechRecognitionEngine.InstalledRecognizers().Where(Function(r) r.Id = RecognizerId).FirstOrDefault()
</pre>
Next, a "grammar" needs to be set up, which specifies the words the speech recognition engine should listen for. The following code creates a grammar for the words "red", "green" and "blue".

C#
<pre class="brush: csharp">
using (var sre = new SpeechRecognitionEngine(ri.Id))
{
    var colors = new Choices();
    colors.Add("red");
    colors.Add("green");
    colors.Add("blue");

    var gb = new GrammarBuilder();
    //Specify the culture to match the recognizer in case we are running in a different culture.
    gb.Culture = ri.Culture;
    gb.Append(colors);

    // Create the actual Grammar instance, and then load it into the speech recognizer.
    var g = new Grammar(gb);

    sre.LoadGrammar(g);
}
</pre>
Visual Basic
<pre class="brush: vb">
Using sre = New SpeechRecognitionEngine(ri.Id)

    Dim colors = New Choices
    colors.Add("red")
    colors.Add("green")
    colors.Add("blue")

    Dim gb = New GrammarBuilder
    ' Specify the culture to match the recognizer in case we are running in a different culture.
    gb.Culture = ri.Culture
    gb.Append(colors)

    ' Create the actual Grammar instance, and then load it into the speech recognizer.
    Dim g = New Grammar(gb)

    sre.LoadGrammar(g)

End Using
</pre>
Next, several events are hooked up so you can be notified when a word is recognized, hypothesized, or rejected:

C#
<pre class="brush: csharp">
sre.SpeechRecognized += SreSpeechRecognized;
sre.SpeechHypothesized += SreSpeechHypothesized;
sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected;
</pre>
Visual Basic
<pre class="brush: vb">
AddHandler sre.SpeechRecognized, AddressOf SreSpeechRecognized
AddHandler sre.SpeechHypothesized, AddressOf SreSpeechHypothesized
AddHandler sre.SpeechRecognitionRejected, AddressOf SreSpeechRecognitionRejected
</pre>
Finally, the audio stream from the Kinect is set as the input to the speech recognition engine:

C#
<pre class="brush: csharp">
using (Stream s = source.Start())
{
    // 16 kHz, 16-bit, mono PCM: 32000 bytes per second, 2-byte block alignment
    sre.SetInputToAudioStream(s,
                              new SpeechAudioFormatInfo(
                                  EncodingFormat.Pcm, 16000, 16, 1,
                                  32000, 2, null));

    Console.WriteLine("Recognizing. Say: red, green or blue. Press ENTER to stop");

    sre.RecognizeAsync(RecognizeMode.Multiple);
    Console.ReadLine();
    Console.WriteLine("Stopping recognizer ...");
    sre.RecognizeAsyncStop();
}
</pre>
Visual Basic
<pre class="brush: vb">
Using s As Stream = source.Start()

    ' 16 kHz, 16-bit, mono PCM: 32000 bytes per second, 2-byte block alignment
    sre.SetInputToAudioStream(s, New SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, Nothing))

    Console.WriteLine("Recognizing. Say: red, green or blue. Press ENTER to stop")

    sre.RecognizeAsync(RecognizeMode.Multiple)
    Console.ReadLine()
    Console.WriteLine("Stopping recognizer ...")
    sre.RecognizeAsyncStop()

End Using
</pre>
The event handlers specified earlier display information based on the result of the user's speech being recognized:

C#
<pre class="brush: csharp">
static void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
    Console.WriteLine("\nSpeech Rejected");
    if (e.Result != null)
        DumpRecordedAudio(e.Result.Audio);
}

static void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
    Console.Write("\rSpeech Hypothesized: \t{0}\tConf:\t{1}", e.Result.Text, e.Result.Confidence);
}

static void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Console.WriteLine("\nSpeech Recognized: \t{0}", e.Result.Text);
}

private static void DumpRecordedAudio(RecognizedAudio audio)
{
    if (audio == null)
        return;

    // Find the first unused file name: RetainedAudio_0.wav, RetainedAudio_1.wav, ...
    int fileId = 0;
    string filename;
    while (File.Exists((filename = "RetainedAudio_" + fileId + ".wav")))
        fileId++;

    Console.WriteLine("\nWriting file: {0}", filename);
    using (var file = new FileStream(filename, System.IO.FileMode.CreateNew))
        audio.WriteToWaveStream(file);
}
</pre>
Visual Basic
<pre class="brush: vb">
Private Shared Sub SreSpeechRecognitionRejected(ByVal sender As Object, ByVal e As SpeechRecognitionRejectedEventArgs)

    Console.WriteLine(vbLf & "Speech Rejected")
    If e.Result IsNot Nothing Then
        DumpRecordedAudio(e.Result.Audio)
    End If

End Sub

Private Shared Sub SreSpeechHypothesized(ByVal sender As Object, ByVal e As SpeechHypothesizedEventArgs)

    Console.Write(vbCr & "Speech Hypothesized: " & vbTab & "{0}" & vbTab & "Conf:" & vbTab & "{1}", e.Result.Text, e.Result.Confidence)

End Sub

Private Shared Sub SreSpeechRecognized(ByVal sender As Object, ByVal e As SpeechRecognizedEventArgs)

    Console.WriteLine(vbLf & "Speech Recognized: " & vbTab & "{0}", e.Result.Text)

End Sub

Private Shared Sub DumpRecordedAudio(ByVal audio As RecognizedAudio)
    If audio Is Nothing Then
        Return
    End If

    ' Find the first unused file name: RetainedAudio_0.wav, RetainedAudio_1.wav, ...
    Dim fileId As Integer = 0
    Dim filename As String
    filename = "RetainedAudio_" & fileId & ".wav"
    Do While File.Exists(filename)
        fileId += 1
        filename = "RetainedAudio_" & fileId & ".wav"
    Loop

    Console.WriteLine(vbLf & "Writing file: {0}", filename)
    Using file = New FileStream(filename, System.IO.FileMode.CreateNew)
        audio.WriteToWaveStream(file)
    End Using

End Sub
</pre>
When a word is rejected, the audio is written out to a WAV file so it can be listened to later.

<h1>Recap</h1>
We've created an application that can record audio for a variable amount of time with Kinect!
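The RecordAudio method in the Capturing Audio Data task calls WriteWavHeader, which this walkthrough takes from the built-in sample without listing it. For readers who want to see roughly what such a helper has to do, here is a minimal sketch. It is not the SDK sample's code. It assumes the format implied by the rest of the article (16 kHz, 16-bit, mono PCM, which matches the `2 * 16000` arithmetic and the SpeechAudioFormatInfo arguments), the class name WavHeaderSketch is made up for illustration, and the BinaryWriter overload with a leaveOpen argument needs .NET 4.5 or later.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper class; the name is an assumption, not part of the Kinect SDK.
static class WavHeaderSketch
{
    // Format assumed from the walkthrough: 16 kHz, 16-bit, mono PCM.
    const int SampleRate = 16000;
    const short BitsPerSample = 16;
    const short Channels = 1;

    // Writes a canonical 44-byte RIFF/WAVE header announcing dataLength bytes of PCM data.
    public static void WriteWavHeader(Stream stream, int dataLength)
    {
        short blockAlign = (short)(Channels * BitsPerSample / 8); // bytes per sample frame (2)
        int byteRate = SampleRate * blockAlign;                   // bytes per second (32000)

        // leaveOpen: true, so the caller can keep writing samples to the same stream.
        using (var writer = new BinaryWriter(stream, Encoding.ASCII, true))
        {
            writer.Write(Encoding.ASCII.GetBytes("RIFF"));
            writer.Write(36 + dataLength);   // size of everything after this field
            writer.Write(Encoding.ASCII.GetBytes("WAVE"));
            writer.Write(Encoding.ASCII.GetBytes("fmt "));
            writer.Write(16);                // size of the fmt chunk for plain PCM
            writer.Write((short)1);          // format tag 1 = PCM
            writer.Write(Channels);
            writer.Write(SampleRate);
            writer.Write(byteRate);
            writer.Write(blockAlign);
            writer.Write(BitsPerSample);
            writer.Write(Encoding.ASCII.GetBytes("data"));
            writer.Write(dataLength);        // number of sample bytes that follow
        }
    }
}
```

With this sketch, a 5-second recording would pass `5 * 2 * 16000 = 160000` as dataLength, and the header itself always occupies 44 bytes before the first sample. Because the length is written up front, the header only matches the file if the recording loop writes exactly that many bytes.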
