Group 079-06: Hands Free Computer Interface: Mechanical Parts, FAQ, and Voice Recognition

Week 3

We decided to order small Nikotab 0X15 tab electrodes (No. 0315, 21 x 34 mm) because there is little surface area to work with around the eyes. We also obtained a data acquisition box (the USB-6009), which will be programmed in MATLAB. We also answered questions posed to us by DJ about the physiology of EMG/EOG. These answers can be seen on our FAQ page.

We also worked on the word recognition part of our project. We decided that the user will start and stop the recording process by saying keywords. This eliminates the need to record continuously, which would use up memory. The voice recordings will then be processed and plotted. To decide what the speaker said, the program will compare the new plot with stored plots made from previously recorded words. If the new plot matches a stored plot, the program will conclude that the user said the phrase associated with that stored plot.
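The matching step described above can be sketched as a nearest-template search. This is only an illustration written in Python, not our MATLAB program: the feature vectors, the phrase labels other than "save document," the `recognize` function, and the `max_distance` threshold are all assumptions made up for the example.

```python
import math

def euclidean_distance(a, b):
    """Distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(features, templates, max_distance):
    """Return the label of the closest stored template,
    or None if no template is within max_distance."""
    best_label, best_dist = None, float("inf")
    for label, template in templates.items():
        dist = euclidean_distance(features, template)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else None

# Hypothetical stored templates (made-up numbers, for illustration only).
templates = {
    "save document": [0.9, 0.1, 0.4],
    "open file": [0.2, 0.8, 0.5],
}

# A new recording whose features are close to the "save document" template.
print(recognize([0.85, 0.15, 0.38], templates, max_distance=0.5))
```

Rejecting anything farther than `max_distance` keeps the program from forcing every sound onto the nearest stored phrase. A real threshold would have to be tuned on recorded data.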

From there, we delved into research on MATLAB code that would enable a word recognition system. Here is a link to our first finding: http://www.mathworks.com/tagteam/60673_91805v00_WordRecognition_final.pdf
This gave us a template for our workflow and for our MATLAB program. Our program has several stages. The first is training, in which we record the user saying a single command, such as "save document," several times. We will capture 10 seconds of this speech (80,000 samples) from the computer's built-in microphone at 8000 samples per second. The next stage, testing, will require retrieving the previously recorded speech samples while processing incoming speech. The Data Acquisition Toolbox will be used to perform this function. Graphs of the speech samples will be made. The article also describes Mel Frequency Cepstral Coefficients (MFCCs), which give a measure of the energy within overlapping frequency bins of a sound spectrum. MFCC vectors can be calculated from both the stored training speech and the incoming speech and then compared.
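To make the MFCC idea concrete, here is a rough sketch of the usual pipeline for one short frame: window, power spectrum, overlapping triangular mel filters, log, and a discrete cosine transform. It is written in Python for illustration and is not the code from the MathWorks article. The frame length (256 samples), the number of filters (12), the number of coefficients (8), and the synthetic sine tones standing in for recorded speech are all assumptions. Only the 8000 samples per second rate comes from our plan.

```python
import cmath
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (slow, but fine for one short frame)."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        out.append(abs(s) ** 2 / n)
    return out

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular, overlapping filters spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    filters = []
    for i in range(1, num_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):        # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        filters.append(row)
    return filters

def mfcc(frame, sample_rate, num_filters=12, num_coeffs=8):
    """MFCC vector for a single frame."""
    n = len(frame)
    # A Hamming window reduces spectral leakage at the frame edges.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = power_spectrum(windowed)
    log_energies = []
    for row in mel_filterbank(num_filters, n, sample_rate):
        energy = sum(w * p for w, p in zip(row, spectrum))
        log_energies.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    # DCT-II of the log filterbank energies gives the cepstral coefficients.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_energies))
            for c in range(num_coeffs)]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def tone(freq_hz, sample_rate=8000, length=256):
    """Synthetic stand-in for one frame of recorded speech."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(length)]

stored = mfcc(tone(500), 8000)     # plays the role of a training frame
same = mfcc(tone(500), 8000)       # incoming frame with the same content
other = mfcc(tone(1500), 8000)     # incoming frame with different content

print(distance(stored, same))      # identical input, so the distance is 0.0
print(distance(stored, other))     # different input, so the distance is larger
```

A full recognizer would split the 10-second recording into many such frames and compare sequences of MFCC vectors rather than a single pair.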
We also found another source: http://www.ece.iit.edu/~pfelber/speechrecognition/report.pdf. This article explains Linear Predictive Coding (LPC), which can extract and store information about the resonant peaks (formants) in the sound spectrum. We can use LPC to compare the formants of the stored speech and the incoming speech. The article also gives MATLAB source files for extracting, matching, recording, speech, training, and testing. We also found simple MATLAB code for recording and plotting audio samples, which can be seen in Figure 1. We can use this code, along with the code found in the previous sources, as a starting point for creating our own word recognition program.
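As a rough illustration of how LPC can expose a resonance, the sketch below estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion, then reads a single resonant frequency off a second-order predictor. It is written in Python for illustration and is not taken from the source files in the article. The function names, the synthetic test signal (a decaying 1000 Hz resonance), and the predictor order are assumptions. Real speech would need a higher order and several formants.

```python
import math

def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    n = len(signal)
    return [sum(signal[t] * signal[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc(signal, order):
    """Levinson-Durbin recursion.

    Returns (coeffs, error), where coeffs[k-1] is a_k in the predictor
    x[t] ~ a_1*x[t-1] + ... + a_p*x[t-p], and error is the remaining
    prediction error energy."""
    r = autocorrelation(signal, order)
    if r[0] == 0:
        raise ValueError("signal has zero energy")
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)
    return a[1:], error

def resonance_hz(a1, a2, sample_rate):
    """Frequency of the one resonance described by a 2nd-order predictor.
    Assumes the predictor has a complex-conjugate pole pair (a2 < 0)."""
    radius = math.sqrt(-a2)
    return math.acos(a1 / (2.0 * radius)) * sample_rate / (2.0 * math.pi)

# Synthetic test signal: impulse response of a resonator at 1000 Hz
# with pole radius 0.9, sampled at 8000 samples per second.
sample_rate = 8000
theta = 2.0 * math.pi * 1000.0 / sample_rate
true_a1, true_a2 = 2.0 * 0.9 * math.cos(theta), -(0.9 ** 2)
signal = [1.0, true_a1]
for t in range(2, 400):
    signal.append(true_a1 * signal[t - 1] + true_a2 * signal[t - 2])

coeffs, error = lpc(signal, order=2)
print(coeffs)                                          # close to [true_a1, true_a2]
print(resonance_hz(coeffs[0], coeffs[1], sample_rate)) # close to 1000 Hz
```

Comparing the resonant frequencies (or the coefficient vectors themselves) of stored and incoming speech is one way the LPC comparison could be set up.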

Figure 1: MATLAB code for recording and plotting audio samples.

Figure 2: Plot of the phrase "save document," made using the MATLAB code from Figure 1.

Resources:

[1] N.p. (n.d.). Developing an Isolated Word Recognition System in MATLAB. (N/A) [Online]. Available: http://www.mathworks.com/tagteam/60673_91805v00_WordRecognition_final.pdf

[2] N.p. (n.d.). Record and Play Audio. (N/A) [Online]. Available: http://www.mathworks.com/help/matlab/import_export/record-and-play-audio.html#bsdl2em

[3] P. Felber. (2001, April 25). Speech Recognition: Report of an Isolated Word Experiment. (N/A) [Online]. Available: http://www.ece.iit.edu/~pfelber/speechrecognition/report.pdf


Sunday, April 21, 2013
