An explanation of the basic problems in analyzing speech patterns.

The questions

The issues with speech recognition in general are complex and wide-ranging. One of the main problems lies in the complexity of the speech signal itself. A signal such as signal 1 below presents a system with a large amount of information that is very difficult to interpret.

The word diablo, with DC offset removed.

One of the more evident problems is the jaggedness of the signal. A natural speech signal is not smooth; instead, it fluctuates almost nonstop throughout its duration. Another naturally occurring property of speech patterns is the fluctuation in the volume, or amplitude, of the signal. Different people emphasize different syllables, letters, or words in different ways. If two signals have different volume levels, they are very difficult to compare. Speech signals also have a very large number of peaks in a short period of time. These peaks correspond to the syllables in the words being spoken. Comparing two signals becomes much more difficult as the number of peaks increases, because it is easy for results to be skewed by a higher peak and, consequently, to be interpreted incorrectly. The speed at which the input signal is given is also an important issue. A user saying their name at a speed different from the speed at which they normally speak can change the results: two versions of the same pattern are being compared, but the time over which they are spoken is different, and this difference must be accounted for. Finally, when examining the signal in terms of speaker verification, one individual may attempt to mimic the speech of another person. If the imitation is good enough, the impostor could be accepted by the system.

The answers

How do you deal with the jaggedness of the signal and the noise introduced to the signal through the environment?

  • To account for this, you have to pass all the signals through a smoothing filter. The filter accomplishes two tasks: first, it gets rid of excess noise. Second, it gets rid of the high-frequency jaggedness in the signal and leaves behind only the magnitude of the signal. As a result, you get a clean signal that is fairly easy to process.
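The smoothing step can be sketched as follows. This is a minimal illustration, not the project's actual filter: it assumes a simple moving average applied to the rectified signal, and the function name `smooth_magnitude`, the window length, and the synthetic test signal are all hypothetical choices made for this example.

```python
import numpy as np

def smooth_magnitude(signal, window=101):
    """Remove the DC offset, rectify, and smooth with a moving average.

    The moving average is an assumed stand-in for the smoothing filter;
    the window length (in samples) is an illustrative choice.
    """
    signal = np.asarray(signal, dtype=float)
    magnitude = np.abs(signal - signal.mean())   # DC removal, then rectification
    kernel = np.ones(window) / window            # moving-average (low-pass) kernel
    return np.convolve(magnitude, kernel, mode="same")

# Synthetic stand-in for a spoken syllable: a 200 Hz tone inside a
# Gaussian-shaped burst, plus a little background noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
burst = np.exp(-((t - 0.5) ** 2) / 0.005)
noisy = burst * np.sin(2 * np.pi * 200 * t) + 0.05 * rng.standard_normal(t.size)

envelope = smooth_magnitude(noisy)
```

The output has the same length as the input, is never negative, and follows the overall loudness of the burst rather than the individual oscillations. A longer window gives a smoother result at the cost of blurring closely spaced syllables.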

How do you account for the different volumes of speakers?

  • The signals must all be normalized to the same volume before they are examined. Each signal is normalized about zero so that all of the signals have the same relative maximum and minimum values. Comparing two signals recorded at different volumes then gives the same result as comparing them at the same volume.
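One way to picture the normalization is the sketch below. It assumes peak normalization (subtract the mean, then divide by the largest absolute value); the source does not say which normalization the project used, and the name `normalize` is hypothetical.

```python
import numpy as np

def normalize(signal):
    """Center a signal about zero and scale its peak magnitude to 1.

    Peak normalization is an assumption made for this sketch.
    """
    signal = np.asarray(signal, dtype=float)
    centered = signal - signal.mean()        # normalize about zero
    peak = np.max(np.abs(centered))
    if peak == 0:                            # silent input: nothing to scale
        return centered
    return centered / peak

# The same utterance recorded quietly and loudly (synthetic stand-in).
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
utterance = np.sin(2 * np.pi * 5 * t) * np.exp(-3 * t)
quiet = 0.1 * utterance
loud = 5.0 * utterance

quiet_n = normalize(quiet)
loud_n = normalize(loud)
```

After normalization the quiet and loud versions are numerically identical, so any later comparison depends only on the shape of the signal and not on how loudly it was spoken.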

How do you examine each of the individual peaks?

  • Just after the signal is smoothed by the filter, we use an envelope function to detect all of the peaks of the signal. By doing this, we can be sure that any part of the signal that exceeds a certain threshold will be examined and compared with the corresponding signal in the database. The analysis is not an analysis of the entire signal, but rather a formant analysis. The individual formants, or vowel sounds, in the signal are examined and used to verify the speaker.
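The threshold test on the envelope might look like the sketch below. It assumes a peak is a local maximum of the smoothed envelope that exceeds a fixed threshold; the function name `find_peaks_above`, the threshold value, and the sample envelope are hypothetical and chosen only for illustration.

```python
import numpy as np

def find_peaks_above(envelope, threshold):
    """Return indices of local maxima of the envelope that exceed threshold."""
    env = np.asarray(envelope, dtype=float)
    middle = env[1:-1]
    is_peak = (
        (middle > env[:-2])      # higher than the left neighbor
        & (middle >= env[2:])    # at least as high as the right neighbor
        & (middle > threshold)   # loud enough to be worth examining
    )
    return np.flatnonzero(is_peak) + 1   # +1 undoes the slice offset

# A toy envelope with one small bump and two large ones.
toy_envelope = np.array([0.0, 0.2, 0.1, 0.6, 0.9, 0.4, 0.1, 0.7, 0.3])
peaks = find_peaks_above(toy_envelope, threshold=0.5)
# peaks -> array([4, 7])
```

The small bump at index 1 is a local maximum but falls below the threshold, so it is ignored. Only the segments around the returned indices would go on to the formant analysis.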

How does the system handle varying speeds of inputs?

  • Both the formant analysis and the envelope function are used to help with varying input speeds. The envelope of each peak determines which vowels are available, and the formants themselves are relatively unchanged by speaking speed. It is difficult to handle very fast speech, but most other voices can be handled effectively.

How can you account for imitating speech patterns?

  • Once again, the formants of the individual signals are analyzed to determine whether a speaker is who they claim to be. In most cases, the imitator's formants do not match up closely with those stored in the database, and the imitator will be denied by the system.
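The final accept-or-reject decision can be illustrated with the sketch below. It assumes the formant frequencies have already been estimated and simply checks whether each one lies within a relative tolerance of the enrolled value. The function name `formants_match`, the 10% tolerance, and the frequency values are illustrative assumptions, not figures from the project.

```python
import numpy as np

def formants_match(candidate_hz, reference_hz, tolerance=0.10):
    """Accept only if every formant is within `tolerance` (relative)
    of the enrolled reference. The tolerance is an assumed value."""
    candidate = np.asarray(candidate_hz, dtype=float)
    reference = np.asarray(reference_hz, dtype=float)
    if candidate.shape != reference.shape:
        raise ValueError("formant lists must have the same length")
    relative_error = np.abs(candidate - reference) / reference
    return bool(np.all(relative_error <= tolerance))

# Illustrative first three formants (Hz) for one vowel.
enrolled = [730.0, 1090.0, 2440.0]   # stored in the database
genuine = [745.0, 1060.0, 2400.0]    # same speaker, slight variation
impostor = [570.0, 840.0, 2410.0]    # imitation with different resonances

genuine_ok = formants_match(genuine, enrolled)
impostor_ok = formants_match(impostor, enrolled)
```

Here the genuine attempt deviates by only a few percent and is accepted, while the imitation misses the first two formants by more than 20% and is rejected. A tighter tolerance rejects more impostors but also more genuine attempts.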





Source:  OpenStax, Elec 301 projects fall 2006. OpenStax CNX. Sep 27, 2007 Download for free at http://cnx.org/content/col10462/1.2
