
This helps remove some of the potential inaccuracy of our program, but how might we deal with inevitable background noise? We determined experimentally that the mean (absolute value of the) amplitude of any partition containing speech content is orders of magnitude higher than that of partitions containing only background noise. With that, we set two conditions that automatically exclude such noise from consideration. As we analyze the signal in progressive chunks, we first check whether the chunk has a maximum amplitude of at least 0.1 times the maximum amplitude of the whole signal. Then, we check that the mean magnitude of the chunk is greater than or equal to that of the whole signal. If either condition is not met, we immediately discard the current chunk and move on to the next one.
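As a rough illustration, this gate can be written in MATLAB along the following lines; the variable names (signal, chunk) and the 4000-sample chunk length come from the description above, but the loop structure is a sketch rather than our exact code.

    % Split the recording into non-overlapping 4000-sample chunks and keep
    % only the chunks that pass both noise-rejection conditions.
    chunk_len  = 4000;
    num_chunks = floor(length(signal) / chunk_len);
    for k = 1:num_chunks
        chunk = signal((k-1)*chunk_len + 1 : k*chunk_len);
        % Condition 1: peak of the chunk is at least 10% of the overall peak.
        % Condition 2: mean magnitude of the chunk is at least that of the signal.
        if max(abs(chunk)) < 0.1 * max(abs(signal)) || mean(abs(chunk)) < mean(abs(signal))
            continue;   % background noise: discard this chunk and move on
        end
        % ... formant analysis on the accepted chunk goes here ...
    end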

The application

We built the prototype by dividing our problem into five subproblems: initializing our data, reading an audio signal, determining the frequency response, cleaning up formant data and displaying information relevant to the vowels.

Initial data

In our function initialize_all_data, we store the theoretical vowel formant pairs for the words "see", "play", "hat", "palm", and "rug" in a matrix for later use. We also set up a convenient matrix called output_matrix, created so that we can avoid going through seven if-else statements when outputting a vowel.
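As a sketch, the stored data might look like the following in MATLAB. The (F1, F2) values shown are rough published averages for these vowels and the layout is illustrative; they stand in for the calibrated values the program actually uses.

    vowel_words = {'see', 'play', 'hat', 'palm', 'rug'};
    % Approximate average (F1, F2) pairs in Hz, one row per vowel.  These are
    % illustrative textbook averages, not the program's exact calibration values.
    vowel_formants = [ 270  2290;    % "see"
                       450  2250;    % "play"
                       660  1720;    % "hat"
                       730  1090;    % "palm"
                       640  1190 ];  % "rug"
    % A padded character matrix lets a vowel's row index pick out its label
    % directly, so printing a result is a single lookup instead of a chain of
    % if-else statements.
    output_matrix = char('see', 'play', 'hat', 'palm', 'rug');

With a layout like this, disp(output_matrix(idx, :)) prints the word for the vowel whose row index is idx.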

Audio input

After the user decides how long he or she wants to speak, we use a built-in MATLAB function to read in an audio signal of that length from the user. The sampling rate of 44.1 kHz is intentionally high so that we preserve as much information from the original signal as possible.
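A minimal sketch of this recording step using MATLAB's built-in audiorecorder; here n_seconds stands in for the duration chosen by the user.

    Fs = 44100;                       % 44.1 kHz sampling rate preserves detail
    rec = audiorecorder(Fs, 16, 1);   % 16-bit, single-channel recorder
    disp('Speak now...');
    recordblocking(rec, n_seconds);   % record for the requested number of seconds
    signal = getaudiodata(rec);       % column vector of audio samples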

Assumptions

1) The user will speak at a relatively constant amplitude. Note that this is a little easier said than done, because words like "see" are naturally much quieter than "hat."

2) The user will speak slowly and clearly so that we can look for consistency in determining the vowels said.

The algorithm

1) Establish a set of variables that store theoretical vowel formants.

2) Record an audio signal of n seconds from the user.

3) Split the data into non-overlapping chunks of 4000 samples.

4) Preprocess the data by detrending with a low-order polynomial and applying a low-order Butterworth lowpass filter (a MATLAB sketch of steps 4-6 appears after this list).

5) Determine the transfer function of the vocal tract associated with the current chunk using an AR (autoregressive) model.

6) Determine the peaks of the transfer function (the formants) and match them with the closest theoretical formant pair, filtering with a least mean-squared matched filter.

7) If the amplitude of the chunk exceeds 0.1 times the maximum amplitude of the overall signal, we assume it potentially contains a vowel and put its formants onto the "stack" recent_formant_pairs.

8) If the last four formant pairs have been consistent (the same value), we assume the vowel estimation has worked. Add the formant pair to the "stack" of vowels identified in track_times_and_formants.

9) We have now processed our guesses for which vowels were said when. Because there will be repeats, run through a for-loop to clean up the track_times_and_formants vector into a new, workable vector called track_begin_end_formants.

10) Output data and guesses.
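As a rough sketch of steps 4-6 for a single accepted chunk, the MATLAB below uses a polynomial detrend, a Butterworth lowpass filter, and an LPC (all-pole, autoregressive) fit; the filter order, cutoff, and model order here are illustrative choices, not the exact values used in the program.

    Fs = 44100;
    % Step 4: detrend and lowpass-filter the chunk.
    chunk = detrend(chunk, 'linear');          % remove a low-order trend
    [b, a] = butter(4, 4000 / (Fs/2));         % 4th-order lowpass, ~4 kHz cutoff (illustrative)
    chunk = filter(b, a, chunk);

    % Step 5: fit an all-pole (AR) model of the vocal tract.  lpc() returns the
    % denominator coefficients A of the estimated transfer function 1/A(z).
    p = 12;                                    % model order (illustrative)
    A = lpc(chunk, p);

    % Step 6: evaluate the frequency response and treat its peaks as formants.
    [H, f] = freqz(1, A, 1024, Fs);
    [~, locs] = findpeaks(abs(H));
    formants = f(locs);                        % candidate formant frequencies, in Hz

    % Match the first two formants (assuming at least two peaks were found) to
    % the closest stored (F1, F2) pair by least mean-squared error.
    [~, idx] = min(sum((vowel_formants - formants(1:2)').^2, 2));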

Limitations

1) Requires very clear enunciation. This is difficult to achieve because we naturally change the pitch of our words as we start and end them. Example: the end of the vowel in "strut" sounds remarkably similar to "hat" because of the way you shape your vocal tract as you close your mouth.

2) The user has to say each vowel for a moderate duration of time. Too long, and a wavering voice affects the success of the program; too short, and there is not enough data to assign formants.

3) This program is calibrated with formant values averaged across all ages and genders. If a male has an exceptionally deep voice, or a child or female has an exceptionally high voice, they may not get accurate vowel readings from the program. Accents may also affect the accuracy of results.
