
Inlab report

  1. Give your estimate of the pitch period for the voiced segment, and your prediction of the gender of the speaker.
  2. For each of the two vectors, VoicedSig and UnvoicedSig, list the average energy and number of zero-crossings. Which segment has a greater average energy? Which segment has a greater zero-crossing rate?
  3. Hand in your zero_cross function.
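
As a rough illustration of the two measurements requested above, the following Python sketch computes an average energy and a zero-crossing count. It is not the lab's reference solution: the definition of a zero-crossing used here (a strict sign change between consecutive samples) and the two synthetic test vectors are assumptions chosen for illustration, standing in for VoicedSig and UnvoicedSig.

```python
import math

def average_energy(x):
    """Mean of the squared samples of x."""
    return sum(v * v for v in x) / len(x)

def zero_cross(x):
    """Count strict sign changes between consecutive samples of x.

    Assumption: a sample that is exactly zero does not count as a crossing.
    """
    count = 0
    for prev, cur in zip(x, x[1:]):
        if prev * cur < 0:
            count += 1
    return count

# Synthetic stand-ins (hypothetical data, not the lab recordings):
# a strong, slowly varying sinusoid and a weak, rapidly alternating sequence.
voiced_like = [math.sin(2 * math.pi * (n + 0.5) / 50) for n in range(200)]
unvoiced_like = [0.1 * (-1) ** n for n in range(200)]

for name, sig in (("voiced-like", voiced_like), ("unvoiced-like", unvoiced_like)):
    print(name, average_energy(sig), zero_cross(sig))
```

In this toy case the slowly varying signal has the larger average energy and the rapidly alternating one has the larger zero-crossing count, which is the pattern the questions above ask you to check on real speech.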

Phonemes

Phonemes in American English. See [link] for more details.

American English can be described in terms of a set of about 42 distinct sounds called phonemes, illustrated in [link]. They can be classified in many ways according to their distinguishing properties. Vowels are formed by exciting a fixed vocal tract with quasi-periodic pulses of air. Fricatives are produced by forcing air through a constriction (usually towards the mouth end of the vocal tract), causing turbulent air flow. Fricatives may be voiced or unvoiced. Plosive sounds are created by making a complete closure, typically at the frontal vocal tract, building up pressure behind the closure and abruptly releasing it. A diphthong is a gliding monosyllabic sound that starts at or near the articulatory position for one vowel, and moves toward the position of another. It can be a very insightful exercise to recite the phonemes shown in [link], and make a note of the movements you are making to create them.

It is worth noting at this point that classifying speech sounds as voiced/unvoiced is not equivalent to the vowel/consonant distinction. Vowels and consonants are letters, whereas voiced and unvoiced refer to types of speech sounds. There are several consonants, /m/ and /n/ for example, which when spoken are actually voiced sounds.

Short-time frequency analysis

As we have seen from previous sections, the properties of speech signals are continuously changing, but may be considered to be stationary within an appropriate time frame. If analysis is performed on a “segment-by-segment” basis, useful information about the construction of an utterance may be obtained. The average energy and zero-crossing rate, as previously discussed, are examples of short-time feature extraction in the time-domain. In this section, we will learn how to obtain short-time frequency information from generally non-stationary signals.

stDTFT

Download the file go.au for the following section.

A useful tool for analyzing the spectral characteristics of a non-stationary signal is the short-time discrete-time Fourier Transform, or stDTFT, which we will define by the following:

$$X_m(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\, w(n-m)\, e^{-j\omega n}$$

Here, $x(n)$ is our speech signal, and $w(n)$ is a window of length $L$. Notice that if we fix $m$, the stDTFT is simply the DTFT of $x(n)$ multiplied by a shifted window. Therefore, $X_m(e^{j\omega})$ is a collection of DTFTs of windowed segments of $x(n)$.

As we examined in the Digital Filter Design lab, windowing in the time domain causes an undesirable ringing in the frequency domain. This effect can be reduced by using some form of a raised cosine for the window $w(n)$.
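
To make the effect of the window shape concrete, the sketch below compares the largest side lobe of a rectangular window with that of a Hamming window, one common raised-cosine window. The choice of the Hamming window, the length 64, and the helper names are assumptions made for illustration; the side lobes are searched from 6π/L upward so that the main lobe of both windows is excluded.

```python
import cmath
import math

def hamming(L):
    """Length-L Hamming window: 0.54 - 0.46*cos(2*pi*n/(L-1))."""
    if L == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]

def dtft_mag(w, omega):
    """Magnitude of the DTFT of the finite sequence w at frequency omega."""
    return abs(sum(wn * cmath.exp(-1j * omega * n) for n, wn in enumerate(w)))

def peak_sidelobe_db(w, start, points=1000):
    """Largest |W(omega)| on [start, pi], in dB relative to |W(0)|."""
    dc = dtft_mag(w, 0.0)
    step = (math.pi - start) / (points - 1)
    peak = max(dtft_mag(w, start + i * step) for i in range(points))
    return 20 * math.log10(peak / dc)

L = 64
start = 6 * math.pi / L  # beyond the main lobe of both windows
rect_db = peak_sidelobe_db([1.0] * L, start)
hamm_db = peak_sidelobe_db(hamming(L), start)
print("rectangular:", round(rect_db, 1), "dB")
print("Hamming:    ", round(hamm_db, 1), "dB")
```

The Hamming window's side lobes come out well below those of the rectangular window, at the cost of a wider main lobe.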

Write a function X = DFTwin(x,L,m,N) that will compute the DFT of a windowed length-L segment of the vector x.
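
The following Python sketch shows one way such a function might be organized; it is not the lab's reference solution. It assumes a Hamming window, a 0-based starting index m, zero-padding of the windowed segment to length N (with N at least L), and a direct O(N²) DFT in place of an FFT. The name dft_win and the test signal are hypothetical.

```python
import cmath
import math

def hamming(L):
    """Length-L Hamming window (assumed window choice)."""
    if L == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]

def dft_win(x, L, m, N):
    """N-point DFT of the Hamming-windowed segment x[m], ..., x[m+L-1].

    Assumptions: m is a 0-based index, and the segment is zero-padded to N.
    """
    if N < L:
        raise ValueError("N must be at least L")
    if m < 0 or m + L > len(x):
        raise ValueError("segment falls outside x")
    w = hamming(L)
    seg = [x[m + n] * w[n] for n in range(L)] + [0.0] * (N - L)
    return [
        sum(seg[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

# Hypothetical test signal: a cosine with 8 cycles per 64 samples.
x = [math.cos(2 * math.pi * 8 * n / 64) for n in range(128)]
X = dft_win(x, L=64, m=10, N=64)
peak_bin = max(range(32), key=lambda k: abs(X[k]))
print("peak bin:", peak_bin)
```

Because the segment is re-indexed to start at zero, the result differs from samples of the stDTFT defined above by a linear phase factor that depends on m; the magnitudes are the same.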

Source:  OpenStax, Purdue digital signal processing labs (ece 438). OpenStax CNX. Sep 14, 2009 Download for free at http://cnx.org/content/col10593/1.4