How speech can be modeled as a source signal passing through a filter.

The make-up of speech

The components of speech are the words and the voice.

Every phrase is a union of these two components - they are the foundations of spoken language. Neither means much without its counterpart. Words without voice lack intonation, so they have no meaning. Voice without words is devoid of structure and cannot transfer information. Only the fusion of the two can be called speech.

In biology, the components of speech are produced in different organs. To speak, air is first released over the vocal cords, which expand and contract to give the air column structure. This is the biological concept of words. The words are then passed through the vocal tract, where they are shaped, giving them intonation. This shaping of the words is the biological concept of voice. Such a biological process can be easily modeled.

So far, we have determined that speech is a collection of words shaped by voice. Here, we present a model of this. In this model, the words are called the source. Since the words are modified by voice, we say the source passes through a filter. This brings us to the source filter model of speech.

Source Filter Model
The source filter model is a model of speech in which the spoken word consists of a source component originating from the vocal cords, which is then shaped by a filter imitating the effect of the vocal tract.

This model has applications in many different fields. Here we focus on signal processing.

Signal processing considerations

The source filter model can easily be extended to signal processing. The source is simply a signal x(t). This signal is the input to the filter and is called the excitation signal, since it excites the vocal tract. The vocal tract is a filter similar to all filters we have studied so far: it is a linear time-invariant system with impulse response h(t). This is sometimes loosely called the transfer function of speech, since it is what transfers the excitation signal to speech - it adds voice to words. (Strictly speaking, the transfer function is the transform of h(t).) Speech is the output y(t) of the source signal x(t) passed through the filter with impulse response h(t). Thus, the output is given by the convolution y(t) = x(t) * h(t). This is depicted below:

Signal processing representation of the source filter model

An input x(t) to a filter with impulse response h(t) yields the convolution of the two.
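The relation y(t) = x(t) * h(t) can be tried out numerically in discrete time. The following is a minimal sketch, not a realistic speech synthesizer: the sampling rate, the 100 Hz impulse-train excitation, the decaying 700 Hz resonance used as a stand-in for the vocal tract, and the helper name `synthesize` are all assumptions chosen for illustration.

```python
import numpy as np

def synthesize(excitation, impulse_response):
    """Discrete-time source filter model: y[n] = (x * h)[n]."""
    return np.convolve(excitation, impulse_response)

fs = 8000                  # sampling rate in Hz (assumed)
pitch_hz = 100             # pitch of the toy excitation (assumed)
period = fs // pitch_hz    # samples between glottal pulses

# Source x: an impulse train standing in for the vocal-cord excitation.
x = np.zeros(400)
x[::period] = 1.0

# Filter h: one decaying resonance standing in for the vocal tract.
m = np.arange(64)
h = np.exp(-m / 10.0) * np.cos(2 * np.pi * 700 * m / fs)

# Speech y: the source passed through the filter.
y = synthesize(x, h)
```

Because the impulses are spaced farther apart than the length of `h`, each pulse in `x` simply places one copy of `h` into `y`. The source sets when the copies occur (the pitch), and the filter sets their shape.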

Since speech is simply the convolution of a source signal x(t) with a filter's impulse response h(t), we can analyze these signals to determine the characteristics of a speech signal y(t). However, we must first deconvolve y(t) to separate them so that they can be processed individually. This topic is explored in the next section, which covers deconvolution.
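One reason deconvolution is tractable is that convolution in time corresponds to multiplication in frequency. The sketch below checks this numerically and then undoes the convolution by spectral division. It assumes that h is known and that its spectrum has no zeros, which is not the situation with real speech, where only y(t) is observed. It illustrates the idea only and is not necessarily the method of the next section; the signals are arbitrary test data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(32)      # arbitrary test "source"
h = 0.5 ** np.arange(8)          # arbitrary test "filter"; its spectrum is never zero
y = np.convolve(x, h)            # observed output

# Zero-pad all transforms to the full output length so that the
# product of spectra matches the linear (not circular) convolution.
N = len(y)
X = np.fft.fft(x, N)
H = np.fft.fft(h, N)
Y = np.fft.fft(y, N)

# Convolution in time equals multiplication in frequency.
assert np.allclose(Y, X * H)

# With h known, dividing the spectra recovers the source.
x_rec = np.fft.ifft(Y / H).real[:len(x)]
```

Here `x_rec` matches `x` up to rounding error. In practice neither component is known in advance, which is why a dedicated deconvolution technique is needed.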

References

Huckvale, Mark. "Lecture 8: Source-Filter Model of Speech Production." B214: Phonetic Science: Acoustics of Speech and Hearing. University College London.

Johnson, Don. Connexions module m0049: Modeling the Speech Signal.

Source:  OpenStax, Methods for voice conversion. OpenStax CNX. Dec 21, 2004 Download for free at http://cnx.org/content/col10252/1.2