
LPC implementation

Linear Predictive Coding (LPC) is usually applied to speech compression, but we use it to analyze our signal, exploiting the fact that the model depends chiefly on the output signal. The fundamental principle behind LPC is that one can predict, or approximate, future samples of a signal from a linear combination of its previous samples plus an input excitation signal.

All equations are from: http://cs.haifa.ac.il/~nimrod/Compression/Speech/S4LinearPredictionCoding2009.pdf .

The effectiveness of this model stems from the fact that the processes that generate speech in the human body are relatively slow and that the human voice is quite limited in frequency range. The physical model of the vocal tract in LPC is a buzzer, corresponding to the glottis, which produces the baseline buzz of the human voice, placed at the end of a tube with linear characteristics.

Courtesy of https://engineering.purdue.edu/CFDLAB/projects/voice.html

In “systems” speak, we clearly see the form of a feedback filter emerging, so to analyze the system further we take its Z-transform and find the transfer function from U(z) to S(z).
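For reference, the standard form of this model can be written as below. The symbols are assumptions here (predictor order p, coefficients a_k, gain G, excitation u(n)), and the sign convention may differ from the one used in the cited slides.

```latex
% Difference equation: each sample is a weighted sum of the previous p samples
% plus a scaled excitation.
s(n) = \sum_{k=1}^{p} a_k \, s(n-k) + G \, u(n)

% Taking the Z-transform and solving for S(z)/U(z):
H(z) = \frac{S(z)}{U(z)} = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}
```

The denominator contains the feedback terms, and the numerator is a constant.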

The result is clearly an all-pole filter; in the standard application, one would feed in the generating signal and get out a compressed version of the output.

The key barrier to implementing this filter is, of course, determining the “a” values, the coefficients of our linear superposition approximation of the output signal. When we form the linear superposition, we want to choose the coefficients that yield a compressed signal with the minimum deviation from the original signal; equivalently, we want to minimize the difference (error) between the two signals.

From the form of s(n) we can derive an equivalent condition on the auto-correlation R(j).

Where:

Thus we have p such equations, one for each R(j), so we can describe our conditions more compactly as a matrix equation.
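As a sketch of what these conditions look like in the usual autocorrelation formulation (the symbols e(n) and E and the summation limits are assumptions, not taken from the cited slides): minimizing the total squared error with respect to each coefficient gives one equation per lag j, and stacking the p equations gives the matrix form.

```latex
% Prediction error and the quantity being minimized
e(n) = s(n) - \sum_{k=1}^{p} a_k \, s(n-k), \qquad E = \sum_{n} e(n)^2

% Setting \partial E / \partial a_j = 0 for j = 1, \dots, p
\sum_{k=1}^{p} a_k \, R(|j-k|) = R(j), \qquad R(j) = \sum_{n} s(n)\, s(n-j)

% The same p equations in matrix form
\begin{bmatrix}
R(0)   & R(1)   & \cdots & R(p-1) \\
R(1)   & R(0)   & \cdots & R(p-2) \\
\vdots & \vdots & \ddots & \vdots \\
R(p-1) & R(p-2) & \cdots & R(0)
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix}
=
\begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(p) \end{bmatrix}
```

Every diagonal of the left-hand matrix holds a single value, R(0) on the main diagonal, R(1) on the two next to it, and so on.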

The matrix we now need to invert and multiply has a constant-diagonal structure, which classifies it as a Toeplitz matrix. Multiple methods have been developed for solving equations with Toeplitz matrices; one of the most efficient, and the one we used, is the Levinson-Durbin algorithm. This method is a bit involved, but fundamentally it solves the system of equations by solving smaller sub-matrix equations and iterating upward to obtain a recursive solution for all the coefficients.
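The recursion can be sketched in a few lines of plain Python. This is a minimal illustration under the sign convention s(n) ≈ Σ a_k s(n-k), not the code used in the project; the function names `autocorrelation` and `levinson_durbin` are hypothetical.

```python
def autocorrelation(signal, max_lag):
    """Return [R(0), ..., R(max_lag)] with R(j) = sum_n s(n) * s(n - j)."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i - lag] for i in range(lag, n))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve sum_k a_k R(|j - k|) = R(j) for j = 1..order.

    r is the list [R(0), ..., R(order)]. Returns (a, error), where
    a[k - 1] holds a_k and error is the final prediction-error energy.
    """
    a = [0.0] * order
    error = r[0]
    for i in range(order):  # i + 1 is the model order being solved
        if error <= 0.0:
            raise ValueError("autocorrelation sequence is not positive definite")
        # Reflection coefficient: the part of R(i + 1) that the
        # lower-order predictor fails to explain, normalized by the error.
        acc = r[i + 1] - sum(a[k] * r[i - k] for k in range(i))
        reflection = acc / error
        # Update the lower-order coefficients, then append the new one.
        updated = a[:]
        for k in range(i):
            updated[k] = a[k] - reflection * a[i - 1 - k]
        updated[i] = reflection
        a = updated
        error *= 1.0 - reflection * reflection
    return a, error


# Usage: fit a 3rd-order predictor to a short made-up sequence.
samples = [0.0, 1.0, 0.8, 0.3, -0.2, -0.5, -0.4, -0.1, 0.2, 0.3]
coefficients, residual_energy = levinson_durbin(autocorrelation(samples, 3), 3)
```

Each pass of the loop reuses the solution of the next-smaller system, so the total cost grows with the square of the order rather than the cube required by a general matrix inversion.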

Application

To apply this filter to our goal of speech analysis, we first note that the form of the filter depends primarily on the output rather than the input. The coefficients that we derived using the Levinson-Durbin algorithm use only properties (the auto-correlation) of the output signal, not of the input signal. This means that the filter is, in a way, more natural as a method for going from output to input than the reverse. All we need to do is take the reciprocal of the transfer function.
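A small round trip illustrates the idea. The sketch below assumes unit gain, zero initial conditions, and the sign convention s(n) = Σ a_k s(n-k) + u(n); the function names `synthesize` and `inverse_filter` are hypothetical.

```python
def synthesize(excitation, a):
    """All-pole (feedback) model: s(n) = sum_k a_k * s(n - k) + u(n)."""
    speech = []
    for n, u in enumerate(excitation):
        feedback = sum(
            a[k] * speech[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0
        )
        speech.append(feedback + u)
    return speech


def inverse_filter(speech, a):
    """Reciprocal (feed-forward) filter: e(n) = s(n) - sum_k a_k * s(n - k)."""
    residual = []
    for n in range(len(speech)):
        prediction = sum(
            a[k] * speech[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0
        )
        residual.append(speech[n] - prediction)
    return residual


# Usage: push a made-up excitation through the model, then undo it.
a = [0.9, -0.2]
excitation = [1.0, 0.0, 0.0, 0.0, 0.5, 0.0]
speech = synthesize(excitation, a)
recovered = inverse_filter(speech, a)  # matches excitation up to rounding
```

With real speech the coefficients are only estimates, so the recovered signal is an approximation of the excitation rather than an exact copy.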

We go from an all-pole filter to an all-zero filter, which now takes in a speech signal and returns the generating signal. This transfer function is actually more useful for our purposes because of our method of analyzing speech signals. We are primarily looking to identify the formants in the speech signal, the fundamental components of the phonemes that make up human speech. These formants correspond directly to the resonant modes of the vocal tract, so we are effectively trying to achieve a natural mode decomposition of a complex resonant cavity.

Courtesy of http://hyperphysics.phy-astr.gsu.edu/hbase/music/vocres.html

Therefore, these formants are more easily identifiable in the generating signal (since they are inherently a property of the generating cavity). With the filter generated by LPC, we can now reconstruct a linear approximation of the generating signal using the speech signals from our soundbank. Our full signal can, of course, be represented by a spectrogram, and the formants correspond to the local maxima of each time slice of the spectrogram.

Spectrogram (left); One slice of the spectrogram, with peaks and troughs highlighted (right)

What we chose to extract from these spectrograms were the amplitude and frequency data of the first 4 formants present in the signal, as these are usually the most dominant, as well as the same information about the minima in between the peaks. This is the information we will need to feed into our classifier for emotion classification.
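The extraction step for a single time slice might look like the sketch below. It is only an illustration: it assumes that "first" means lowest in frequency, that the slice is already smooth enough for a simple neighbor comparison to find peaks, and the function name `peaks_and_troughs` is hypothetical.

```python
def peaks_and_troughs(freqs, magnitudes, max_peaks=4):
    """Return the first max_peaks local maxima of one spectrogram slice and
    the minimum between each neighboring pair, as (frequency, amplitude)."""
    # A peak is a bin that is higher than its left neighbor and at least as
    # high as its right neighbor (assumption: no smoothing or thresholding).
    peak_bins = [
        i
        for i in range(1, len(magnitudes) - 1)
        if magnitudes[i - 1] < magnitudes[i] >= magnitudes[i + 1]
    ][:max_peaks]

    # Between two consecutive peaks, take the lowest bin as the trough.
    trough_bins = [
        min(range(left, right + 1), key=lambda i: magnitudes[i])
        for left, right in zip(peak_bins, peak_bins[1:])
    ]

    peaks = [(freqs[i], magnitudes[i]) for i in peak_bins]
    troughs = [(freqs[i], magnitudes[i]) for i in trough_bins]
    return peaks, troughs


# Usage on a made-up slice with 100 Hz bin spacing.
slice_magnitudes = [0, 1, 3, 1, 0.5, 2, 5, 2, 1, 4, 1, 0, 2, 0, 3, 0]
slice_freqs = [100 * i for i in range(len(slice_magnitudes))]
peaks, troughs = peaks_and_troughs(slice_freqs, slice_magnitudes)
```

Running this over every time slice yields, per slice, up to four peak pairs and three trough pairs that can be concatenated into a feature vector.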

Source:  OpenStax, Robust classification of highly-specific emotion in human speech. OpenStax CNX. Dec 14, 2012 Download for free at http://cnx.org/content/col11465/1.1