# 0.4 Linear predictive coding in voice conversion

Using linear predictive coding to change the voice quality of a source speaker to a target.

## Background on linear predictive coding

Linear predictive coding (LPC) is a method of predicting a sample of a speech signal from several previous samples. As with the cepstrum, we can use the LPC coefficients to separate a speech signal into two parts: the transfer function (which carries the vocal quality) and the excitation (which carries the pitch and the sound). Treating speech as two separable parts in this way is known as the source-filter model of speech.

We can predict the nth sample in a sequence of speech samples as a weighted sum of the p previous samples:

$\hat{s}\left[n\right]=\sum _{k=1}^{p}{a}_{k}s\left[n-k\right]$

The number of samples (p) is referred to as the “order” of the LPC. As p approaches infinity, we should be able to predict the nth sample exactly. In practice, however, p is usually on the order of ten to twenty, which gives an accurate enough representation at a limited computational cost. The weights on the previous samples ($a_k$) are chosen to minimize the squared error between the real sample and its predicted value. Thus, we want the error signal e[n], which is sometimes referred to as the LPC residual, to be as small as possible:

$e\left[n\right]=s\left[n\right]-\hat{s}\left[n\right]=s\left[n\right]-\sum _{k=1}^{p}{a}_{k}s\left[n-k\right]$
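One common way to find the weights that minimize this squared error is the autocorrelation method solved with the Levinson-Durbin recursion. The text above does not say which solver is used, so treat the following as a minimal sketch under that assumption. The function names and the test signal are hypothetical, chosen only for illustration.

```python
def autocorrelation(s, p):
    """Return r[0..p], where r[k] = sum over n of s[n] * s[n + k]."""
    n = len(s)
    return [sum(s[i] * s[i + k] for i in range(n - k)) for k in range(p + 1)]


def lpc(s, p):
    """Estimate a_1..a_p for s_hat[n] = sum_k a_k * s[n - k].

    Autocorrelation method with the Levinson-Durbin recursion.
    Returns (a, err): a[k - 1] holds a_k, err is the residual energy.
    """
    r = autocorrelation(s, p)
    a = [0.0] * (p + 1)   # a[0] is unused padding so that a[k] matches a_k
    err = r[0]            # prediction error energy at order 0
    for i in range(1, p + 1):
        # Reflection coefficient for stepping from order i - 1 to order i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


# Hypothetical test signal: impulse response of 1 / (1 - 1.3 z^-1 + 0.4 z^-2),
# so an order-2 analysis should recover weights close to 1.3 and -0.4.
signal = [1.0, 1.3]
for n in range(2, 200):
    signal.append(1.3 * signal[n - 1] - 0.4 * signal[n - 2])

coeffs, residual_energy = lpc(signal, 2)
```

A silent (all-zero) frame would make `r[0]` zero and the division fail, so a practical version would need to guard against that case.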

We can take the z-transform of the above equation:

$E\left(z\right)=S\left(z\right)-\sum _{k=1}^{p}{a}_{k}S\left(z\right){z}^{-k}=S\left(z\right)\left[1-\sum _{k=1}^{p}{a}_{k}{z}^{-k}\right]=S\left(z\right)A\left(z\right)$

Thus, we can represent the error signal E(z) as the product of our original speech signal S(z) and the transfer function A(z). A(z) represents an all-zero digital filter: the $a_k$ coefficients determine the polynomial A(z), and its roots are the zeros in the filter’s z-plane. Similarly, we can represent our original speech signal S(z) as the product of the error signal E(z) and the transfer function 1 / A(z):

$S\left(z\right)=\frac{E\left(z\right)}{A\left(z\right)}$

The transfer function 1 / A(z) represents an all-pole digital filter: the roots of the polynomial A(z), which the $a_k$ coefficients determine, are the poles in the filter’s z-plane. Note that the roots of A(z) must all lie strictly inside the unit circle to ensure stability of this filter.
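The stability condition can be checked without factoring A(z). The sketch below uses the step-down recursion, a different technique from explicit root finding: it recovers the reflection coefficients from the $a_k$ values, and all roots of A(z) lie inside the unit circle exactly when every reflection coefficient has magnitude below one. The function name `is_stable` is hypothetical.

```python
def is_stable(a):
    """Return True if 1/A(z), with A(z) = 1 - sum_k a_k z^-k, is stable.

    a[k - 1] holds a_k. The step-down recursion peels off one
    reflection coefficient per order; any magnitude >= 1 means that
    some root of A(z) lies on or outside the unit circle.
    """
    cur = list(a)
    while cur:
        k = cur[-1]          # reflection coefficient at the current order
        if abs(k) >= 1.0:
            return False
        i = len(cur)
        # Step down from order i to order i - 1.
        cur = [(cur[j] + k * cur[i - 2 - j]) / (1.0 - k * k)
               for j in range(i - 1)]
    return True


# A(z) = 1 - 1.3 z^-1 + 0.4 z^-2 has roots 0.8 and 0.5 (inside the circle).
stable_example = is_stable([1.3, -0.4])
# A(z) = 1 - 2.5 z^-1 + 1.0 z^-2 has roots 2.0 and 0.5 (one outside).
unstable_example = is_stable([2.5, -1.0])
```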

The spectrum of the error signal E(z) has a different structure depending on whether the sound it comes from is voiced or unvoiced. Voiced sounds are produced by vibrations of the vocal cords. They are approximately periodic, so their spectrum shows harmonics at multiples of a fundamental frequency (which corresponds to the pitch). Examples of voiced sounds include all of the vowels. Unvoiced sounds, however, have neither a fundamental frequency nor a harmonic structure. Instead, their excitation resembles white noise.
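As an idealized illustration of the two excitation types, a voiced excitation can be modelled as an impulse train and an unvoiced one as white noise. Real residuals are less tidy than this. The sample rate, pitch, and function names below are assumptions made for the example.

```python
import random


def voiced_excitation(num_samples, pitch_period):
    """Idealized voiced excitation: a unit pulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]


def unvoiced_excitation(num_samples, seed=0):
    """Idealized unvoiced excitation: zero-mean Gaussian white noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]


# Assumed values: 8000 Hz sample rate and a 100 Hz pitch,
# which gives one pulse every 8000 / 100 = 80 samples.
voiced = voiced_excitation(400, 80)
unvoiced = unvoiced_excitation(400)
```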

## LPC in voice conversion

In speech processing, computing the LPC coefficients of a signal gives us its $a_k$ values. From here, we can get the filter A(z) as described above. A(z) is the transfer function from the original signal s[n] to the excitation component e[n]. The transfer function of a speech signal is the part dealing with the voice quality: what distinguishes one person’s voice from another. The excitation component of a speech signal is the part dealing with the particular sounds and words that are produced. In the time domain, the excitation is convolved with the impulse response of the filter 1 / A(z) to create the output voice signal. As shown in the figure below, we can put the original signal through the filter A(z) to get the excitation component. Putting the excitation component through the inverse filter (1 / A(z)) gives us the original signal back.
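The round trip described here can be sketched directly from the two difference equations: A(z) subtracts the prediction from each sample, and 1 / A(z) adds it back. The function names and the sample values are hypothetical, and the coefficients are picked by hand rather than estimated.

```python
def analysis_filter(s, a):
    """Apply A(z): e[n] = s[n] - sum_k a_k * s[n - k]."""
    e = []
    for n in range(len(s)):
        pred = sum(a[k - 1] * s[n - k]
                   for k in range(1, len(a) + 1) if n - k >= 0)
        e.append(s[n] - pred)
    return e


def synthesis_filter(e, a):
    """Apply 1/A(z): s[n] = e[n] + sum_k a_k * s[n - k]."""
    s = []
    for n in range(len(e)):
        # The prediction uses output samples that were already rebuilt.
        pred = sum(a[k - 1] * s[n - k]
                   for k in range(1, len(a) + 1) if n - k >= 0)
        s.append(e[n] + pred)
    return s


# Hand-picked coefficients and samples, for illustration only.
a = [1.3, -0.4]
speech = [0.0, 0.5, 1.0, 0.3, -0.7, -0.2, 0.4]
excitation = analysis_filter(speech, a)
restored = synthesis_filter(excitation, a)
```

Both filters assume that samples before the start of the signal are zero, which is why the reconstruction matches the input up to rounding error.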

We can perform voice conversion by replacing the excitation component from the given speaker with a new one. Since we are still using the same transfer function A(z), the resulting speech sample will have the same voice quality as the original. However, since we are using a different excitation component, the resulting speech sample will have the same sounds as the new speaker.
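A minimal sketch of this swap on a single frame follows. It assumes the autocorrelation method for the LPC analysis, which the text does not specify, and all function and parameter names are hypothetical. A real system would also split the speech into frames, window them, and align the two speakers in time.

```python
def lpc(s, p):
    """Autocorrelation-method LPC (Levinson-Durbin); returns [a_1..a_p]."""
    r = [sum(s[i] * s[i + k] for i in range(len(s) - k)) for k in range(p + 1)]
    a, err = [0.0] * (p + 1), r[0]
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def predict(x, a, n):
    """Weighted sum of the samples of x that precede index n."""
    return sum(a[k - 1] * x[n - k] for k in range(1, len(a) + 1) if n - k >= 0)


def convert(voice_frame, content_frame, order):
    """Keep the filter of voice_frame, take the excitation of content_frame."""
    a_voice = lpc(voice_frame, order)
    a_content = lpc(content_frame, order)
    # Excitation of the new speaker: content frame through its own A(z).
    e = [content_frame[n] - predict(content_frame, a_content, n)
         for n in range(len(content_frame))]
    # Resynthesize through the original speaker's 1/A(z).
    out = []
    for n in range(len(e)):
        out.append(e[n] + predict(out, a_voice, n))
    return out


# Hypothetical frames: impulse responses of two different all-pole filters.
frame_x = [1.0, 1.3]
for n in range(2, 200):
    frame_x.append(1.3 * frame_x[n - 1] - 0.4 * frame_x[n - 2])
frame_y = [0.6 ** n for n in range(200)]

converted = convert(frame_x, frame_y, 2)
```

In this toy case both frames are driven by a single impulse, so the converted frame comes out close to `frame_x`: it has the excitation of `frame_y` shaped by the filter of `frame_x`.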

## Pre-emphasis

In speech processing, a process called pre-emphasis is applied to the input signal before the LPC analysis. During the reconstruction following the LPC analysis, a de-emphasis process is applied to the signal to reverse the effects of pre-emphasis.

Pre- and de-emphasis are necessary because, in the spectrum of a human speech signal, the energy decreases as the frequency increases. Pre-emphasis boosts the signal by an amount that grows with frequency: the higher the frequency, the more the energy of the speech signal is raised. This process therefore serves to flatten the spectrum so that the resulting formants have similar heights. (Formants are the highly visible resonances or peaks in the spectrum of the speech signal, where most of the energy is concentrated.) The flatter spectrum allows the LPC analysis to model the speech segment more accurately. Without pre-emphasis, the linear prediction would concentrate on the lower-frequency components of speech, losing important information about certain sounds.
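The text does not give the pre-emphasis filter itself. A common choice, assumed here, is the first-order filter y[n] = x[n] - αx[n-1] with α a little below one, and de-emphasis is its exact inverse. The value 0.95 and the function names are assumptions made for this sketch.

```python
def pre_emphasis(x, alpha=0.95):
    """First-order high-boost filter: y[n] = x[n] - alpha * x[n - 1]."""
    return [x[n] - alpha * (x[n - 1] if n > 0 else 0.0)
            for n in range(len(x))]


def de_emphasis(y, alpha=0.95):
    """Inverse of pre_emphasis: x[n] = y[n] + alpha * x[n - 1]."""
    x = []
    for n in range(len(y)):
        x.append(y[n] + alpha * (x[n - 1] if n > 0 else 0.0))
    return x


# A constant (lowest-frequency) input is strongly attenuated after the
# first sample, while an alternating (highest-frequency) input is boosted.
low = pre_emphasis([1.0, 1.0, 1.0, 1.0])
high = pre_emphasis([1.0, -1.0, 1.0, -1.0])

# De-emphasis undoes pre-emphasis on an arbitrary signal.
original = [0.2, -0.1, 0.4, 0.3, -0.5]
round_trip = de_emphasis(pre_emphasis(original))
```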

## References

Deng, Li and Douglas O’Shaughnessy. Speech Processing: A Dynamic and Optimization-Oriented Approach. Marcel Dekker, Inc: New York. 2003.

Gold, Ben and Nelson Morgan. Speech and Audio Signal Processing: Processing and Perception of Speech and Music. John Wiley and Sons, Inc: New York. 2000.

Lemmetty, Sami. Review of Speech Synthesis Technology. (Master’s Thesis: Helsinki University of Technology) March 1999.

Markel, J.D. and A.H. Gray, Jr. Linear Prediction of Speech. Springer-Verlag: Berlin. 1976.
