This appendix explains how the Speak and Spell speech data was packed in the ROM code. It takes the coded data for a spoken word and shows the process of packing it into a set of data that would be found in the ROM code.

Introduction

A detail worth spending time on is how the speech data was packed into the ROM. An example of how it was done is shown in the following four figures (Figures 1 - 4). Figure 1 is an overview of the packing algorithm that I put together for a presentation on this topic. Figures 2 and 3 show the coded data for the word "cage". The coded data was taken from the information at the top of Figure 4. When the packing process was completed as shown in Figures 2 and 3, the resulting data matches the information in the bottom part of Figure 4. I am relatively certain that just using the four figures won't help much in understanding the process. So, I will pull excerpts from Figures 2 and 3 and use them to explain the process.

Figure 1: An overview of how the encoded speech data was stored in memory.

Figure 2: First page of encoded data for the word "cage" (frame number, energy, repeat, pitch, K1 - K6).

Figure 3: Second page of encoded data for the word "cage" (K7 - K10).

Figure 4: Computer printout showing the initial encoded data and the final packed data for the word "cage".

The top set of data in Figure 4 is the parametric data for the word "cage". The first column is the frame number, the second is the energy level, the third is the pitch period, and the remaining columns are the reflection coefficients K1 through K10, from left to right. The bottom set of data is the final packed data for the encoded word.

I have taken the first six frames of data from Figures 2 and 3 (frames 8 through 13) and put them in Table 1. It will be easier to see the data and explain the process using this table rather than attempting to work through the handwritten figures.

Table 1: First 6 frames (frames 8 - 13) from Figures 2 and 3
Frame Energy Rpt Pitch K1 K2 K3 K4 K5 K6 K7 K8 K9 K10
8 1001 0 00000 10101 10110 0110 0110 - - - - - -
9 0110 1 00000 - - - - - - - - - -
10 0110 1 00000 - - - - - - - - - -
11 1101 0 01010 10010 10000 0101 0101 0110 1011 1010 101 011 010
12 1101 1 01011 - - - - - - - - - -
13 1101 0 01100 10110 10001 0111 0100 0000 1010 1011 110 100 011

Notice that frames 8 - 10 are unvoiced, with frames 9 and 10 being repeated copies of frame 8 (each repeat frame still carries its own energy and pitch fields). The "1" in the Rpt column of frames 9 and 10 indicates that they are repeated frames. Frames 11 - 13 are voiced frames, and frame 12 is a repeat frame. Referring back to Figure 1, you can see that an unvoiced frame (frame 8) has only the first four reflection coefficients (K1 - K4), whereas a voiced frame (frames 11 and 13) has all ten coefficients. In all cases the repeat frame has no coefficients and the repeat flag is set to "1".
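The three frame sizes described above can be checked with a short Python sketch. This is only an illustration, not the original tooling: the field widths are read off the columns of Table 1, the rule that a pitch code of 00000 marks an unvoiced frame is inferred from frames 8 - 10, and the names `FIELD_WIDTHS`, `fields_in_frame`, and `frame_bit_count` are hypothetical.

```python
# Field widths in bits, read off the columns of Table 1:
# energy, repeat, pitch, K1, K2, K3..K7, K8..K10.
FIELD_WIDTHS = [4, 1, 5, 5, 5, 4, 4, 4, 4, 4, 3, 3, 3]


def fields_in_frame(repeat: int, pitch: int) -> int:
    """Return how many of the 13 fields a frame actually carries."""
    if repeat == 1:
        return 3   # energy, repeat flag, pitch; no coefficients
    if pitch == 0:
        return 7   # unvoiced (assumed: pitch code 00000); K1..K4 follow
    return 13      # voiced; K1..K10 follow


def frame_bit_count(repeat: int, pitch: int) -> int:
    """Total number of bits the frame occupies in the packed stream."""
    return sum(FIELD_WIDTHS[:fields_in_frame(repeat, pitch)])


print(frame_bit_count(repeat=0, pitch=0b00000))  # frame 8, unvoiced: 28
print(frame_bit_count(repeat=1, pitch=0b00000))  # frame 9, repeat: 10
print(frame_bit_count(repeat=0, pitch=0b01010))  # frame 11, voiced: 49
```

Because the frame lengths differ and none of them is a multiple of eight, frames do not line up with byte boundaries in the ROM.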

The process consists of four steps:

  1. Encode the parameters into binary
  2. Repack the binary numbers into hexadecimal
  3. Bit reverse each hexadecimal number
  4. Reverse the order for each pair of hexadecimal numbers

If I take the binary sequence for Frames 8 through 13 I get this sequence of bits:

1001 0 00000 10101 10110 0110 0110 . 0110 1 00000 . 0110 1 00000 . 1101 0 01010 10010 10000 0101 0101 0110 1011 1010 101 011 010 . 1101 1 01011 . 1101 0 01100 10110 10001 0111 0100 0000 1010 1011 110 100 011
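As a quick check on this sequence, the rows of Table 1 can be joined in Python and counted. This is a minimal sketch; the dictionary name `TABLE_1` is hypothetical and the rows are simply copied from the table.

```python
# Rows of Table 1, copied as written (frame number -> fields in binary).
TABLE_1 = {
    8: "1001 0 00000 10101 10110 0110 0110",
    9: "0110 1 00000",
    10: "0110 1 00000",
    11: "1101 0 01010 10010 10000 0101 0101 0110 1011 1010 101 011 010",
    12: "1101 1 01011",
    13: "1101 0 01100 10110 10001 0111 0100 0000 1010 1011 110 100 011",
}

# Drop the blanks inside each frame and join the frames end to end.
bitstream = "".join(row.replace(" ", "") for row in TABLE_1.values())

print(len(bitstream))      # 156 bits
print(len(bitstream) / 4)  # 39.0 four-bit groups, an odd count
```

The six frames occupy 156 bits, which is 39 four-bit groups, so they do not fill a whole number of bytes on their own.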

Notice that I have inserted a "." to separate the frames and have used a blank to separate the parameters within each frame (up to 13 per frame). The next task is to regroup the bits into four-bit nibbles, one per hexadecimal digit. The bits for each hexadecimal digit are shown in parentheses below:

(1001) (0 000)(00 10)(101 1)(0110) (0110) (0110) . (0110) (1 000)(00 . 01)(10 1 0)(0000) . (1101) (0 010)(10 10)(010 1)(0000) (0101) (0101) (0110) (1011) (1010) (101 0)(11 01)(0 . 110)(1 1 01)(011 . 1)(101 0) (0110)(0 101)(10 10)(001 0)(111 0)(100 0)(000 1)(010 1)(011 1)(10 10)(0 011) [1101]

I have put brackets around the last nibble to indicate that it came from frame 14. It was necessary to create an even number of nibbles so that the process could be completed on this example. Now that the binary sequence has been organized into nibbles, I can use Table 2 to convert the nibbles into hexadecimal.
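The regrouping can be tried on a small piece of the stream. Frames 8 - 10 happen to total 48 bits (28 + 10 + 10), which is exactly twelve nibbles, so they make a convenient test case. The sketch below is illustrative only and its variable names are hypothetical.

```python
# Frames 8-10 from Table 1 with the blanks removed (28 + 10 + 10 = 48 bits).
bits = "1001000000101011011001100110" + "0110100000" + "0110100000"

# Cut the stream into 4-bit groups, ignoring frame and field boundaries.
nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]

# Convert each nibble to one hexadecimal digit, then pair digits into bytes.
digits = "".join(format(int(n, 2), "X") for n in nibbles)
hex_bytes = [digits[i:i + 2] for i in range(0, len(digits), 2)]

print(" ".join(hex_bytes))  # 90 2B 66 66 81 A0
```

Note that the tenth nibble (0001) straddles the boundary between frames 9 and 10, which is why the grouping has to be done on the joined stream rather than frame by frame.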

Table 2: Hexadecimal table
Decimal Binary Hexadecimal Bit Reversed
0 0000 0 0
1 0001 1 8
2 0010 2 4
3 0011 3 C
4 0100 4 2
5 0101 5 A
6 0110 6 6
7 0111 7 E
8 1000 8 1
9 1001 9 9
10 1010 A 5
11 1011 B D
12 1100 C 3
13 1101 D B
14 1110 E 7
15 1111 F F
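The "Bit Reversed" column does not need to be memorized; it can be regenerated by writing each value as four bits and reading them backwards. A minimal Python sketch, with the helper name `reverse_nibble` chosen here for illustration:

```python
def reverse_nibble(value: int) -> int:
    """Reverse the order of the four bits in a value from 0 to 15."""
    return int(format(value, "04b")[::-1], 2)


print("Decimal Binary Hexadecimal Bit Reversed")
for n in range(16):
    print(n, format(n, "04b"), format(n, "X"), format(reverse_nibble(n), "X"))
```

Six of the sixteen values (0, 6, 9, F and the pair-free palindromes 0000, 0110, 1001, 1111) map to themselves, which is why bytes such as 66 and 90 look unchanged after the reversal step.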

In hexadecimal it would look like: 90 2B 66 66 81 A0 D2 A5 05 56 BA AD 6D 7A 65 A2 E8 15 7A 3D

Bit reversed would look like: 90 4D 66 66 18 50 B4 5A 0A A6 D5 5B 6B E5 6A 54 71 8A E5 CB

Finally doing a pair wise nibble switch it would look like: 09 D4 66 66 81 05 4B A5 A0 6A 5D B5 B6 5E A6 45 17 A8 5E BC
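Steps 3 and 4 can be reproduced from the hexadecimal sequence with a few lines of Python. The sketch below is an illustration with hypothetical names; it also checks an observation that follows from the two steps, namely that reversing each nibble and then swapping the two nibbles gives the same result as reversing all eight bits of the byte.

```python
# The hexadecimal sequence from step 2, copied from the text.
HEX_SEQUENCE = "90 2B 66 66 81 A0 D2 A5 05 56 BA AD 6D 7A 65 A2 E8 15 7A 3D"


def reverse_nibble(digit: str) -> str:
    """Bit reverse one hexadecimal digit (Table 2, last column)."""
    return format(int(format(int(digit, 16), "04b")[::-1], 2), "X")


pairs = HEX_SEQUENCE.split()

# Step 3: bit reverse each hexadecimal digit.
step3 = ["".join(reverse_nibble(d) for d in pair) for pair in pairs]

# Step 4: swap the two digits within each pair.
step4 = [pair[1] + pair[0] for pair in step3]

# Equivalent shortcut: reverse all eight bits of each byte at once.
shortcut = [format(int(format(int(p, 16), "08b")[::-1], 2), "02X") for p in pairs]

print(" ".join(step3))
print(" ".join(step4))
print(step4 == shortcut)  # True
```

Seen this way, the ROM simply holds the bit stream with each byte stored least significant bit first.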

If this sequence is compared to the bottom data set of Figure 4, it will be comforting to see that they are identical. Obviously we could have completed the whole word to verify that all of it works. But, then, that is what Figures 2 and 3 attempted to do.
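Another way to gain confidence in the result is to run the process backwards and recover the rows of Table 1 from the packed bytes. The Python sketch below does this under stated assumptions: the field widths are those of Table 1, a pitch code of 00000 is taken to mean an unvoiced frame, and the names `unpack_bits` and `read_frames` are hypothetical. It is not a description of how the synthesizer chip itself reads the ROM.

```python
# Final packed bytes for frames 8-13, copied from the text.
PACKED = "09 D4 66 66 81 05 4B A5 A0 6A 5D B5 B6 5E A6 45 17 A8 5E BC"

# Field widths: energy, repeat, pitch, K1, K2, K3..K7, K8..K10.
WIDTHS = [4, 1, 5, 5, 5, 4, 4, 4, 4, 4, 3, 3, 3]


def unpack_bits(packed_hex: str) -> str:
    """Undo steps 3 and 4 by reversing the eight bits of every byte."""
    return "".join(format(int(b, 16), "08b")[::-1] for b in packed_hex.split())


def read_frames(bits: str, count: int):
    """Split the bit stream into `count` frames; return them and leftover bits."""
    frames, pos = [], 0
    for _ in range(count):
        fields = []
        for index, width in enumerate(WIDTHS):
            fields.append(bits[pos:pos + width])
            pos += width
            is_repeat = index == 2 and fields[1] == "1"
            # Assumed rule: pitch 00000 means unvoiced, so stop after K4.
            is_unvoiced_end = index == 6 and int(fields[2], 2) == 0
            if is_repeat or is_unvoiced_end:
                break
        frames.append(" ".join(fields))
    return frames, bits[pos:]


frames, leftover = read_frames(unpack_bits(PACKED), count=6)
for number, frame in zip(range(8, 14), frames):
    print(number, frame)
```

The six printed rows should match Table 1 field for field, and the four bits left over are the start of frame 14.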

You may notice that I have ignored the creation of and use of the encode and decode tables. These tables were created based on a specific professional speaker. For each of the coefficients a test data set was used to reduce all of the variations to a set of buckets. For example with K1 where there are five bits to define the value of the coefficient, the data set was split into 32 buckets ranging from the largest to the smallest. A median point was selected to be the value used for the decoder. As this was specific to each professional speaker and therefore to each version of the TMS028x it will not be presented. That part of the process is left to the student to figure out. And, yes, you may have noted that I didn't disclose how the spelling of the words was packed into the ROM along with the speech data. Another aspect left to the student to figure out.

Source:  OpenStax, The speak n spell. OpenStax CNX. Jan 31, 2014 Download for free at http://cnx.org/content/col11501/1.5