0.12 Appendix 3: data packing

This appendix explains how the Speak and Spell speech data was packed in the ROM code. It takes the coded data for a spoken word and shows the process of packing it into a set of data that would be found in the ROM code.

Introduction

A detail worth spending time on is how the speech data was packed into the ROM. An example of how it was done is shown in the following four figures (Figures 1 - 4). Figure 1 is an overview of the packing algorithm that I put together for a presentation on this topic. Figures 2 and 3 show the coded data for the word "cage". The coded data was taken from the information at the top of Figure 4. When the packing process was completed as shown in Figures 2 and 3, the resulting data matches the information in the bottom part of Figure 4. I am relatively certain that just using the four figures won't help much in understanding the process. So, I will pull excerpts from Figures 2 and 3 and use them to explain the process.

Figure 1: An overview of how the encoded speech data was stored in memory.

Figure 2: First page of encoded data for the word "cage" (frame number, energy, repeat, pitch, K1 - K6).

Figure 3: Second page of encoded data for the word "cage" (K7 - K10).

Figure 4: Computer printout showing the initial encoded data and the final packed data for the word "cage".

The top set of data in Figure 4 is the parametric data for the word "cage". The first column is the frame number, the second is the energy level, the third is the pitch period, and the remaining columns are the reflection coefficients K1 to K10, running from left to right. The bottom set of data is the final packed data for the encoded word.

I have taken the first six frames of data (frames 8 through 13) from Figures 2 and 3 and put them in Table 1. It will be easier to see the data and explain the process using this table than to work through the handwritten figures.

Table 1: First 6 frames from Figures 2 and 3
Frame	Energy	Rpt	Pitch	K1	K2	K3	K4	K5	K6	K7	K8	K9	K10
8	1001	0	00000	10101	10110	0110	0110	-	-	-	-	-	-
9	0110	1	00000	-	-	-	-	-	-	-	-	-	-
10	0110	1	00000	-	-	-	-	-	-	-	-	-	-
11	1101	0	01010	10010	10000	0101	0101	0110	1011	1010	101	011	010
12	1101	1	01011	-	-	-	-	-	-	-	-	-	-
13	1101	0	01100	10110	10001	0111	0100	0000	1010	1011	110	100	011

Notice that frames 8 - 10 are unvoiced, with frames 9 and 10 being repeat frames that reuse the coefficients of frame 8. The "1" in the Rpt column for frames 9 and 10 indicates that they are repeated frames. Frames 11 - 13 are voiced frames, and frame 12 is a repeat frame. Referring back to Figure 1, you can see that an unvoiced frame (frame 8) has only the first four reflection coefficients (K1 - K4), whereas a voiced frame (frames 11 and 13) has all ten. In all cases a repeat frame has no coefficients and its repeat flag is set to "1".
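The frame layout just described can be sketched in a few lines of Python. The field widths are read off the bit patterns in Table 1. The names FIELD_WIDTHS, frame_fields and frame_length are made up for this illustration, and treating an all-zero pitch as the unvoiced marker is an assumption inferred from frames 8 - 10, not something stated in the figures. Only the three frame types shown in Table 1 are covered.

```python
# Field widths in bits, read off the bit patterns in Table 1.
FIELD_WIDTHS = {
    "Energy": 4, "Rpt": 1, "Pitch": 5,
    "K1": 5, "K2": 5, "K3": 4, "K4": 4, "K5": 4,
    "K6": 4, "K7": 4, "K8": 3, "K9": 3, "K10": 3,
}


def frame_fields(rpt, pitch):
    """Return the names of the fields present in a frame.

    rpt and pitch are bit strings such as "0" and "01010".
    Assumption: an all-zero pitch marks an unvoiced frame.
    """
    fields = ["Energy", "Rpt", "Pitch"]
    if rpt == "1":
        return fields                                  # repeat frame: no coefficients
    if int(pitch, 2) == 0:
        return fields + ["K1", "K2", "K3", "K4"]       # unvoiced frame
    return fields + [f"K{i}" for i in range(1, 11)]    # voiced frame


def frame_length(rpt, pitch):
    """Number of bits a frame occupies in the packed stream."""
    return sum(FIELD_WIDTHS[name] for name in frame_fields(rpt, pitch))


print(frame_length("0", "00000"))  # frame 8, unvoiced -> 28
print(frame_length("1", "00000"))  # frame 9, repeat   -> 10
print(frame_length("0", "01010"))  # frame 11, voiced  -> 49
```

Under these assumptions the three frame types occupy 28, 10 and 49 bits, so frame boundaries generally do not fall on nibble boundaries.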

The process consists of several steps:

Encode the parameters into binary
Repack the binary numbers into hexadecimal digits (nibbles)
Bit reverse each hexadecimal digit
Swap the two hexadecimal digits within each pair

If I take the binary sequence for Frames 8 through 13 I get this sequence of bits:

1001 0 00000 10101 10110 0110 0110 . 0110 1 00000 . 0110 1 00000 . 1101 0 01010 10010 10000 0101 0101 0110 1011 1010 101 011 010 . 1101 1 01011 . 1101 0 01100 10110 10001 0111 0100 0000 1010 1011 110 100 011
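As a small check on this step, the sketch below joins the fields of Table 1 into one continuous bit string. FRAMES and concatenate are names chosen for this illustration only.

```python
# Frames 8 - 13 from Table 1, one list of bit-string fields per frame.
FRAMES = [
    ["1001", "0", "00000", "10101", "10110", "0110", "0110"],
    ["0110", "1", "00000"],
    ["0110", "1", "00000"],
    ["1101", "0", "01010", "10010", "10000", "0101", "0101",
     "0110", "1011", "1010", "101", "011", "010"],
    ["1101", "1", "01011"],
    ["1101", "0", "01100", "10110", "10001", "0111", "0100",
     "0000", "1010", "1011", "110", "100", "011"],
]


def concatenate(frames):
    """Join every field of every frame into one continuous bit string."""
    return "".join("".join(fields) for fields in frames)


bits = concatenate(FRAMES)
print(len(bits))   # 156 bits for these six frames
print(bits[:28])   # the 28 bits of frame 8: 1001000000101011011001100110
```

The six frames add up to 156 bits, which is 39 nibbles, an odd count.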

Notice that I have inserted a "." to separate the frames and a blank to separate the parameters within each frame (up to 13 per frame). The next task is to regroup the bits into hexadecimal digits. The bits for each hexadecimal digit are shown in parentheses below:

(1001) (0 000)(00 10)(101 1)(0110) (0110) (0110) . (0110) (1 000)(00 . 01)(10 1 0)(0000) . (1101) (0 010)(10 10)(010 1)(0000) (0101) (0101) (0110) (1011) (1010) (101 0)(11 01)(0 . 110)(1 1 01)(011 . 1)(101 0) (0110)(0 101)(10 10)(001 0)(111 0)(100 0)(000 1)(010 1)(011 1)(10 10)(0 011) [1101]

I have put brackets around the last nibble to indicate that it came from frame 14. It was necessary to create an even number of nibbles so that the process could be completed on this example. Now that the binary sequence has been organized into nibbles, I can use Table 2 to convert the nibbles into hexadecimal.

Table 2: Hexadecimal table
Decimal	Binary	Hexadecimal	Bit Reversed
0	0000	0	0
1	0001	1	8
2	0010	2	4
3	0011	3	C
4	0100	4	2
5	0101	5	A
6	0110	6	6
7	0111	7	E
8	1000	8	1
9	1001	9	9
10	1010	A	5
11	1011	B	D
12	1100	C	3
13	1101	D	B
14	1110	E	7
15	1111	F	F
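The bit-reversed column of this table does not need to be memorized; it can be generated. The following is a minimal sketch, with reverse_nibble being a name chosen here for illustration.

```python
def reverse_nibble(value):
    """Reverse the bit order of a 4-bit value (0..15)."""
    return int(format(value, "04b")[::-1], 2)


# Print the four columns: decimal, binary, hexadecimal, bit reversed.
for value in range(16):
    print(value, format(value, "04b"), format(value, "X"),
          format(reverse_nibble(value), "X"))
```

Reversing twice returns the original value, so the same table serves for both packing and unpacking.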

In hexadecimal it would look like: 90 2B 66 66 81 A0 D2 A5 05 56 BA AD 6D 7A 65 A2 E8 15 7A 3D
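The regrouping into hexadecimal can be checked mechanically. The sketch below strips the separators from the bit sequence, appends the nibble borrowed from frame 14, and prints the result as pairs of hexadecimal digits. SEQUENCE, BORROWED and to_hex_bytes are names chosen for illustration. The borrowed nibble is taken to be 1101 (hexadecimal D), which is the value that the last digit of the sequence above implies.

```python
# The bit sequence for frames 8 - 13; blanks and "." are for readability only.
SEQUENCE = (
    "1001 0 00000 10101 10110 0110 0110 . 0110 1 00000 . 0110 1 00000 . "
    "1101 0 01010 10010 10000 0101 0101 0110 1011 1010 101 011 010 . "
    "1101 1 01011 . "
    "1101 0 01100 10110 10001 0111 0100 0000 1010 1011 110 100 011"
)
BORROWED = "1101"  # assumed first four bits of frame 14 (hexadecimal D)


def to_hex_bytes(bit_text, padding):
    """Regroup a bit sequence into nibbles and return them as hex pairs."""
    bits = "".join(ch for ch in bit_text if ch in "01") + padding
    if len(bits) % 8:
        raise ValueError("bit count is not a whole number of bytes")
    nibbles = [format(int(bits[i:i + 4], 2), "X")
               for i in range(0, len(bits), 4)]
    return " ".join(a + b for a, b in zip(nibbles[0::2], nibbles[1::2]))


print(to_hex_bytes(SEQUENCE, BORROWED))
# 90 2B 66 66 81 A0 D2 A5 05 56 BA AD 6D 7A 65 A2 E8 15 7A 3D
```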

Bit reversed would look like: 90 4D 66 66 18 50 B4 5A 0A A6 D5 5B 6B E5 6A 54 71 8A E5 CB

Finally, after a pairwise nibble swap, it would look like: 09 D4 66 66 81 05 4B A5 A0 6A 5D B5 B6 5E A6 45 17 A8 5E BC
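The last two steps can be reproduced with a short sketch that starts from the hexadecimal sequence. The function names are chosen for illustration only. The sketch also shows an observation that follows from the arithmetic: reversing the bits of each nibble and then swapping the two nibbles is the same as reversing all eight bits of the byte.

```python
HEX_BYTES = "90 2B 66 66 81 A0 D2 A5 05 56 BA AD 6D 7A 65 A2 E8 15 7A 3D"


def reverse_nibble(digit):
    """Reverse the four bits of one hexadecimal digit."""
    return format(int(format(int(digit, 16), "04b")[::-1], 2), "X")


def bit_reverse_digits(text):
    """Bit reverse each hexadecimal digit, keeping the digit order."""
    return " ".join(reverse_nibble(p[0]) + reverse_nibble(p[1])
                    for p in text.split())


def swap_digits(text):
    """Swap the two hexadecimal digits within each pair."""
    return " ".join(p[1] + p[0] for p in text.split())


def reverse_byte(pair):
    """Reverse all eight bits of a two-digit hexadecimal byte."""
    return format(int(format(int(pair, 16), "08b")[::-1], 2), "02X")


reversed_digits = bit_reverse_digits(HEX_BYTES)
packed = swap_digits(reversed_digits)
print(reversed_digits)
print(packed)
# The two steps together equal a full 8-bit reversal of each byte.
print(packed == " ".join(reverse_byte(p) for p in HEX_BYTES.split()))  # True
```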

If this sequence is compared to the bottom data set of Figure 4, it is comforting to see that the two are identical. Obviously, we could have completed the whole word to verify that all of it works. But, then, that is what Figures 2 and 3 attempted to do.

You may notice that I have ignored the creation and use of the encode and decode tables. These tables were created based on a specific professional speaker. For each of the coefficients, a test data set was used to reduce all of the variations to a set of buckets. For example, K1 is coded with five bits, so its data set was split into 32 buckets ranging from the largest to the smallest. A median point in each bucket was selected to be the value used for the decoder. As this was specific to each professional speaker, and therefore to each version of the TMS028x, it will not be presented. That part of the process is left to the student to figure out. And, yes, you may have noted that I didn't disclose how the spelling of the words was packed into the ROM along with the speech data. That is another aspect left to the student to figure out.
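The bucketing idea can still be illustrated in general terms. The sketch below is not the procedure that was actually used and does not reproduce any real table: the function names are hypothetical, the sample values are synthetic random numbers, and splitting the sorted samples into buckets of equal population is an assumption made only for this example.

```python
import bisect
import random
import statistics


def build_tables(samples, bits):
    """Split sorted samples into 2**bits equal-population buckets.

    Returns (edges, decode): edges holds the upper bound of every bucket
    except the last, and decode[i] is the median of bucket i.
    """
    ordered = sorted(samples)
    count = 2 ** bits
    size = len(ordered) // count
    if size == 0:
        raise ValueError("need at least one sample per bucket")
    buckets = [ordered[i * size:(i + 1) * size] for i in range(count - 1)]
    buckets.append(ordered[(count - 1) * size:])  # last bucket takes the remainder
    edges = [bucket[-1] for bucket in buckets[:-1]]
    decode = [statistics.median(bucket) for bucket in buckets]
    return edges, decode


def encode(value, edges):
    """Index of the bucket that value falls into."""
    return bisect.bisect_left(edges, value)


random.seed(0)
# Synthetic stand-in for measured coefficient values, not real speech data.
samples = [random.uniform(-1.0, 1.0) for _ in range(3200)]
edges, decode = build_tables(samples, bits=5)
code = encode(0.25, edges)
print(len(decode), code, decode[code])
```

With five bits the encoder produces a code from 0 to 31, and the decoder looks that code up in the 32-entry table of medians.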

Source: OpenStax, The speak n spell. OpenStax CNX. Jan 31, 2014 Download for free at http://cnx.org/content/col11501/1.5
