<< Chapter < Page | Chapter >> Page > |
Consider the source with symbols with probabilities , , , and .
Build the Huffman chart for the source defined by the 26 English letters (plus “space”)and their frequency in the Wizard of Oz as given in [link] .
a variable length code can be encoded and decoded. The first step generates a 4-PAM sequence withthe probabilities used in Example [link] (a). In the code, the symbols are assigned numerical values . The symbols, their probabilities, the numerical values, and the variable length Huffman codeare as follows:
This Huffman code was derived in
[link] .
For a length
m
input sequence, the second
step replaces each symbol value with the appropriatebinary sequence, and places the output in the vector
cx
.
The third step carries out the decoding.Assuming the encoding and decoding have been done
properly, then
cx
is transformed into the output
y
,
which should be the same as the original sequence
x
.
Indeed, running the program
codex.m
(which contains all
three steps) gives a perfect decoding.
Mimicking the code in
codex.m
, create a Huffman
encoder and decoder for the source defined in Example
[link] .
Use
codex.m
to investigate what happens when the
probabilities of the source alphabet change.
cx
with the average length of the naive encoder
[link] .
Which does a better job compressing the data?cx
to the previous
results. Which does a better job compressing the data?Using
codex.m
, implement the Huffman code from
[link] .
What is the length of the resulting data when applied to thetext of the
Wizard of Oz ? What rate of data compression
has been achieved?
Source coding is used to reduce the redundancy in the original data. If the letters in the Wizard of Oz were independent, then the Huffman coding in [link] would be optimal: no other coding method could achieve a better compression ratio. But the letters are not independent.More sophisticated schemes would consider not just the raw probabilities of the letters, but the probabilities of pairs ofletters, or of triplets, or more. As suggested by the redundancy studies in "Redundancy" , there is a lot that can be gained by exploiting higher order relationships betweenthe symbols.
“Zipped” files (usually with a
.zip
extension) are
a popular form of data compression for text (and other data) onthe web. Download a handful of
.zip
files.
Note the file size when the data is in its compressed formand the file size after decompressing (“unzipping”) the file.
How does this compare with the compression ratio achieved in
[link] ?
Notification Switch
Would you like to follow the 'Software receiver design' conversation and receive update notifications?