
Consider a source with N = 4 symbols having probabilities p(x1) = 0.3, p(x2) = 0.3, p(x3) = 0.2, and p(x4) = 0.2.

  1. What is the entropy of this source?
  2. Build the Huffman code.
  3. What is the efficiency of this code?
  4. If this source were encoded naively, how many bits per symbol would be needed? What is the efficiency?
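The four parts can be checked numerically. The sketch below (in Python rather than the MATLAB of codex.m, with illustrative variable names) computes the entropy, derives the Huffman codeword lengths with a standard heap-based merge, and reports the efficiency:

```python
import heapq
from math import log2

p = [0.3, 0.3, 0.2, 0.2]

# Part 1: entropy H = -sum p_i log2(p_i)
H = -sum(pi * log2(pi) for pi in p)

# Part 2: Huffman codeword lengths by repeatedly merging the two
# least probable nodes; each merge adds one bit to the codewords
# of every symbol inside the merged node.
heap = [(pi, [i]) for i, pi in enumerate(p)]  # (probability, member symbols)
heapq.heapify(heap)
length = [0] * len(p)
while len(heap) > 1:
    pa, a = heapq.heappop(heap)
    pb, b = heapq.heappop(heap)
    for i in a + b:
        length[i] += 1
    heapq.heappush(heap, (pa + pb, a + b))

# Part 3: efficiency = entropy / average codeword length.
# Entropy is about 1.971 bits; all four codewords come out 2 bits long,
# so the efficiency is about 0.985.
avg = sum(pi * li for pi, li in zip(p, length))
print(round(H, 3), length, round(avg, 3), round(H / avg, 3))
```

Since the naive encoder of part 4 also uses 2 bits per symbol, the two codes perform identically for this source.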

Build the Huffman chart for the source defined by the 26 English letters (plus “space”) and their frequency in the Wizard of Oz as given in [link] .

The program codex.m shows how a variable length code can be encoded and decoded. The first step generates a 4-PAM sequence with the probabilities used in Example [link] (a). In the code, the symbols are assigned numerical values { ±1, ±3 }. The symbols, their probabilities, the numerical values, and the variable length Huffman code are as follows:

  symbol   probability   value   Huffman code
  x1       0.5           +1      1
  x2       0.25          -1      01
  x3       0.125         +3      001
  x4       0.125         -3      000

This Huffman code was derived in [link] . For a length m input sequence, the second step replaces each symbol value with the appropriate binary sequence, and places the output in the vector cx . The third step carries out the decoding. Assuming the encoding and decoding have been done properly, then cx is transformed into the output y , which should be the same as the original sequence x . Indeed, running the program codex.m (which contains all three steps) gives a perfect decoding.
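codex.m is written in MATLAB; the following Python sketch (with illustrative names and a fixed seed for repeatability) carries out the same three steps and confirms that the prefix code decodes without any separators between codewords:

```python
import random

random.seed(0)

# Step 1: generate a 4-PAM sequence with the probabilities of the table.
values = [+1, -1, +3, -3]
probs  = [0.5, 0.25, 0.125, 0.125]
code   = {+1: '1', -1: '01', +3: '001', -3: '000'}  # the Huffman code above
m = 1000
x = random.choices(values, weights=probs, k=m)

# Step 2: replace each symbol value with its binary codeword.
cx = ''.join(code[v] for v in x)

# Step 3: decode by scanning the bit stream; because no codeword is a
# prefix of another, each match is unambiguous.
decode = {bits: v for v, bits in code.items()}
y, buf = [], ''
for bit in cx:
    buf += bit
    if buf in decode:
        y.append(decode[buf])
        buf = ''

print(y == x, len(cx) / m)  # perfect decoding; average bits per symbol
```

The measured bits per symbol should hover near the 1.75-bit entropy of this source, well below the 2 bits of a fixed-length code.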

Mimicking the code in codex.m , create a Huffman encoder and decoder for the source defined in Example [link] .

Use codex.m to investigate what happens when the probabilities of the source alphabet change.

  1. Modify step 1 of the program so that the elements of the input sequence have probabilities
    x 1 0 . 1 , x 2 0 . 1 , x 3 0 . 1 , and x 4 0 . 7 .
  2. Without changing the Huffman encoding to account for these changed probabilities, compare the average length of the coded data vector cx with the average length of the naive encoder [link] . Which does a better job compressing the data?
  3. Modify the program so that the elements of the input sequence all have the same probability, and answer the same question.
  4. Build the Huffman chart for the probabilities defined in [link] .
  5. Implement this new Huffman code and compare the average length of the coded data cx to the previous results. Which does a better job compressing the data?
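Parts 1–3 can also be checked analytically. The code in the table assigns codeword lengths (1, 2, 3, 3) to (x1, x2, x3, x4), while the naive encoder uses 2 bits per symbol; a short sketch (variable names illustrative):

```python
# Codeword lengths of the Huffman code designed for the original
# probabilities (0.5, 0.25, 0.125, 0.125); the naive encoder uses
# 2 bits for every symbol.
lengths = [1, 2, 3, 3]

# Part 1's probabilities: the short codeword now goes to rare symbols
# while the most common symbol x4 gets a long codeword.
p_new = [0.1, 0.1, 0.1, 0.7]
avg_mismatched = sum(p * l for p, l in zip(p_new, lengths))

# Part 3: all four symbols equally likely.
avg_equal = sum(0.25 * l for l in lengths)

# 2.7 and 2.25 bits/symbol: both worse than the naive 2 bits/symbol,
# because the code no longer matches the source statistics.
print(round(avg_mismatched, 3), round(avg_equal, 3))
```

This is the point of the exercise: a Huffman code is only optimal for the probabilities it was designed for.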

Using codex.m , implement the Huffman code from [link] . What is the length of the resulting data when applied to the text of the Wizard of Oz ? What rate of data compression has been achieved?

Source coding is used to reduce the redundancy in the original data. If the letters in the Wizard of Oz were independent, then the Huffman coding in [link] would be optimal: no other coding method could achieve a better compression ratio. But the letters are not independent. More sophisticated schemes would consider not just the raw probabilities of the letters, but the probabilities of pairs of letters, or of triplets, or more. As suggested by the redundancy studies in "Redundancy" , there is a lot that can be gained by exploiting higher-order relationships between the symbols.
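One way to see the potential gain is to compare the per-letter entropy with half the entropy of letter pairs on a sample text: if letters were independent, the two would coincide, and dependence between adjacent letters pulls the pair figure below the single-letter figure. A sketch (the sample string and names are purely illustrative):

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

text = "the cat sat on the mat and the rat ate the hat"  # any sample text

H1 = entropy(Counter(text))                                     # single letters
H2 = entropy(Counter(text[i:i+2] for i in range(len(text)-1)))  # letter pairs

# With independent letters, H2 would be about 2*H1; here the strong
# dependence between neighbors makes H2/2 noticeably smaller than H1.
print(round(H1, 3), round(H2 / 2, 3))
```

A code built on pairs (or longer blocks) can therefore approach this lower per-letter entropy, compressing beyond what any single-letter Huffman code can achieve.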

“Zipped” files (usually with a .zip extension) are a popular form of data compression for text (and other data) on the web. Download a handful of .zip files. Note the file size when the data is in its compressed form and the file size after decompressing (“unzipping”) the file. How does this compare with the compression ratio achieved in [link] ?

Source:  OpenStax, Software receiver design. OpenStax CNX. Aug 13, 2013 Download for free at http://cnx.org/content/col11510/1.3