<< Chapter < Page Chapter >> Page >
This module is part of the collection, A First Course in Electrical and Computer Engineering . The LaTeX source files for this collection were created using an optical character recognition technology, and because of this process there may be more errors than usual. Please contact us if you discover any errors.

In 1838, Samuel Morse was struggling with the problem of designing an efficient code for transmitting information over telegraph lines. He reasoned that an efficient code would use short code words for common letters and long code words for uncommon letters. (Can you see the profit motive at work?) In order to turn this reasoned principle into workable practice, Morse rummaged around in the composition trays for typeface in a printshop. He discovered that typesetters use many more e ' s than s's. He then formed a table that showed the relative frequency with which each letter was used. His ingenious, variable-length Morse code assigned short codes to likely letters (like “dot” for e ' ) and long codes to unlikely letters (like “dash dash dot dot” for z ' ). We now know that Morse came within about 15% of the theoretical minimum for the average code word length for English language text.

A Variation on Morse's Experiment. In order to set the stage for our study of efficient source codes, let's run a variation on Morse's experiment to see if we can independently arrive at a way of designing codes. Instead of giving ourselves a composition tray, let's start with a communication source that generates five symbols or letters S 0 , S 1 , S 2 , S 3 , S 4 . We run the source for 100 transmissions and observe the following numbers of transmissions for each symbol:

50 S 0 ' s 20 S 1 ' s 20 S 2 ' s 5 S 3 ' s 5 S 4 ' s .

We will assume that these “source statistics” are typical, meaning that 1000 transmissions would yield 500 S 0 ' s and so on.

The most primitive binary code we could build for our source would use three bits for each symbol:

S 0 000 S 1 001 S 2 010 S 3 011 S 4 100 x 101 x 110 x 111 .

This code is inefficient in two ways. First, it leaves three illegal code words that correspond to no source symbol. Second, it uses the same code word length for an unlikely symbol (like S 4 ) that it uses for a likely symbol (like S 0 ). The first defect we can correct by concatenating consecutive symbols into symbol blocks, or composite symbols. If we form a composite symbol consisting of M source symbols, then a typical composite symbol is S 1 S 0 S 1 S 4 S 2 S 3 S 1 S 2 S 0 . The number of such composite symbols that can be generated is 5 M . The binary code for these 5 M composite symbols must contain N binary digits where

2 N - 1 < 5 M < 2 N ( N M log 2 5 ) .

The number of bits per source symbol is

N M log 2 5 = 2 . 32 .

This scheme improves on the best variable length code of Table 2 from "Binary Codes: From Symbols to Binary Codes" by 0.08 bits/symbol.

Now let's reason, as Morse did, that an efficient code would use short codes for likely symbols and long codes for unlikely symbols. Let's pick code#5 from Table 2 from "Binary Codes: From Symbols to Binary Codes" for this purpose:

S 0 S 1 S 2 S 3 S 4 1 01 001 0001 0000 .

This is a variable-length code. If we use this code on the 100 symbols that generated our source statistic, the average number of bits/symbol is

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, A first course in electrical and computer engineering. OpenStax CNX. Sep 14, 2009 Download for free at http://cnx.org/content/col10685/1.2
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'A first course in electrical and computer engineering' conversation and receive update notifications?

Ask