Let us assume that we represent a symbol ${x}_{n}$ , with probability ${p}_{n}$ , by ${l}_{n}$ bits. Then, the average number of bits spent per symbol will be $$\sum_{n}{p}_{n}{l}_{n}$$
A four-symbol alphabet produces stochastically independent outcomes with the following probabilities: $$P({x}_{1})=\frac{1}{2}$$ $$P({x}_{2})=\frac{1}{4}$$ $$P({x}_{3})=\frac{1}{8}$$ $$P({x}_{4})=\frac{1}{8}$$ The source therefore has an entropy of 1.75 bits/symbol. Let's see if we can find a codebook for this four-letter alphabet that satisfies the Source Coding Theorem. The simplest code to try is known as the simple binary code: convert the symbol's index into a binary number and use the same number of bits for each symbol by including leading zeros where necessary.
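The entropy figure and the simple binary code can be checked with a short Python sketch. The function names are illustrative, and numbering the symbols from zero (so that ${x}_{1}$ has index 0) is an assumption, since the text does not say where the index starts.

```python
import math

# Probabilities from the text for the symbols x1..x4.
probs = [1/2, 1/4, 1/8, 1/8]

def entropy(p):
    """Entropy in bits/symbol of a source with independent outcomes."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def simple_binary_code(n_symbols):
    """Fixed-length code: the symbol index in binary, padded with leading zeros.

    Assumption: indices start at 0.
    """
    width = max(1, math.ceil(math.log2(n_symbols)))
    return [format(i, f"0{width}b") for i in range(n_symbols)]

codebook = simple_binary_code(len(probs))
# Average number of bits per symbol: sum of p_n * l_n.
avg_len = sum(p * len(word) for p, word in zip(probs, codebook))

print(entropy(probs))  # 1.75
print(codebook)        # ['00', '01', '10', '11']
print(avg_len)         # 2.0
```

Under these assumptions the fixed-length code spends 2 bits per symbol, a quarter of a bit more than the entropy.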
The simple binary code is, in this case, less efficient than the unequal-length code. Using the efficient code, we can transmit the symbolic-valued signal having this alphabet 12.5% faster, since it spends 1.75 bits per symbol on average instead of 2. Furthermore, we know that no more efficient codebook can be found because of Shannon's source coding theorem: the average length already equals the entropy.
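To make the comparison concrete, the sketch below uses one possible unequal-length code with word lengths 1, 2, 3 and 3 bits. The specific codewords are an assumption chosen for illustration; any prefix-free code with those lengths would give the same average.

```python
# Probabilities from the text.
probs = {"x1": 1/2, "x2": 1/4, "x3": 1/8, "x4": 1/8}

# Hypothetical codebooks (assumptions, not taken from the text).
simple = {"x1": "00", "x2": "01", "x3": "10", "x4": "11"}
unequal = {"x1": "0", "x2": "10", "x3": "110", "x4": "111"}

def average_length(code, probs):
    """Average bits per symbol: sum of p_n * l_n."""
    return sum(probs[s] * len(code[s]) for s in probs)

def encode(symbols, code):
    """Concatenate the codewords of a symbol sequence."""
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    """Decode a bit string; works because no codeword is a prefix of another."""
    inverse = {word: s for s, word in code.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inverse:
            out.append(inverse[word])
            word = ""
    return out

l_simple = average_length(simple, probs)     # 2.0
l_unequal = average_length(unequal, probs)   # 1.75
saving = (l_simple - l_unequal) / l_simple   # 0.125, i.e. 12.5%

message = ["x1", "x3", "x2", "x4", "x1"]
bits = encode(message, unequal)              # '0110101110'
assert decode(bits, unequal) == message
```

The shortest codeword goes to the most probable symbol, which is what brings the average down to the entropy for this source.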
Let us return to the ASCII codes presented earlier. Is the 7-bit ASCII code optimal, i.e., is it a minimal representation? The 7-bit ASCII code assigns an equal length (7 bits) to every character it represents. Thus, it would be optimal only if all of the 128 characters were equiprobable, that is, if each character had a probability of $\frac{1}{128}$ . To find out whether the characters really are equiprobable, an analysis of all English texts would be needed. Such an analysis is difficult to do. However, the letter "E" is more probable than the letter "Z", so the equiprobable assumption does not hold, and the ASCII code is not optimal.
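A rough feel for this argument can be had by estimating the entropy of the character frequencies in a piece of text. The sketch below is only an illustration: the sample sentence is made up, and a single short string says nothing reliable about English in general.

```python
import math
from collections import Counter

def empirical_entropy(text):
    """Entropy in bits/character of the observed character frequencies,
    treating characters as independent outcomes."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical sample text (an assumption, for illustration only).
sample = "the quick brown fox jumps over the lazy dog and then rests in the shade"
h = empirical_entropy(sample)
print(f"{h:.2f} bits/character, versus 7 bits for ASCII")

# For comparison: 128 equiprobable characters give exactly 7 bits.
uniform = "".join(chr(i) for i in range(128))
print(empirical_entropy(uniform))  # 7.0
```

In the sample, "e" occurs several times and "z" only once, so the estimate falls well below 7 bits per character.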
(A technical note: We should take into account that in English text subsequent outcomes are not stochastically independent. To see this, assume the first letter to be "b"; then it is more probable that the next letter is "e" than "z". In the case where the outcomes are not stochastically independent, the formulation we have given of Shannon's source coding theorem is no longer valid. To fix this, we should replace the entropy with the entropy rate, but we will not pursue this here.)
From Shannon's source coding theorem we know the minimum average rate needed to represent a source. But except in the case where the logarithm of every probability is an integer, the theorem gives no indication of how to obtain that rate. Getting close to the Shannon entropy bound is a large area of research. One clever way to do the encoding is the Huffman coding scheme.
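The text does not describe the Huffman scheme, so the following is a minimal sketch of its usual greedy construction, not a definitive implementation: repeatedly merge the two least probable nodes, prefixing the codewords of one with "0" and of the other with "1". The function name `huffman_code` and the tie-breaking counter are choices made here for illustration.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a binary Huffman codebook {symbol: bitstring}
    from a dict {symbol: probability}."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single symbol still needs one bit to be transmitted.
        return {s: "0" for s in heap[0][2]}
    while len(heap) > 1:
        # Take the two least probable nodes and merge them.
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

# The four-symbol source from the text.
probs = {"x1": 1/2, "x2": 1/4, "x3": 1/8, "x4": 1/8}
code = huffman_code(probs)
avg = sum(probs[s] * len(code[s]) for s in probs)
print(code)
print(avg)  # 1.75
```

For the four-symbol source, the resulting codeword lengths are 1, 2, 3 and 3 bits, so the average equals the entropy of 1.75 bits/symbol. This happens here because every probability is a power of 1/2; for other sources a Huffman code generally ends up somewhat above the entropy.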