In the 2's complement fractional representation, an $N$-bit binary word can represent $2^{N}$ equally spaced numbers from $\frac{-2^{N-1}}{2^{N-1}}=-1$ to $\frac{2^{N-1}-1}{2^{N-1}}=1-2^{-(N-1)}$.
For example, we interpret an 8-bit binary word $${b}_{7}{b}_{6}{b}_{5}{b}_{4}{b}_{3}{b}_{2}{b}_{1}{b}_{0}$$ as a fractional number $$x=\frac{-{b}_{7}2^{7}+{b}_{6}2^{6}+\cdots+{b}_{1}\times 2+{b}_{0}}{2^{7}}=-{b}_{7}+\sum_{i=0}^{6} 2^{i-7}{b}_{i}\in \left[-1 , 1-2^{-7}\right]$$
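The formula can be checked with a short C sketch that adds up the weight of each bit: the sign bit $b_7$ contributes $-1$ and each remaining bit $b_i$ contributes $2^{i-7}$. This is an illustration only; the function name `q7_to_double` is hypothetical, and the word is assumed to arrive as an unsigned 8-bit value.

```c
#include <stdint.h>

/* Hypothetical helper: interpret the 8-bit word b7..b0 as a Q-7 fraction,
 * x = -b7 + sum_{i=0}^{6} 2^(i-7) * b_i. */
double q7_to_double(uint8_t word)
{
    double x = (word & 0x80u) ? -1.0 : 0.0;      /* sign bit b7 has weight -1 */
    for (int i = 0; i < 7; ++i) {
        if (word & (1u << i)) {
            x += (double)(1u << i) / 128.0;      /* bit b_i has weight 2^(i-7) */
        }
    }
    return x;
}
```

For instance, `0x40` (01000000) decodes to 0.5 and `0x80` (10000000) to -1. On typical two's complement compilers, `(int8_t)word / 128.0` gives the same value; the loop simply spells out each bit's weight.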
This representation is also referred to as the Q-format. We can think of an implied binary point right after the MSB. If we have an $N$-bit binary word with the MSB as the sign bit, we have $N-1$ bits to represent the fraction, and we say the number is in Q-($N-1$) format. In the example above, $x$ is a Q-7 number. On the C6211, it is easiest to handle Q-15 numbers, each represented by a 16-bit binary word, because the product of two Q-15 numbers is a Q-30 number that can still be stored in a 32-bit register of the C6211. The programmer needs to keep track of the implied binary point when manipulating Q-format numbers.
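Keeping track of the binary point in a Q-15 multiplication can be sketched in portable C: the full 32-bit product is in Q-30, and shifting it right by 15 bits returns it to Q-15. This is a sketch under stated assumptions, not C6211-specific code; the names `q15_mul_full` and `q15_mul` are hypothetical, and the right shift of a negative value is assumed to be arithmetic, which C leaves implementation-defined.

```c
#include <stdint.h>

typedef int16_t q15_t;   /* Q-15: 1 sign bit, 15 fraction bits */
typedef int32_t q30_t;   /* product of two Q-15 values */

/* Hypothetical helper: the full product of two Q-15 numbers is a Q-30
 * number, which fits in a 32-bit signed integer. */
q30_t q15_mul_full(q15_t a, q15_t b)
{
    return (q30_t)a * (q30_t)b;
}

/* Hypothetical helper: drop 15 fraction bits to return to Q-15.
 * Assumes >> on a negative value is an arithmetic shift.
 * The only product that does not fit is (-1) * (-1) = +1, which is
 * clamped to the largest Q-15 value, 1 - 2^-15. */
q15_t q15_mul(q15_t a, q15_t b)
{
    q30_t p = q15_mul_full(a, b) >> 15;
    if (p > INT16_MAX) {
        p = INT16_MAX;
    }
    return (q15_t)p;
}
```

With 0.5 represented as 16384, `q15_mul_full(16384, 16384)` is $2^{28}$, which is 0.25 in Q-30, and `q15_mul(16384, 16384)` is 8192, which is 0.25 in Q-15.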
(Q format): What are the decimal fractional numbers corresponding to the Q-7 format binary numbers $01001101$, $11100100$, $01111001$, and $10001011$?
Intentionally left blank.
The convenience of the 2's complement format comes from its ability to represent negative numbers and to compute subtraction with the same algorithm as binary addition. The C62x processor has instructions to add, subtract and multiply numbers in the 2's complement format. Because the Q-15 format is the easiest to implement on C62x processors for most digital signal processing algorithms, the following focuses only on arithmetic operations on Q-15 numbers.
The addition of two binary numbers is computed in the same way as the sum of two decimal numbers. Using the relations $0+0=0$, $0+1=1+0=1$, and $1+1=10$ (that is, 0 with a carry of 1), we can easily compute the sum of two binary numbers. The C62x instruction ADD performs this binary addition on its operands.
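Because two Q-15 operands have their binary point in the same place, ordinary integer addition adds the fractions directly, and subtraction can reuse the same adder by adding the 2's complement of the second operand. The portable C sketch below illustrates this; the function names are hypothetical, and the results are kept in 32 bits so that a sum outside the Q-15 range remains visible instead of being silently wrapped.

```c
#include <stdint.h>

/* Hypothetical helper: add two Q-15 numbers. No rescaling is needed because
 * both operands have 15 fraction bits. The 32-bit result may exceed the
 * Q-15 range; the caller must check that before storing it in 16 bits. */
int32_t q15_add_wide(int16_t a, int16_t b)
{
    return (int32_t)a + (int32_t)b;
}

/* Hypothetical helper: subtract b from a by adding the 2's complement of b
 * (invert all bits, then add 1), i.e. the same adder used for addition.
 * Assumes a two's complement machine, where ~x + 1 == -x. */
int32_t q15_sub_wide(int16_t a, int16_t b)
{
    int32_t neg_b = ~(int32_t)b + 1;
    return (int32_t)a + neg_b;
}
```

For example, 0.25 is 8192 in Q-15, so `q15_add_wide(8192, 8192)` gives 16384 (0.5) and `q15_sub_wide(8192, 16384)` gives -8192 (-0.25).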
However, care must be taken when adding binary numbers. Because each Q-15 number lies in the range $\left[-1 , 1-2^{-15}\right]$, the sum of two Q-15 numbers may fall outside this range, in which case it cannot be represented in the Q-15 format. When this happens, we say an overflow has occurred. Unless carefully handled, an overflow makes the result incorrect: in 2's complement arithmetic the sum wraps around, so that, for example, $0.75+0.5$ stored as a Q-15 number becomes $-0.75$. Therefore, it is important to prevent overflows when implementing DSP algorithms. One way of avoiding overflow is to scale all the numbers down by a constant factor, effectively making all the numbers very small, so that any summation gives a result in the $\left[-1 , 1\right)$ range. Such scaling is often necessary, but because it reduces the number of effective digits and thereby increases quantization error, we usually need to find the minimum amount of scaling that prevents overflow.
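The wraparound and the effect of scaling can be reproduced with a small portable C sketch. The names are hypothetical, the scaling factor of 1/2 is just one illustrative choice, and the code assumes a two's complement target on which converting an out-of-range value to `int16_t` wraps and `>>` on a negative value is an arithmetic shift (both implementation-defined in older C standards).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: add two Q-15 numbers and keep only 16 bits, as
 * storing the sum in a 16-bit word would. The sum wraps modulo 2^16. */
int16_t q15_add_wrap(int16_t a, int16_t b)
{
    uint16_t bits = (uint16_t)((uint16_t)a + (uint16_t)b);
    return (int16_t)bits;   /* assumes two's complement reinterpretation */
}

/* Hypothetical helper: true if a + b lies outside [-1, 1 - 2^-15]. */
bool q15_add_overflows(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;
    return s > INT16_MAX || s < INT16_MIN;
}

/* Hypothetical helper: scale both operands by 1/2 (arithmetic right shift
 * by 1) before adding. The result a/2 + b/2 always fits in Q-15, at the
 * cost of one bit of precision in each operand. */
int16_t q15_add_scaled(int16_t a, int16_t b)
{
    return (int16_t)((a >> 1) + (b >> 1));
}
```

With 0.75 as 24576 and 0.5 as 16384, `q15_add_wrap` returns -24576 (that is, -0.75), while `q15_add_scaled` returns 20480, which represents the correct sum 1.25 scaled by 1/2.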
Another way of handling overflow (and underflow) is saturation. If the result is outside the range that can be represented in the given data size, the value is saturated, meaning that the representable value closest to the true result is taken. Instructions such as SADD and SSUB perform addition and subtraction, respectively, followed by saturation.
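The idea of saturation can be sketched in portable C for 16-bit Q-15 values: compute the result in a wider type, then clamp it to the nearest representable value. This illustrates the concept only and is not a model of the SADD and SSUB instructions, whose exact operand widths are defined in the C62x instruction set documentation; the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a wide intermediate result to the Q-15 range. */
static int16_t q15_saturate(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;   /* closest representable: 1 - 2^-15 */
    if (x < INT16_MIN) return INT16_MIN;   /* closest representable: -1 */
    return (int16_t)x;
}

/* Hypothetical saturating Q-15 addition. */
int16_t q15_sadd(int16_t a, int16_t b)
{
    return q15_saturate((int32_t)a + b);
}

/* Hypothetical saturating Q-15 subtraction. */
int16_t q15_ssub(int16_t a, int16_t b)
{
    return q15_saturate((int32_t)a - b);
}
```

Compared with the wrapped sum, `q15_sadd(24576, 16384)` (0.75 + 0.5) returns 32767, the largest Q-15 value $1-2^{-15}$, which is much closer to the true result 1.25 than the wrapped value -0.75.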