
So: -(-128) = -128, because +128 is outside the range of signed 8-bit numbers.
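The wrap-around can be checked with a small sketch. The helper names `to_signed8` and `negate8` are illustrative assumptions, not part of the original text; the code simply inverts all bits and adds 1 within 8 bits.

```python
def to_signed8(value):
    """Interpret the low 8 bits of value as a two's complement number."""
    value &= 0xFF
    return value - 256 if value & 0x80 else value


def negate8(x):
    """Two's complement negation in 8 bits: invert all bits, then add 1."""
    return to_signed8((~x & 0xFF) + 1)


print(negate8(5))     # -5
print(negate8(-128))  # -128, because +128 does not fit in 8 bits
```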

3.2 addition and subtraction:

Addition and subtraction are done using the following steps:

  • Normal binary addition
  • Monitor sign bit for overflow
  • Take the two's complement of the subtrahend and add it to the minuend, i.e. a - b = a + (-b)
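The steps above can be sketched for 8-bit operands as follows. The function names and the `carry_in` parameter are assumptions made for illustration. Subtraction forms the two's complement of b as "invert b, then add 1 through the carry input", which is one common way to arrange it and may differ from the hardware in the missing figure.

```python
def to_signed8(value):
    """Interpret the low 8 bits of value as a two's complement number."""
    value &= 0xFF
    return value - 256 if value & 0x80 else value


def add8(a, b, carry_in=0):
    """Return (result, overflow) for the 8-bit two's complement sum."""
    result = to_signed8(a + b + carry_in)
    # Overflow: both operands have the same sign but the result's sign differs.
    overflow = (a < 0) == (b < 0) and (result < 0) != (a < 0)
    return result, overflow


def sub8(a, b):
    """a - b = a + (-b), where -b is (invert b) + 1 via the carry input."""
    return add8(a, to_signed8(~b), carry_in=1)


print(add8(100, 27))  # (127, False)
print(add8(100, 28))  # (-128, True): the sign bit flipped, so overflow
print(sub8(5, 7))     # (-2, False)
```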

Hardware for addition and subtraction:

3.3 multiplying positive numbers:

Multiplication is done using the following steps:

  • Work out partial product for each digit
  • Take care with place value (column)
  • Add partial products
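A minimal sketch of these steps for unsigned operands is given below. The function name and the `bits` parameter are illustrative assumptions. Each 1 bit of the multiplier contributes a copy of the multiplicand shifted to that bit's column.

```python
def multiply_unsigned(multiplicand, multiplier, bits=4):
    """Shift-and-add multiplication of two unsigned `bits`-wide integers."""
    product = 0
    for position in range(bits):
        if (multiplier >> position) & 1:
            # Partial product: the multiplicand shifted to this bit's column.
            product += multiplicand << position
    return product


print(multiply_unsigned(0b1011, 0b1101))  # 143, i.e. 11 * 13
```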

Hardware implementation of unsigned binary multiplication:

Execution of example:

Flowchart for unsigned binary multiplication:

3.4 multiplying negative numbers

Solution 1:

  • Convert to positive if required
  • Multiply as above
  • If signs were different, negate answer
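Solution 1 might look like the sketch below. The function name is an assumption; the magnitudes are multiplied with the same shift-and-add idea used for positive numbers, and the sign is fixed at the end.

```python
def multiply_signed(a, b):
    """Multiply via magnitudes, then negate if the signs were different."""
    negative = (a < 0) != (b < 0)
    x, y = abs(a), abs(b)   # convert to positive if required
    magnitude = 0
    while y:                # multiply as for positive numbers
        if y & 1:
            magnitude += x
        x <<= 1
        y >>= 1
    return -magnitude if negative else magnitude


print(multiply_signed(-7, 3))   # -21
print(multiply_signed(-7, -3))  # 21
```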

Solution 2:

  • Booth’s algorithm:
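A hedged sketch of Booth's algorithm follows. The register names A, Q and Q-1 follow the usual textbook presentation and are an assumption here, since the original figure is not available. A is kept as an unbounded signed Python integer so that the most negative multiplicand does not overflow; real hardware would use a fixed-width register instead.

```python
def booth_multiply(multiplicand, multiplier, bits=4):
    """Multiply two signed `bits`-wide integers with Booth's algorithm."""
    mask = (1 << bits) - 1
    a = 0                  # accumulator A (signed Python int, see lead-in)
    q = multiplier & mask  # multiplier register Q, held as a bit pattern
    q_minus1 = 0           # the extra bit Q-1 to the right of Q
    for _ in range(bits):
        q0 = q & 1
        if (q0, q_minus1) == (1, 0):
            a -= multiplicand   # start of a run of 1s: subtract
        elif (q0, q_minus1) == (0, 1):
            a += multiplicand   # end of a run of 1s: add
        # Arithmetic shift right of A, Q, Q-1 treated as one register.
        q_minus1 = q0
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a >>= 1                 # Python's >> on ints preserves the sign
    return (a << bits) | q


print(booth_multiply(7, 3))   # 21
print(booth_multiply(3, -2))  # -6
```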

Example of Booth’s Algorithm:

3.5 division:

  • More complex than multiplication
  • Negative numbers make it considerably harder
  • Based on long division
  • (For more detail, refer to Computer Organization and Architecture by William Stallings)
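For unsigned operands only, binary long division can be sketched as below. The function name and `bits` parameter are assumptions, and this is not claimed to be the exact procedure in the referenced book. One dividend bit is brought down per step, and the divisor is subtracted whenever it fits.

```python
def divide_unsigned(dividend, divisor, bits=8):
    """Binary long division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    quotient = 0
    remainder = 0
    for position in range(bits - 1, -1, -1):
        # Bring down the next bit of the dividend.
        remainder = (remainder << 1) | ((dividend >> position) & 1)
        if remainder >= divisor:
            # The divisor fits: subtract it and record a 1 in the quotient.
            remainder -= divisor
            quotient |= 1 << position
    return quotient, remainder


print(divide_unsigned(0b10010011, 0b1011))  # (13, 4): 147 = 11 * 13 + 4
```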

4. floating-point representation

4.1 principles

We can represent a real number in the form

$\pm S \times B^{\pm E}$

This number can be stored in a binary word with three fields:

  • Sign: plus or minus
  • Significand: S
  • Exponent: E.

(A fixed value, called the bias, is subtracted from the biased exponent field to get the true exponent value E. Typically, the bias equals $2^{k-1} - 1$, where k is the number of bits in the binary exponent.)

  • The base B is implicit and need not be stored because it is the same for all numbers.
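The relationship between the stored exponent field and the true exponent can be sketched as follows. The function names are illustrative assumptions; the bias formula is the one given above.

```python
def exponent_bias(k):
    """Typical bias for a k-bit exponent field: 2^(k-1) - 1."""
    return 2 ** (k - 1) - 1


def true_exponent(biased_field, k):
    """Subtract the bias from the stored field to recover E."""
    return biased_field - exponent_bias(k)


print(exponent_bias(8))      # 127
print(exponent_bias(11))     # 1023
print(true_exponent(124, 8)) # -3
```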

4.2 ieee standard for binary floating-point representation

The most important floating-point representation is defined in IEEE Standard 754 [EEE8]. This standard was developed to facilitate the portability of programs from one processor to another and to encourage the development of sophisticated, numerically oriented programs. The standard has been widely adopted and is used on virtually all contemporary processors and arithmetic coprocessors.

The IEEE standard defines both a 32-bit (single-precision) and a 64-bit (double-precision) format, with 8-bit and 11-bit exponents, respectively. Binary floating-point numbers are stored in a form where the MSB is the sign bit, the "exponent" field is the biased exponent, and the "fraction" field holds the significand without its leading bit. The implied base (B) is 2.

Not all bit patterns in the IEEE formats are interpreted in the usual way; instead, some bit patterns are used to represent special values. Three special cases arise:

  1. if exponent is 0 and fraction is 0, the number is ±0 (depending on the sign bit),
  2. if exponent = $2^e - 1$ (all exponent bits set, where e here is the number of bits in the exponent field) and fraction is 0, the number is ±infinity (again depending on the sign bit), and
  3. if exponent = $2^e - 1$ and fraction is not 0, the value being represented is not a number (NaN).
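The three special cases can be sketched for the 32-bit format as below. The function name and the returned labels are assumptions made for illustration; any pattern outside the three cases is simply reported as "other".

```python
def classify_float32(bits):
    """Classify a 32-bit pattern according to the three special cases."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit exponent field
    fraction = bits & 0x7FFFFF       # 23-bit fraction field
    prefix = "-" if sign else "+"
    if exponent == 0 and fraction == 0:
        return prefix + "0"
    if exponent == 0xFF:             # all exponent bits set: 2^8 - 1
        return prefix + "infinity" if fraction == 0 else "NaN"
    return "other"


print(classify_float32(0x80000000))  # -0
print(classify_float32(0x7F800000))  # +infinity
print(classify_float32(0x7FC00000))  # NaN
```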

This can be summarized as:

Single-precision 32 bit

A single-precision binary floating-point number is stored in 32 bits.

The number has value v:

$v = s \times 2^{e} \times m$


s = +1 (positive numbers) when the sign bit is 0

s = −1 (negative numbers) when the sign bit is 1

e = Exp − 127 (in other words the exponent is stored with 127 added to it, also called "biased with 127")

m = 1.fraction in binary (that is, the significand is the binary number 1 followed by the radix point followed by the binary bits of the fraction). Therefore, 1 ≤ m < 2.

In the example shown above:


$e = 01111100_2 - 127 = 124 - 127 = -3$

m = 1.01 (in binary, which is 1.25 in decimal).

The represented number is: $+1.25 \times 2^{-3} = +0.15625$.
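The same decoding can be sketched in code. The bit pattern `0x3E200000` (sign 0, exponent field 01111100, fraction 0100...0) is assumed to be the one in the missing figure, since it matches the values worked out above. The function name is illustrative, and the sketch handles normalized numbers only.

```python
import struct


def decode_float32(bits):
    """Return (s, e, m, v) for a normalized single-precision bit pattern."""
    sign_bit = bits >> 31
    exp_field = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    s = -1 if sign_bit else 1
    e = exp_field - 127           # remove the bias of 127
    m = 1 + fraction / 2 ** 23    # 1.fraction
    return s, e, m, s * 2.0 ** e * m


s, e, m, v = decode_float32(0x3E200000)
print(s, e, m, v)  # 1 -3 1.25 0.15625

# Cross-check against the machine's own interpretation of the same bits.
machine_value = struct.unpack(">f", struct.pack(">I", 0x3E200000))[0]
print(machine_value == v)  # True
```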

5. floating-point arithmetic

The basic operations for floating-point numbers $X_1 = M_1 \times R^{E_1}$ and $X_2 = M_2 \times R^{E_2}$ are:

  • $X_1 \pm X_2 = (M_1 \times R^{E_1 - E_2} \pm M_2) \times R^{E_2}$ (assume $E_1 \le E_2$)
  • $X_1 \times X_2 = (M_1 \times M_2) \times R^{E_1 + E_2}$
  • $X_1 / X_2 = (M_1 / M_2) \times R^{E_1 - E_2}$

For addition and subtraction, it is necessary to ensure that both operands have the same exponent value. This may require shifting the radix point on one of the operands to achieve alignment. Multiplication and division are more straightforward.
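A minimal sketch of these operations on (M, E) pairs is shown below, assuming R = 2 and using exact fractions so that no rounding occurs. The function names and the pair representation are assumptions made for illustration, and results are deliberately left unnormalized.

```python
from fractions import Fraction

R = 2  # assumed base


def fp_add(x1, x2):
    """Align the operand with the smaller exponent, then add significands."""
    (m1, e1), (m2, e2) = x1, x2
    if e1 > e2:  # make sure E1 <= E2
        (m1, e1), (m2, e2) = (m2, e2), (m1, e1)
    aligned = m1 * Fraction(R) ** (e1 - e2)  # shift M1 to exponent E2
    return aligned + m2, e2


def fp_mul(x1, x2):
    """Multiply significands and add exponents."""
    return x1[0] * x2[0], x1[1] + x2[1]


def fp_div(x1, x2):
    """Divide significands and subtract exponents."""
    return x1[0] / x2[0], x1[1] - x2[1]


def value(x):
    """Numeric value M * R^E of a pair."""
    return x[0] * Fraction(R) ** x[1]


a = (Fraction(5, 4), -3)  # 1.25 * 2^-3 = 0.15625
b = (Fraction(3, 2), 1)   # 1.5  * 2^1  = 3
print(float(value(fp_add(a, b))))  # 3.15625
print(float(value(fp_mul(a, b))))  # 0.46875
```

Subtraction follows by negating the significand of the second operand before calling `fp_add`.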

A floating-point operation may produce one of these conditions:

  • Exponent overflow: A positive exponent exceeds the maximum possible exponent value. In some systems, this may be designated as +∞ or −∞.
  • Exponent underflow: A negative exponent is less than the minimum possible exponent value (e.g., -200 is less than -127). This means that the number is too small to be represented, and it may be reported as 0.
  • Significand underflow: In the process of aligning significands, digits may flow off the right end of the significand. Some form of rounding is required.
  • Significand overflow: The addition of two significands of the same sign may result in a carry out of the most significant bit. This can be fixed by realignment.
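These conditions can be observed with ordinary Python floats, assuming they are IEEE 754 double-precision values (true on common platforms, but an assumption here). The variable names are illustrative.

```python
import math

# Exponent overflow: the result is too large and becomes infinity.
big = 1e308 * 10

# Exponent underflow: the result is too small and is reported as 0.
tiny = 1e-300 * 1e-300

# Significand underflow: aligning 1.0 to the exponent of 1e17 shifts its
# digits off the right end, so the sum rounds back to 1e17.
lost = 1e17 + 1.0

# Significand overflow: 1.5 + 1.5 carries out of the top bit, and
# realignment raises the exponent by one (frexp returns m in [0.5, 1)).
exp_before = math.frexp(1.5)[1]
exp_after = math.frexp(1.5 + 1.5)[1]

print(math.isinf(big), tiny == 0.0, lost == 1e17)  # True True True
print(exp_before, exp_after)                       # 1 2
```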

Source:  OpenStax, Computer architecture. OpenStax CNX. Jul 29, 2009 Download for free at http://cnx.org/content/col10761/1.1