Assembly language


One bright spot, however, is the use of the branch delay slot. For the first iteration, the load was done before the loop started. For subsequent iterations, the first load of the loop body was done in the branch delay slot at the bottom of the loop.
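The load placement described above can be sketched in C. This is a hypothetical illustration, not compiler output. C has no branch delay slots, the function name `addem_rotated` is invented, indexing is 0-based, and the load that starts each iteration is assumed here to be the load of B(I). The sketch shows the same shape the SPARC compiler produced. The first load happens before the loop is entered. Each later load happens at the bottom of the loop, and only when another trip is coming. The -O2 listing in this section does the same thing with the annulled `ble,a` branch.

```c
#include <stddef.h>

/* Hypothetical "rotated" loop computing a[i] = b[i] + c[i] for i in [0, n).
 * The load that begins each iteration is hoisted above the loop for the
 * first trip and moved to the bottom of the loop for later trips, which is
 * where a SPARC compiler can place it in the branch delay slot. */
void addem_rotated(float *a, const float *b, const float *c, size_t n)
{
    if (n == 0)              /* zero-trip check before entering the loop */
        return;

    size_t i = 0;
    float bi = b[0];         /* first-iteration load, done before the loop */
    for (;;) {
        a[i] = bi + c[i];
        i++;
        if (i >= n)
            break;
        bi = b[i];           /* next iteration's load, done at the bottom;
                                like an annulled delay slot, it only runs
                                when the loop continues */
    }
}
```

Because the load at the bottom runs only when the loop continues, it never reads past the end of `b`.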

Comparing this SPARC code to the moderate-optimization code on the MC68020, you can begin to see why RISC was not an overnight sensation. It turned out that an unsophisticated compiler could generate much tighter code for a CISC processor than for a RISC processor. RISC processors end up executing extra instructions here and there to compensate for the lack of slick features in their instruction sets. If a processor has a faster clock rate but has to execute more instructions, it does not necessarily outperform a slower but more efficient processor.
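The trade-off in the last sentence is the classic performance equation: execution time = instruction count × cycles per instruction (CPI) ÷ clock rate. The sketch below plugs in made-up per-iteration figures to show how a faster clock can still lose. These figures are assumptions chosen for easy arithmetic and are not measurements of the SPARC or the MC68020. The function name `exec_time_ns` is likewise hypothetical.

```c
/* Execution time in nanoseconds:
 *   time = instructions * cycles_per_instruction / clock_rate
 * With the clock rate given in MHz, cycles / MHz yields microseconds,
 * so the result is multiplied by 1000 to get nanoseconds. */
double exec_time_ns(double instructions, double cpi, double clock_mhz)
{
    return instructions * cpi * 1000.0 / clock_mhz;
}
```

Here is an example with hypothetical numbers. Suppose a 40 MHz processor needs 20 single-cycle instructions per iteration. Then `exec_time_ns(20.0, 1.0, 40.0)` gives 500 ns. Suppose a 25 MHz processor needs only 6 instructions per iteration at 2 cycles each. Then `exec_time_ns(6.0, 2.0, 25.0)` gives 480 ns. The processor with the slower clock finishes first because it spends fewer total cycles (12 versus 20).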

But as we shall soon see, this CISC advantage is about to evaporate in this particular example.

Higher optimization

We now increase the optimization level to -O2, and the compiler generates much better code. Remember that this is the same compiler being used for all three examples.

At this optimization level, the compiler analyzed the code well enough to know that it did not even need to rotate the register window (there is no save instruction). Clearly, the compiler looked at the register usage of the entire routine:



! Note, didn't even rotate the register window
! We just use the %o registers from the caller
! %o0 = Address of first element of A (from calling convention)
! %o1 = Address of first element of B (from calling convention)
! %o2 = Address of first element of C (from calling convention)
! %o3 = Address of N (from calling convention)
addem_:
        ld      [%o3],%g2       ! Load N
        cmp     %g2,1           ! Check to see if it is < 1
        bl      .L77000006      ! Check for zero trip loop
        or      %g0,1,%g1       ! Delay slot - Set I to 1
.L77000003:
        ld      [%o1],%f0       ! Load B(I) First time Only
.L900000109:
        ld      [%o2],%f1       ! Load C(I)
        fadds   %f0,%f1,%f0     ! Add
        add     %g1,1,%g1       ! Increment I
        add     %o1,4,%o1       ! Increment Address of B
        add     %o2,4,%o2       ! Increment Address of C
        cmp     %g1,%g2         ! Check Loop Termination
        st      %f0,[%o0]       ! Store A(I)
        add     %o0,4,%o0       ! Increment Address of A
        ble,a   .L900000109     ! Branch w/ annul
        ld      [%o1],%f0       ! Load the B(I)
.L77000006:
        retl                    ! Leaf Return (No window)
        nop                     ! Branch Delay Slot

This is tight code. The registers %o0, %o1, and %o2 contain the addresses of the first elements of A, B, and C, respectively, so they already point to the right values for the first iteration of the loop. The value of I is never stored in memory; it is kept in the global register %g1. Instead of multiplying I by 4 to compute each element's offset, the code simply advances the three addresses by 4 bytes each iteration.
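To make the address arithmetic concrete, here is a hypothetical C rendering of what the -O2 listing does, set beside a direct indexed version. The function names are invented for illustration. N is passed by address to mimic the calling convention noted in the listing's comments. The 4-byte stride assumes a 32-bit `float`. In C, the pointer increment `a++` advances by `sizeof(float)` bytes automatically, which plays the role of the explicit `add %o0,4,%o0`.

```c
#include <stddef.h>

/* Direct form: each access conceptually computes base + i * sizeof(float). */
void addem_indexed(float *a, const float *b, const float *c, const int *n)
{
    for (int i = 0; i < *n; i++)
        a[i] = b[i] + c[i];
}

/* Strength-reduced form mirroring the -O2 SPARC listing: N is loaded once,
 * the counter I lives in a local (a register, %g1, in the listing) and is
 * never stored to memory, and the three pointers advance by one element
 * (4 bytes for a 32-bit float) on each trip instead of using I * 4. */
void addem_strength_reduced(float *a, const float *b, const float *c, const int *n)
{
    int count = *n;                      /* ld [%o3],%g2 : load N once */
    if (count < 1)                       /* cmp / bl : zero-trip check */
        return;
    for (int i = 1; i <= count; i++) {   /* I counts from 1, as in Fortran */
        *a = *b + *c;                    /* ld, ld, fadds, st */
        a++;                             /* add %o0,4,%o0 */
        b++;                             /* add %o1,4,%o1 */
        c++;                             /* add %o2,4,%o2 */
    }
}
```

Both functions produce the same results. The second one simply trades a multiply per access for an add per pointer. This transformation is usually called strength reduction.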

Source: OpenStax, High performance computing. OpenStax CNX. Aug 25, 2010. Download for free at http://cnx.org/content/col11136/1.5
