<< Chapter < Page Chapter >> Page >

Assuming a large value for N, the previous loop was an ideal candidate for loop unrolling. The iterations could be executed in any order, and the loop innards were small. But as you might suspect, this isn’t always the case; some kinds of loops can’t be unrolled so easily. Additionally, the way a loop is used when the program runs can disqualify it for loop unrolling, even if it looks promising.

In this section we are going to discuss a few categories of loops that are generally not prime candidates for unrolling, and give you some ideas of what you can do about them. We talked about several of these in the previous chapter as well, but they are also relevant here.

Loops with low trip counts

To be effective, loop unrolling requires a fairly large number of iterations in the original loop. To understand why, picture what happens if the total iteration count is low, perhaps less than 10, or even less than 4. With a trip count this low, the preconditioning loop is doing a proportionately large amount of the work. It’s not supposed to be that way. The preconditioning loop is supposed to catch the few leftover iterations missed by the unrolled, main loop. However, when the trip count is low, you make one or two passes through the unrolled loop, plus one or two passes through the preconditioning loop. In other words, you have more clutter; the loop shouldn’t have been unrolled in the first place.

Probably the only time it makes sense to unroll a loop with a low trip count is when the number of iterations is constant and known at compile time. For instance, suppose you had the following loop:


Because NITER is hardwired to 3, you can safely unroll to a depth of 3 without worrying about a preconditioning loop. In fact, you can throw out the loop structure altogether and leave just the unrolled loop innards:

PARAMETER (NITER = 3) A(1) = B(1) * CA(2) = B(2) * C A(3) = A(3) * C

Of course, if a loop’s trip count is low, it probably won’t contribute significantly to the overall runtime, unless you find such a loop at the center of a larger loop. Then you either want to unroll it completely or leave it alone.

Fat loops

Loop unrolling helps performance because it fattens up a loop with more calculations per iteration. By the same token, if a particular loop is already fat, unrolling isn’t going to help. The loop overhead is already spread over a fair number of instructions. In fact, unrolling a fat loop may even slow your program down because it increases the size of the text segment, placing an added burden on the memory system (we’ll explain this in greater detail shortly). A good rule of thumb is to look elsewhere for performance when the loop innards exceed three or four statements.

Loops containing procedure calls

As with fat loops, loops containing subroutine or function calls generally aren’t good candidates for unrolling. There are several reasons. First, they often contain a fair number of instructions already. And if the subroutine being called is fat, it makes the loop that calls it fat as well. The size of the loop may not be apparent when you look at the loop; the function call can conceal many more instructions.

Second, when the calling routine and the subroutine are compiled separately, it’s impossible for the compiler to intermix instructions. A loop that is unrolled into a series of function calls behaves much like the original loop, before unrolling.

Last, function call overhead is expensive. Registers have to be saved; argument lists have to be prepared. The time spent calling and returning from a subroutine can be much greater than that of the loop overhead. Unrolling to amortize the cost of the loop structure over several calls doesn’t buy you enough to be worth the effort.

The general rule when dealing with procedures is to first try to eliminate them in the “remove clutter” phase, and when this has been done, check to see if unrolling gives an additional performance improvement.

Loops with branches in them

In [link] we showed you how to eliminate certain types of branches, but of course, we couldn’t get rid of them all. In cases of iteration-independent branches, there might be some benefit to loop unrolling. The IF test becomes part of the operations that must be counted to determine the value of loop unrolling. Below is a doubly nested loop. The inner loop tests the value of B(J,I) :

DO I=1,N DO J=1,NIF (B(J,I) .GT. 1.0) A(J,I) = A(J,I) + B(J,I) * C ENDDOENDDO

Each iteration is independent of every other, so unrolling it won’t be a problem. We’ll just leave the outer loop undisturbed:

II = IMOD (N,4) DO I=1,NDO J=1,II IF (B(J,I) .GT. 1.0)+ A(J,I) = A(J,I) + B(J,I) * C ENDDODO J=II+1,N,4 IF (B(J,I) .GT. 1.0)+ A(J,I) = A(J,I) + B(J,I) * C IF (B(J+1,I) .GT. 1.0)+ A(J+1,I) = A(J+1,I) + B(J+1,I) * C IF (B(J+2,I) .GT. 1.0)+ A(J+2,I) = A(J+2,I) + B(J+2,I) * C IF (B(J+3,I) .GT. 1.0)+ A(J+3,I) = A(J+3,I) + B(J+3,I) * C ENDDOENDDO

This approach works particularly well if the processor you are using supports conditional execution. As described earlier, conditional execution can replace a branch and an operation with a single conditionally executed assignment. On a superscalar processor with conditional execution, this unrolled loop executes quite nicely.

Questions & Answers

can someone help me with some logarithmic and exponential equations.
Jeffrey Reply
sure. what is your question?
okay, so you have 6 raised to the power of 2. what is that part of your answer
I don't understand what the A with approx sign and the boxed x mean
it think it's written 20/(X-6)^2 so it's 20 divided by X-6 squared
I'm not sure why it wrote it the other way
I got X =-6
ok. so take the square root of both sides, now you have plus or minus the square root of 20= x-6
oops. ignore that.
so you not have an equal sign anywhere in the original equation?
Commplementary angles
Idrissa Reply
im all ears I need to learn
right! what he said ⤴⤴⤴
what is a good calculator for all algebra; would a Casio fx 260 work with all algebra equations? please name the cheapest, thanks.
Kevin Reply
a perfect square v²+2v+_
Dearan Reply
kkk nice
Abdirahman Reply
algebra 2 Inequalities:If equation 2 = 0 it is an open set?
Kim Reply
or infinite solutions?
The answer is neither. The function, 2 = 0 cannot exist. Hence, the function is undefined.
Embra Reply
if |A| not equal to 0 and order of A is n prove that adj (adj A = |A|
Nancy Reply
rolling four fair dice and getting an even number an all four dice
ramon Reply
Kristine 2*2*2=8
Bridget Reply
Differences Between Laspeyres and Paasche Indices
Emedobi Reply
No. 7x -4y is simplified from 4x + (3y + 3x) -7y
Mary Reply
is it 3×y ?
Joan Reply
J, combine like terms 7x-4y
Bridget Reply
im not good at math so would this help me
Rachael Reply
I'm not good at math so would you help me
what is the problem that i will help you to self with?
how do you translate this in Algebraic Expressions
linda Reply
Need to simplify the expresin. 3/7 (x+y)-1/7 (x-1)=
Crystal Reply
. After 3 months on a diet, Lisa had lost 12% of her original weight. She lost 21 pounds. What was Lisa's original weight?
Chris Reply
what's the easiest and fastest way to the synthesize AgNP?
Damian Reply
types of nano material
abeetha Reply
I start with an easy one. carbon nanotubes woven into a long filament like a string
many many of nanotubes
what is the k.e before it land
what is the function of carbon nanotubes?
what is nanomaterials​ and their applications of sensors.
Ramkumar Reply
what is nano technology
Sravani Reply
what is system testing?
preparation of nanomaterial
Victor Reply
Yes, Nanotechnology has a very fast field of applications and their is always something new to do with it...
Himanshu Reply
good afternoon madam
what is system testing
what is the application of nanotechnology?
In this morden time nanotechnology used in many field . 1-Electronics-manufacturad IC ,RAM,MRAM,solar panel etc 2-Helth and Medical-Nanomedicine,Drug Dilivery for cancer treatment etc 3- Atomobile -MEMS, Coating on car etc. and may other field for details you can check at Google
anybody can imagine what will be happen after 100 years from now in nano tech world
after 100 year this will be not nanotechnology maybe this technology name will be change . maybe aftet 100 year . we work on electron lable practically about its properties and behaviour by the different instruments
name doesn't matter , whatever it will be change... I'm taking about effect on circumstances of the microscopic world
how hard could it be to apply nanotechnology against viral infections such HIV or Ebola?
silver nanoparticles could handle the job?
not now but maybe in future only AgNP maybe any other nanomaterials
can nanotechnology change the direction of the face of the world
Prasenjit Reply
At high concentrations (>0.01 M), the relation between absorptivity coefficient and absorbance is no longer linear. This is due to the electrostatic interactions between the quantum dots in close proximity. If the concentration of the solution is high, another effect that is seen is the scattering of light from the large number of quantum dots. This assumption only works at low concentrations of the analyte. Presence of stray light.
Ali Reply
the Beer law works very well for dilute solutions but fails for very high concentrations. why?
bamidele Reply
Got questions? Join the online conversation and get instant answers!
QuizOver.com Reply

Get the best Algebra and trigonometry course in your pocket!

Source:  OpenStax, High performance computing. OpenStax CNX. Aug 25, 2010 Download for free at http://cnx.org/content/col11136/1.5
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'High performance computing' conversation and receive update notifications?