
Even given the finite return on effort suggested by Amdahl’s Law, tuning a program with a sharp profile can be rewarding. Programs with flat profiles are much more difficult to tune. These are often system codes, nonnumeric applications, and varieties of numerical codes without matrix solutions. It takes a global tuning approach to reduce, to any justifiable degree, the runtime of a program with a flat profile. For instance, you can sometimes optimize instruction cache usage, which is complicated because of the program’s equal distribution of activity among a large number of routines. It can also help to reduce subroutine call overhead by folding callees into callers. Occasionally, you can find a memory reference problem that is endemic to the whole program — and one that can be fixed all at once.
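As a minimal sketch of what folding a callee into its caller can look like, consider the two C versions below. The names scale, sum_scaled_calls, and sum_scaled_folded are hypothetical and chosen only for illustration. Many optimizing compilers inline a routine this small on their own, so treat this as a picture of the transformation, not as a guaranteed speedup.

```c
#include <stddef.h>

/* Hypothetical helper: so small that call overhead can rival its work. */
static double scale(double x, double factor)
{
    return x * factor;
}

/* Version that makes one subroutine call per array element. */
double sum_scaled_calls(const double *a, size_t n, double factor)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += scale(a[i], factor);
    return total;
}

/* Same computation with the body of scale() folded into the caller. */
double sum_scaled_folded(const double *a, size_t n, double factor)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += a[i] * factor;
    return total;
}
```

Both functions return the same value; whether the folded version actually runs faster depends on the compiler and optimization level, which is something a profile can confirm.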

When you look at a profile, you might find an unusually large percentage of time spent in library routines such as log, exp, or sin. Often these functions are done in software routines rather than inline. You may be able to rewrite your code to eliminate some of these operations. Another important pattern to look for is when a routine takes far longer than you expect. Unexpected execution time may indicate you are accessing memory in a pattern that is bad for performance or that some aspect of the code cannot be optimized properly.

In any case, to get a profile, you need a profiler. One or two subroutine profilers come standard with the software development environments on all UNIX machines. We discuss two of them: prof and gprof. In addition, we mention a few line-by-line profilers. Subroutine profilers can give you a general overall view of where time is being spent. You probably should start with prof, if you have it (most machines do). Otherwise, use gprof. After that, you can move to a line-by-line profiler if you need to know which statements take the most time.

Prof

prof is the most common of the UNIX profiling tools. In a sense, it is an extension of the compiler, linker, and object libraries, plus a few extra utilities, so it is hard to look at any one thing and say “this profiles your code.” prof works by periodically sampling the program counter as your application runs. To enable profiling, you must recompile and relink using the -p flag. For example, if your program has two modules, stuff.c and junk.c, you need to compile and link as follows:


% cc stuff.c -p -O -c
% cc junk.c -p -O -c
% cc stuff.o junk.o -p -o stuff

This creates a stuff binary that is ready for profiling. You don’t need to do anything special to run it; just invoke it as usual by entering stuff. Because runtime statistics are being gathered, the program takes a little longer than usual to execute, so recompile and relink everything without the -p flag when you have finished profiling. When the run completes, there is a new file called mon.out in the directory where you ran the program. This file contains the history of stuff in binary form, so you can’t look at it directly. Use the prof utility to read mon.out and create a profile of stuff. By default, the information is written to your screen on standard output, though you can easily redirect it to a file.

Source:  OpenStax, High performance computing. OpenStax CNX. Aug 25, 2010 Download for free at http://cnx.org/content/col11136/1.5
