
The built-in csh time function

Figure: output of the command "% time foo", with each field labeled.

  14.9u 1.4s 0:19 83% 4+1060k 27+86io 47pf+0w

  14.9u   seconds of user time devoted to the process
  1.4s    seconds of system time devoted to the process
  0:19    elapsed time
  83%     percent utilization
  4       average amount of shared memory in KB
  1060k   average amount of unshared data space in KB
  27      number of block input operations
  86io    number of block output operations
  47pf    page faults
  0w      number of swaps

The second average memory utilization measurement, unshared-memory space, describes the average real storage dedicated to your program’s data structures as it ran. This storage includes saved local variables and COMMON for FORTRAN, and static and external variables for C. We stress the word “real” here and above because these numbers describe physical memory usage, averaged over time. It may be that you have allocated arrays with 1 trillion elements (virtual space), but if your program only crawls into a corner of that space, your runtime memory requirements will be pretty low.

What the unshared-memory space measurement doesn’t tell you, unfortunately, is your program’s demand for memory at its greediest. An application that requires 100 MB for one-tenth of the time and 1 KB for the rest of the time appears to need only 10 MB on average, which is not a revealing picture of the program’s memory requirements.

Blocked I/O operations

The two figures for blocked I/O operations primarily describe disk usage, though tape devices and some other peripherals may also be used with blocked I/O. Character I/O operations, such as terminal input and output, do not appear here. A large number of blocked I/O operations could explain a lower-than-expected CPU utilization.

Page faults and swaps

An unusually high number of page faults or any swaps probably indicates a system choked for memory, which would also explain a longer-than-expected elapsed time. It may be that other programs are competing for the same space. And don’t forget that even under optimal conditions, every program suffers some number of page faults, as explained in [link]. Techniques for minimizing page faults are described in [link].

Timing a portion of the program

For some benchmarking or tuning efforts, measurements taken on the “outside” of the program tell you everything you need to know. But if you are trying to isolate performance figures for individual loops or portions of the code, you may want to include timing routines on the inside too. The basic technique is simple enough:

  1. Record the time before you start doing X.
  2. Do X.
  3. Record the time at completion of X.
  4. Subtract the start time from the completion time.

If, for instance, X’s primary job is to calculate particle positions, divide the number of positions calculated by the total time to obtain a figure for particle positions per second. You have to be careful, though: with too many calls to the timing routines, the observer becomes part of the experiment. The timing routines take time too, and their very presence can increase instruction cache misses or paging. Furthermore, you want X to take a significant amount of time so that the measurements are meaningful. Paying attention to the time between timer calls is really important because the clock used by the timing functions has a limited resolution. An event that occurs within a fraction of a second is hard to measure with any accuracy.

Getting time information

In this section, we discuss methods for getting various timer values during the execution of your program.





Source:  OpenStax, High performance computing. OpenStax CNX. Aug 25, 2010 Download for free at http://cnx.org/content/col11136/1.5