In general, the HPF compiler is not magic: it simply does a very good job with the communication details when the programmer can design a good data decomposition. At the same time, HPF code remains portable to single-CPU and uniform shared-memory systems, where it compiles as ordinary FORTRAN 90.

HPF data layout directives

Perhaps the most important contributions of HPF are its data layout directives. Using these directives, the programmer can control how data is laid out based on the programmer's knowledge of the data interactions. An example directive is as follows:


      REAL*4 ROD(10)
!HPF$ DISTRIBUTE ROD(BLOCK)

The !HPF$ prefix would be a comment to a non-HPF compiler and can safely be ignored by a straight FORTRAN 90 compiler. The DISTRIBUTE directive indicates that the ROD array is to be distributed across multiple processors. If this directive is not used, the ROD array is allocated on one processor and communicated to the other processors as necessary. There are several distributions that can be done in each dimension:


      REAL*4 BOB(100,100,100),RICH(100,100,100)
!HPF$ DISTRIBUTE BOB(BLOCK,CYCLIC,*)
!HPF$ DISTRIBUTE RICH(CYCLIC(10))

These distributions operate as follows:

  • BLOCK The array is distributed across the processors using contiguous blocks of the index value. The blocks are made as large as possible.
  • CYCLIC The array is distributed across the processors, mapping each successive element to the "next" processor, and when the last processor is reached, allocation starts again on the first processor.
  • CYCLIC(n) The array is distributed the same as CYCLIC except that n successive elements are placed on each processor before moving on to the next processor.
  • * All the elements in that dimension are placed on the same processor. This is most useful for multidimensional arrays.
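The mapping rules in this list can be sketched as simple index arithmetic. The program below is a minimal illustration in plain FORTRAN 90, not what an HPF compiler actually generates. The names LAYOUT_DEMO, OWNER_BLOCK, and OWNER_CYCLIC, and the sizes (10 elements, 3 processors, chunk size 2), are assumptions chosen for the example. Processors are numbered from 1.

```fortran
! Sketch only: plain FORTRAN 90, no HPF compiler required.
! The names LAYOUT_DEMO, OWNER_BLOCK, OWNER_CYCLIC and the sizes
! N = 10, NP = 3, CHUNK = 2 are illustrative assumptions.
MODULE LAYOUT_DEMO
  IMPLICIT NONE
CONTAINS

  ! BLOCK: contiguous blocks of CEILING(N/NP) elements each.
  INTEGER FUNCTION OWNER_BLOCK(I, N, NP)
    INTEGER, INTENT(IN) :: I, N, NP
    INTEGER :: BSIZE
    BSIZE = (N + NP - 1) / NP         ! integer ceiling of N/NP
    OWNER_BLOCK = (I - 1) / BSIZE + 1
  END FUNCTION OWNER_BLOCK

  ! CYCLIC(K): deal out K elements at a time, round-robin.
  ! K = 1 is plain CYCLIC.
  INTEGER FUNCTION OWNER_CYCLIC(I, K, NP)
    INTEGER, INTENT(IN) :: I, K, NP
    OWNER_CYCLIC = MOD((I - 1) / K, NP) + 1
  END FUNCTION OWNER_CYCLIC

END MODULE LAYOUT_DEMO

PROGRAM SHOW_LAYOUTS
  USE LAYOUT_DEMO
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 10, NP = 3, CHUNK = 2
  INTEGER :: I

  ! Print the owning processor of each element, one layout per line.
  PRINT '(A,10I3)', 'BLOCK     ', (OWNER_BLOCK(I, N, NP), I = 1, N)
  PRINT '(A,10I3)', 'CYCLIC    ', (OWNER_CYCLIC(I, 1, NP), I = 1, N)
  PRINT '(A,10I3)', 'CYCLIC(2) ', (OWNER_CYCLIC(I, CHUNK, NP), I = 1, N)
END PROGRAM SHOW_LAYOUTS
```

With these sizes, the owners of elements 1 through 10 come out as 1 1 1 1 2 2 2 2 3 3 for BLOCK, 1 2 3 1 2 3 1 2 3 1 for CYCLIC, and 1 1 2 2 3 3 1 1 2 2 for CYCLIC(2).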

[Figure: Distributing array elements to processors. Three grids of numbered boxes, each with three lines of code above it.]

The figure shows how the elements of a simple array would be mapped onto three processors with different directives.

With BLOCK, the compiler must allocate four elements to Processors 1 and 2: if it allocated only three elements to each processor, there would be no Processor 4 available for the leftover element. With CYCLIC, the elements are allocated on successive processors, wrapping around to Processor 1 after the last processor. Using a chunk size with CYCLIC is a compromise between pure BLOCK and pure CYCLIC.
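The four-element blocks follow from a ceiling division. As a worked example, assume the array has N = 10 elements (an assumption, though it is consistent with the leftover-element argument and with the earlier ROD(10) declaration) and P = 3 processors:

```latex
b = \left\lceil \frac{N}{P} \right\rceil
  = \left\lceil \frac{10}{3} \right\rceil = 4,
\qquad
\underbrace{4}_{\text{Processor 1}}
+ \underbrace{4}_{\text{Processor 2}}
+ \underbrace{2}_{\text{Processor 3}} = 10 .
```

Rounding down instead would give 3 + 3 + 3 = 9, leaving one element with no processor to hold it.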

To explore the use of the *, we can look at a simple two-dimensional array mapped onto four processors. In the accompanying figure, each cell of the array layout indicates which processor holds the data for that cell. There, the directive decomposes in both dimensions simultaneously, which results in roughly square patches in the array. However, this may not be the best approach. In the following example, we use the * to indicate that we want all the elements of a particular column to be allocated on the same processor. The columns are distributed equally across the processors, and all the rows in each column follow wherever that column has been placed. This allows unit stride for the on-processor portions of the computation and is beneficial in some applications. The * syntax is also called on-processor distribution.
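A column-wise layout of this kind might look like the following sketch. The array name GRID, its 8 by 8 size, the COLSUM array, and the column-sum loop are all assumptions made for illustration. The example also assumes that distributing COLSUM with BLOCK places each sum on the same processor as its column, which depends on how the compiler arranges the processors. Because the directives are comments, the program also compiles and runs unchanged under a plain FORTRAN 90 compiler.

```fortran
! Sketch only: the first dimension is kept on-processor (*) and the
! columns are distributed in blocks. GRID, COLSUM, and N = 8 are
! illustrative assumptions, not taken from the text.
PROGRAM COLUMN_LAYOUT
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 8
  REAL :: GRID(N, N), COLSUM(N)
!HPF$ DISTRIBUTE GRID(*,BLOCK)
!HPF$ DISTRIBUTE COLSUM(BLOCK)
  INTEGER :: I, J

  GRID = 1.0

  ! Each column GRID(:,J) is held entirely by one processor, and
  ! FORTRAN stores a column contiguously, so the inner loop walks
  ! through on-processor data at unit stride.
  DO J = 1, N
     COLSUM(J) = 0.0
     DO I = 1, N
        COLSUM(J) = COLSUM(J) + GRID(I, J)
     END DO
  END DO

  PRINT *, COLSUM
END PROGRAM COLUMN_LAYOUT
```

Since every element of GRID is set to 1.0, each of the eight column sums is 8.0. Swapping the directive to GRID(BLOCK,*) would instead keep whole rows on one processor, so the same inner loop would touch data spread across several processors.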

Source:  OpenStax, High performance computing. OpenStax CNX. Aug 25, 2010 Download for free at http://cnx.org/content/col11136/1.5