
Another example of the effect of loop reordering is a style of plan that we sometimes call vector recursion (unrelated to “vector-radix” FFTs [link]). The basic idea is that, if one has a loop (vector rank 1) of transforms, where the vector stride is smaller than the transform size, it is advantageous to push the loop towards the leaves of the transform decomposition, while otherwise maintaining recursive depth-first ordering, rather than looping “outside” the transform; i.e., to apply the usual FFT to “vectors” rather than numbers. Limited forms of this idea have appeared for computing multiple FFTs on vector processors (where the loop in question maps directly to a hardware vector) [link]. For example, Cooley-Tukey produces a unit input-stride vector loop at the top-level DIT decomposition, but with a large output stride; this difference in strides makes it non-obvious whether vector recursion is advantageous for the sub-problem, but for large transforms we often observe the planner to choose this possibility.
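The idea of pushing the loop into the recursion can be illustrated with a minimal sketch (assuming a radix-2 DIT decomposition; this is a hypothetical illustration, not FFTW's actual code). Instead of looping over the vector of columns and transforming each one separately, the whole 2-D array is carried down the recursion, so the loop is executed only at the size-1 leaves and in the (vectorized) twiddle step:

```python
import numpy as np

def fft_vector_recursive(x):
    """Radix-2 DIT FFT applied along axis 0 of a 2-D array x of shape
    (n, v): the loop over the v columns (vector rank 1) is pushed
    toward the leaves instead of wrapping the whole transform.
    Requires n to be a power of two."""
    n = x.shape[0]
    if n == 1:
        return x.copy()                    # leaf: v trivial size-1 DFTs
    even = fft_vector_recursive(x[0::2])   # recurse on whole vectors
    odd = fft_vector_recursive(x[1::2])
    k = np.arange(n // 2).reshape(-1, 1)   # twiddles, broadcast over v
    tw = np.exp(-2j * np.pi * k / n) * odd
    return np.concatenate([even + tw, even - tw])
```

The result agrees with transforming each column independently, e.g. with `np.fft.fft(x, axis=0)`, but the memory access pattern within the recursion is different, which is the point of the reordering.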

In-place 1d transforms (with no separate bit-reversal pass) can be obtained as follows by a combination of DIT and DIF plans ("Cooley-Tukey plans") with transposes ("Rank-0 plans"). First, the transform is decomposed via a radix-p DIT plan into a vector of p transforms of size qm; these are decomposed in turn by a radix-q DIF plan into a vector (rank 2) of p × q transforms of size m. These transforms of size m have input and output at different places/strides in the original array, and so cannot be solved independently. Instead, an indirect plan ("Indirect plans") is used to express the sub-problem as pq in-place transforms of size m, followed or preceded by an m × p × q rank-0 transform. The latter sub-problem is easily seen to be m in-place p × q transposes (ideally square, i.e., p = q). Related strategies for in-place transforms based on small transposes were described in [link], [link], [link], [link]; alternating DIT/DIF, without concern for in-place operation, was also considered in [link], [link].
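The rank-0 sub-problem can be sketched concretely for the square case. The sketch below assumes one possible memory layout (the length-mpq buffer viewed as m contiguous p × q matrices; the actual strides in FFTW depend on the decomposition), and shows why p = q is convenient: each element can be swapped with its transpose partner in place, with no scratch buffer.

```python
import numpy as np

def inplace_square_transposes(a, m, p, q):
    """Perform m in-place p-by-q transposes on a length m*p*q buffer,
    viewed as m contiguous p-by-q matrices.  Square case (p == q), so
    each transpose is a set of pairwise element swaps needing no
    scratch storage.  Hypothetical layout for illustration."""
    assert p == q, "square case: in-place swap needs no extra buffer"
    v = a.reshape(m, p, q)      # view, so swaps modify `a` in place
    for s in range(m):
        for i in range(p):
            for j in range(i + 1, q):
                v[s, i, j], v[s, j, i] = v[s, j, i], v[s, i, j]
```

For p ≠ q the transposition is a non-trivial in-place permutation, which is one reason the text calls the square case ideal.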

The FFTW planner

Given a problem and a set of possible plans, the basic principle behind the FFTW planner is straightforward: construct a plan for each applicable algorithmic step, time the execution of these plans, and select the fastest one. Each algorithmic step may break the problem into subproblems, and the fastest plan for each subproblem is constructed in the same way. These timing measurements can either be performed at runtime, or alternatively the plans for a given set of sizes can be precomputed and loaded at a later time.
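The measure-and-select principle can be sketched in a few lines. This is a toy model, not FFTW's API: `steps` is a hypothetical list of (applicability test, plan constructor) pairs, and each constructed plan is simply a callable that is timed once.

```python
import time

def plan(problem, steps):
    """Toy planner: for each applicable algorithmic step, build a
    candidate plan, time one execution, and keep the fastest.
    `steps` is a list of (applies, make_plan) pairs -- hypothetical
    names for illustration."""
    best, best_t = None, float("inf")
    for applies, make_plan in steps:
        if not applies(problem):
            continue
        candidate = make_plan(problem)   # may recursively plan subproblems
        t0 = time.perf_counter()
        candidate(problem)               # measure an actual execution
        dt = time.perf_counter() - t0
        if dt < best_t:
            best, best_t = candidate, dt
    return best
```

In practice FFTW measures more carefully (repeated runs, warm caches), but the selection logic is essentially this comparison.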

A direct implementation of this approach, however, faces an exponential explosion of the number of possible plans, and hence of the planning time, as n increases. In order to reduce the planning time to a manageable level, we employ several heuristics to reduce the space of possible plans that must be compared. The most important of these heuristics is dynamic programming [link]: it optimizes each sub-problem locally, independently of the larger context (so that the “best” plan for a given sub-problem is re-used whenever that sub-problem is encountered). Dynamic programming is not guaranteed to find the fastest plan, because the performance of plans is context-dependent on real machines (e.g., the contents of the cache depend on the preceding computations); however, this approximation works reasonably well in practice and greatly reduces the planning time. Other approximations, such as restrictions on the types of loop-reorderings that are considered ("Plans for higher vector ranks"), are described in [link].
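The dynamic-programming heuristic amounts to memoizing the locally best plan per sub-problem. A minimal sketch, with hypothetical names (here the problem itself serves as the cache key; FFTW hashes a richer problem descriptor):

```python
import time

def plan_dp(problem, candidates, memo):
    """Dynamic-programming planner: the best plan for each sub-problem
    is cached in `memo` and re-used whenever that sub-problem recurs,
    ignoring the larger context -- the approximation noted in the text.
    `candidates` maps a problem to a list of plan callables, each of
    which may call plan_dp on its own sub-problems."""
    if problem in memo:
        return memo[problem]            # re-use the locally best plan
    best, best_t = None, float("inf")
    for cand in candidates(problem, memo):
        t0 = time.perf_counter()
        cand(problem)
        dt = time.perf_counter() - t0
        if dt < best_t:
            best, best_t = cand, dt
    memo[problem] = best
    return best
```

Because the cache ignores context (e.g., cache state left by preceding computations), the memoized choice is only an approximation of the truly fastest plan, exactly as the text describes.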





Source:  OpenStax, Fast fourier transforms. OpenStax CNX. Nov 18, 2012 Download for free at http://cnx.org/content/col10550/1.22
