<< Chapter < Page Chapter >> Page >

Alternatively, there is an estimate mode that performs no timing measurements whatsoever, but instead minimizes a heuristic costfunction. This can reduce the planner time by several orders of magnitude, but with a significant penalty observed in plan efficiency;e.g., a penalty of 20% is typical for moderate n 2 13 , whereas a factor of 2–3 can be suffered for large n 2 16 [link] . Coming up with a better heuristic plan is an interesting open research question; one difficulty is that, becauseFFT algorithms depend on factorization, knowing a good plan for n does not immediately help one find a good plan for nearby n .

Generating small fft kernels

The base cases of FFTW's recursive plans are its codelets , and these form a critical component of FFTW's performance. Theyconsist of long blocks of highly optimized, straight-line code, implementing many special cases of the DFT that give the planner alarge space of plans in which to optimize. Not only was it impractical to write numerous codelets by hand, but we also needed torewrite them many times in order to explore different algorithms and optimizations. Thus, we designed a special-purpose “FFTcompiler” called genfft that produces the codelets automatically from an abstract description. genfftis summarized in this section and described in more detail by [link] .

A typical codelet in FFTW computes a DFT of a small, fixed size n (usually, n 64 ), possibly with the input or output multiplied by twiddle factors "Cooley-Tukey plans" . Several other kinds of codelets can be produced by genfft, but we will focus here on this common case.

In principle, all codelets implement some combination of the Cooley-Tukey algorithm from [link] and/or some other DFT algorithm expressed by a similarly compact formula. However, a high-performance implementation of the DFT must address many moreconcerns than [link] alone suggests. For example, [link] contains multiplications by 1 that are more efficient to omit. [link] entails a run-time factorization of n , which can be precomputed if n is known in advance. [link] operates on complex numbers, but breaking the complex-number abstraction into real andimaginary components turns out to expose certain non-obvious optimizations. Additionally, to exploit the long pipelines in currentprocessors, the recursion implicit in [link] should be unrolled and re-ordered to a significant degree. Many further optimizations arepossible if the complex input is known in advance to be purely real (or imaginary). Our design goal for genfftwas to keep the expression of the DFT algorithm independent of suchconcerns. This separation allowed us to experiment with various DFT algorithms and implementation strategies independently andwithout (much) tedious rewriting.

genfft is structured as a compiler whose input consists of the kindand size of the desired codelet, and whose output is C code. genfftoperates in four phases: creation, simplification, scheduling, and unparsing.

In the creation phase, genfft produces a representation ofthe codelet in the form of a directed acyclic graph ( dag ). The dag is produced according to well-known DFT algorithms: Cooley-Tukey [link] , prime-factor [link] , split-radix [link] , [link] , [link] , [link] , [link] , and Rader [link] . Each algorithm is expressed in a straightforward math-like notation, using complex numbers, with noattempt at optimization. Unlike a normal FFT implementation, however, the algorithms here are evaluated symbolically and theresulting symbolic expression is represented as a dag, and in particular it can be viewed as a linear network [link] (in which the edges represent multiplication by constants and the vertices represent additions ofthe incoming edges).

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Fast fourier transforms. OpenStax CNX. Nov 18, 2012 Download for free at http://cnx.org/content/col10550/1.22
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Fast fourier transforms' conversation and receive update notifications?

Ask