<< Chapter < Page Chapter >> Page >

[link] contains references to several types, functions and macros that use upper-case identifiers – these are primitive functions or types that have been predefined as inline functions or macros. A benefit of using primitives in this way is that the details specific to numerical representation and the underlying machine have been abstracted away; thus, the same function can be compiled for a variety of types and machines by simply including a different header file with different primitives. [link] , for example, could be compiled for double-precision arithmetic on an SSE2 machine by including sse_double.h , or it could be compiled with much slower scalar arithmetic by including scalar.h . The same code can even be used, without modification, to compute forward and backwards transforms, by using C preprocessor directives to conditionally alter the macros.

In order to accommodate mixed numerical representations, the signature of the outermost function references data with void pointers. In the case of the double-precision example in [link] , SFFT_D would be defined to be double in the appropriate header file, and the void pointers are then cast to SFFT_D pointers.

The size-8 transform in [link] uses 8 registers, and thus a declaration of 8 registers of type SFFT_R has been emitted at line 4 in [link] . In the case of double-precision arithmetic on a SSE2 machine, SFFT_R is defined as __m128d in sse_double.h .

The first two rows of [link] correspond to lines 6 and 7 of [link] , respectively. The L_4 primitive is used to compute the size-4 leaf node in the first row of the table. The second row is a load/leaf node of size 2(x2), indicating two size-2 nodes in parallel, which is computed with the L_2 primitive. The input addresses in the table are the addresses of complex words, while the addresses in the generated code refer to the real and imaginary parts of a complex word, and thus the addresses from [link] are multiplied by a factor of 2 to obtain the addresses in [link] .

The final two CNodeBfly nodes of [link] correspond to the K_0 and K_N sub-transform (a.k.a. butterfly) primitives at lines 8 and 10, respectively. Because the node in the third row of [link] has a twiddle factor of ω 8 0 (i.e., unity), the computation requires no multiplication, and the K_0 primitive is used for this special case. The K_N primitive at line 10 does require a twiddle factor, which is passed to K_N as two vector literals that represent the twiddle factor in unpacked form. For the purposes of brevity, the precision has been truncated to only a few decimal places. Fast interleaved complex multiplication describes how interleaved complex multiplication is faster if one operand is pre-unpacked.

After each node is processed, the registers that have been used by it are checked in a map ( rmap ) that maps each register to the last node to have used that register. If the current node is the last node to have used a register, the register is stored to memory. In the case of the transform in [link] , four registers are stored with an instance of the S_4 primitive at lines 9 and 11. In contrast to the load operations at the leaves, which are decimated-in-time and thus effectively pseudo-random memory accesses, the store operations are to linear regions of memory, the addresses of which can be determined from each register's integer identifier. The store address offset for data in register R i is simply i × 2 × V L .

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Computing the fast fourier transform on simd microprocessors. OpenStax CNX. Jul 15, 2012 Download for free at http://cnx.org/content/col11438/1.2
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Computing the fast fourier transform on simd microprocessors' conversation and receive update notifications?

Ask