<< Chapter < Page Chapter >> Page >

The elaborate function modifies the class member variable ` ns ' at lines 10, 14, 18 and 21, where it appends a new node to the back of the list. After the function returns, the ns list represents a topological ordering of the computation with CNodeLoad and CNodeBfly nodes. The nodes of type CNodeLoad represent leaf computations: these computations load elements from the input array and perform a small amount of leaf computation, leaving the result in a set of registers. The CNodeBfly nodes represent computations in the body of the transform: these use a twiddle factor to perform a butterfly computation on a vector of registers, leaving the result in the same registers.

VL-1 size-8 conjugate-pair transform nodes
Type Size Addresses Registers Twiddle
CNodeLoad 4 {0,4,2,6} {0,1,2,3}
CNodeLoad 2(x2) {1,5,7,3} {4,5,6,7}
CNodeBfly 4 {0,2,4,6} ω 8 0
CNodeBfly 4 {1,3,5,7} ω 8 1

The constructor for a CNodeLoad object computes input array addresses for the load operations using the input array offset ( ioffset ), the input array stride , the size of the node (the nodes instantiated at lines 9 and 17 are size-4, and the node instantiated at line 20 is size-2) and a final parameter that is non-zero if the node is a single node (the nodes instantiated at lines 17 and 20 are single nodes, while the node instantiated at line 9 is composed of two size-2 nodes).

As the newly instantiated CNodeLoad objects are appended to the back of ns at lines 10, 14 and 21, the assign_leaf_registers function assigns registers to the outputs of each instance. Registers are identified with integers beginning at zero, and when each register is created it is assigned an identifier from an auto-incrementing counter ( R c o u n t e r ). This function also maintains a map of registers to node pointers, referred to as rmap , where the node for a given register is the last node to reference that register.

The constructor for a CNodeBfly object uses k and stride to compute a twiddle factor for the new instance of a butterfly computation node. When the new instance of CNodeBfly is appended to the end of ns at line 14, the assign_body_registers function assigns registers R i to a node of size N node with the following logic:

R i = R counter - N + k + i × N 4

where i = 0 , , N node - 1 and R counter is the auto-incrementing register counter. The assign_body_registers functions also updates the map of registers to node pointers by setting rmap [ R i ] to point to the new instance of CNodeBfly .

Emitting code

  void sfft_dcf8_hc(sfft_plan_t *p, const void *vin, void *vout) {     const SFFT_D *in = vin;    SFFT_D *out = vout;     SFFT_R r0,r1,r2,r3,r4,r5,r6,r7;      L_4(in+0,in+8,in+4,in+12,&r0,&r1,&r2,&r3);     L_2(in+2,in+10,in+14,in+6,&r4,&r5,&r6,&r7);     K_0(&r0,&r2,&r4,&r6);     S_4(r0,r2,r4,r6,out+0,out+4,out+8,out+12);    K_N(VLIT2(0.7071,0.7071),VLIT2(0.7071,-0.7071),&r1,&r3,&r5,&r7);     S_4(r1,r3,r5,r7,out+2,out+6,out+10,out+14);  }
Hard-coded VL-1 size-8 FFT

Given a list of nodes, it is a simple process to emit C code that can be compiled to actually compute the transform.

The example in [link] would be emitted from the list of four nodes in [link] . Lines 1–4 are emitted from a function that generates a header, and line 13 is emitted from a function that generates a footer. Lines 6–11 are generated based on the list of nodes.

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Computing the fast fourier transform on simd microprocessors. OpenStax CNX. Jul 15, 2012 Download for free at http://cnx.org/content/col11438/1.2
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Computing the fast fourier transform on simd microprocessors' conversation and receive update notifications?

Ask