0.4 Streaming fft (Page 9/19)

Page 9 / 19

Example

[link] is a VL-1 size-64 hard-coded four-step FFT. Before it can be used, an initialization procedure (not shown) allocates and populates the LUT at line 1 with the twiddle factors that are required for the step 2 multiplications. Line 44 shows the main function that executes the first sub-transform on the first column (line 49), and the second sub-transform on all remaining columns (line 50). Finally, the sub-transforms corresponding to step 4 of the four-step algorithm are executed on all columns in line 51.

The twiddle factor multiplication that corresponds to step 2 of the four-step algorithm takes place in lines 21-23 and lines 26-29. The first register is not multiplied with a twiddle factor because the first row of twiddle factors are $ω_{N}^{0}$ (i.e., unity). The other registers are multiplied with two registers loaded from the LUT, which are the unpacked real and imaginary parts (see Fast interleaved complex multiplication for details about unpacked complex multiplication).

Other vector lengths

const SFFT_D __attribute__ ((aligned(32))) *LUT;
const SFFT_D *pLUT;void sfft_fcf64_fs_x1(sfft_plan_t *p, const void *vin, void *vout) {
  const SFFT_D *in = vin;  SFFT_D *out = vout;
  SFFT_R r0,r1,r2,r3,r4,r5,r6,r7;  L_4(in+0,in+64,in+32,in+96,&r0,&r1,&r2,&r3);
  L_2(in+16,in+80,in+112,in+48,&r4,&r5,&r6,&r7);
  K_0(&r0,&r2,&r4,&r6);
  K_N(VLIT4(0.7071,0.7071,0.7071,0.7071),      VLIT4(0.7071,-0.7071,0.7071,-0.7071),&r1,&r3,&r5,&r7);
  r1 = MUL(r1,LOAD(pLUT+0),LOAD(pLUT+4));  TX2(r0,r1);
  r2 = MUL(r2,LOAD(pLUT+8),LOAD(pLUT+12));  r3 = MUL(r3,LOAD(pLUT+16),LOAD(pLUT+20));
  TX2(r2,r3);  r4 = MUL(r4,LOAD(pLUT+24),LOAD(pLUT+28));
  r5 = MUL(r5,LOAD(pLUT+32),LOAD(pLUT+36));  TX2(r4,r5);
  r6 = MUL(r6,LOAD(pLUT+40),LOAD(pLUT+44));  r7 = MUL(r7,LOAD(pLUT+48),LOAD(pLUT+52));
  TX2(r6,r7);  S_4(r0,r2,r4,r6,out+0,out+4,out+8,out+12);
  S_4(r1,r3,r5,r7,out+16,out+20,out+24,out+28);  pLUT += 56;
}void sfft_fcf64_fs_x2(sfft_plan_t *p, const void *vin, void *vout) {
  const SFFT_D *in = vin;  SFFT_D *out = vout;
  SFFT_R r0,r1,r2,r3,r4,r5,r6,r7;  L_4(in+0,in+64,in+32,in+96,&r0,&r1,&r2,&r3);
  L_2(in+16,in+80,in+112,in+48,&r4,&r5,&r6,&r7);
  K_0(&r0,&r2,&r4,&r6);
  K_N(VLIT4(0.7071,0.7071,0.7071,0.7071),      VLIT4(0.7071,-0.7071,0.7071,-0.7071),&r1,&r3,&r5,&r7);
  S_4(r0,r2,r4,r6,out+0,out+32,out+64,out+96);  S_4(r1,r3,r5,r7,out+16,out+48,out+80,out+112);
}void sfft_fcf64_fs(sfft_plan_t *p, const void *vin, void *vout) {
  const SFFT_D *in = vin;  SFFT_D *out = vout;
  pLUT =  LUT;  int i;
  for(i=0;i<4;i++) sfft_fcf64_fs_x1(p, in+(i*4), out+(i*32));
  for(i=0;i<4;i++) sfft_fcf64_fs_x2(p, out+(i*4), out+(i*4));
}
Hard-coded four-step VL-2 size-64 FFT

For $V L > 1$ , the FFTs along the columns are computed in parallel. Thus, in step 1, $n_{1} / V L$ FFTs are computed along the columns of the array with stride $= 2 \times V L$ , and in step 4, $n_{2} / V L$ FFTs are computed along the columns with stride $= 2 \times V L$ .

An implication of computing the first column in parallel with other columns is that the first column is now multiplied by unity twiddle factors, and thus only two sub-transforms are used instead of three.

<< Chapter < Page Page > Chapter >>

Read also:

Get Jobilize Job Search Mobile App in your pocket Now!

100% Free Mobile Applications
Receive real-time job alerts and never miss the right job again

Source: OpenStax, Computing the fast fourier transform on simd microprocessors. OpenStax CNX. Jul 15, 2012 Download for free at http://cnx.org/content/col11438/1.2

Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Computing the fast fourier transform on simd microprocessors' conversation and receive update notifications?

Ask

©flickr: Miguel	Protozoal and Parasitic Infections By Cath Yu Start Quiz
	English Proficiency Test By Anindyo Mukhopadhyay Start Quiz
	Biology 302 Final By Dravida Mahadeo-J... Start Quiz
©flickr: Gisela	Electrocardiogram Quiz By Anonymous User Start Quiz
	3 Psychology MCQ 2009 2 Exam By John Gabrieli Start Exam
	SCEA for Java EE Study Guide By Edward Biton Start Quiz
	1 Endocrine System MCQ By Nick Swain Start Quiz
	College physics for ap® courses By OpenStax Read Online Course
	Anthropology Biology Culture By Richley Crapo Start Assignment
©flickr: U.S.	Molecular Cellular Biology By Ann Schlosser Start Quiz