0.4 Streaming fft (Page 17/19)

Page 17 / 19

Sorted size-16 leaf nodes in size-128 hard-coded leaf FFT, grouped for VL-2

Size	Input array addresses
16	{0, 64, 32, 96, 16, 80, 112, 48, 8, 72, 40, 104, 120, 56, 24, 88}
16	{1, 65, 33, 97, 17, 81, 113, 49, 9, 73, 41, 105, 121, 57, 25, 89}
16	{2, 66, 34, 98, 18, 82, 114, 50, 10, 74, 42, 106, 122, 58, 26, 90}
8(x2)	{3, 67, 35, 99, 19, 83, 115, 51, 123, 59, 27, 91, 11, 75, 107, 43}
8(x2)	{4, 68, 36, 100, 20, 84, 116, 52, 124, 60, 28, 92, 12, 76, 108, 44}
8(x2)	{5, 69, 37, 101, 21, 85, 117, 53, 125, 61, 29, 93, 13, 77, 109, 45}
16	{126, 62, 30, 94, 14, 78, 110, 46, 6, 70, 38, 102, 118, 54, 22, 86}
16	{127, 63, 31, 95, 15, 79, 111, 47, 7, 71, 39, 103, 119, 55, 23, 87}

[link] shows the sorted size-16 leaf sub-transforms for a size-128 transform with the rows divided into VL-2 groups. Because each group of two leaf sub-transforms loads data from adjacent memory locations, the group of sub-transforms can be loaded in parallel with vector memory operations, and all (or some) of the computation done in parallel. The first, third and fourth groups in [link] contain leaf nodes of the same size/type; these are the easiest vector leaf sub-transforms to compute, as described in "Homogeneous leaf sub-transform vectors" . The second group of rows contains leaf sub-transforms of differing size/type, and computing these sub-transforms is covered separately in "Heterogeneous leaf sub-transform vectors" .

Homogeneous leaf sub-transform vectors

void
sfft_fcf128_shl16_8_ee(offset_t *is,const SFFT_D *in,SFFT_D *out){  SFFT_R r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12,r13,r14,r15;
  L_4(in+is[0],in+is[1],in+is[2],in+is[3],&r0,&r1,&r2,&r3);
  L_2(in+is[4],in+is[5],in+is[6],in+is[7],&r4,&r5,&r6,&r7);
  K_0(&r0,&r2,&r4,&r6);
  K_N(VLIT4(0.7071,0.7071,0.7071,0.7071),      VLIT4(0.7071,-0.7071,0.7071,-0.7071),
      &r1,&r3,&r5,&r7);
  L_4(in+is[8],in+is[9],in+is[10],in+is[11],&r8,&r9,&r10,&r11);
  L_4(in+is[12],in+is[13],in+is[14],in+is[15],&r12,&r13,&r14,&r15);
  K_0(&r0,&r4,&r8,&r12);
  K_N(VLIT4(0.9239,0.9239,0.9239,0.9239),      VLIT4(0.3827,-0.3827,0.3827,-0.3827),
      &r1,&r5,&r9,&r13);
  TX2(&r0,&r1); TX2(&r4,&r5); TX2(&r8,&r9); TX2(&r12,&r13);
  S_4(r0,r4,r8,r12,out0+0,out0+8,out0+16,out0+24);  S_4(r1,r5,r9,r13,out1+0,out1+8,out1+16,out1+24);
  K_N(VLIT4(0.7071,0.7071,0.7071,0.7071),      VLIT4(0.7071,-0.7071,0.7071,-0.7071),
      &r2,&r6,&r10,&r14);
  K_N(VLIT4(0.3827,0.3827,0.3827,0.3827),      VLIT4(0.9239,-0.9239,0.9239,-0.9239),
      &r3,&r7,&r11,&r15);
  TX2(&r2,&r3); TX2(&r6,&r7); TX2(&r10,&r11); TX2(&r14,&r15);
  S_4(r2,r6,r10,r14,out0+4,out0+12,out0+20,out0+28);  S_4(r3,r7,r11,r15,out1+4,out1+12,out1+20,out1+28);
}
Homogeneous size-16 leaf sub-transform for VL-2 size-128 hard-coded leaf FFT

The vector leaf sub-transforms of a single size/type are handled in the same way as a VL-1 sub-transform, with one difference: the vector registers must be transposed before the data is stored to memory in the output array. In the example shown in [link] , the transposes take place at lines 16 and 25.

Prior to the store operations, each position of the vector register (each position being a whole complex word) contains an element belonging to each of the leaf sub-transforms composing the vectorized sub-transform. Because each leaf sub-transform is stored sequentially to different locations in memory with aligned vector store operations, sets of registers are transposed such that each vector register contains elements from only one leaf sub-transform.

<< Chapter < Page Page > Chapter >>

Read also:

Get Jobilize Job Search Mobile App in your pocket Now!

100% Free Mobile Applications
Receive real-time job alerts and never miss the right job again

Source: OpenStax, Computing the fast fourier transform on simd microprocessors. OpenStax CNX. Jul 15, 2012 Download for free at http://cnx.org/content/col11438/1.2

Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Computing the fast fourier transform on simd microprocessors' conversation and receive update notifications?

Ask

	Time and Stress Management MCQ By Dionne Mahaffey Start Quiz
	16 Sociology 16 Education MCQ By OpenStax Start Quiz
	28 Biology 28 Invertebrates MCQ By OpenStax Start Quiz
	1 Microeconomics 01 What Is Economics? By OpenStax Start Flashcards
	Anthropology Language Culture By Richley Crapo Start Assignment
	Theater History Final - Review By Cameron Casey Start Exam
	17 Biology 17 Biotechnology and Genomics MCQ By OpenStax Start Quiz
	3 Microeconomics 03 Demand Supply By OpenStax Start Flashcards
	Classical Music By Marion Cabalfin Start Quiz
	2 Week 2 Social Psych By Yacoub Jayoghli Start Quiz