0.2 Implementation details (Page 7/9)

Page 7 / 9

Split format complex multiplication

  typedef struct _reg_t {
    __m128 re, im;  } reg_t;
   static inline reg_t MUL_SPLIT(reg_t a, reg_t b) {
    reg_t r;    r.re = _mm_sub_ps(_mm_mul_ps(a.re,b.re),_mm_mul_ps(a.im,b.im));
    r.im = _mm_add_ps(_mm_mul_ps(a.re,b.im),_mm_mul_ps(a.im,b.re));    return r;
  }
SSE multiplication with split complex data

The function in [link] takes complex data in two structs of SSE registers, performs the complex multiplication of each element of the vectors, and returns the result in a struct of SSEregisters. Each struct is composed of a register containing the real parts of four complex numbers, and another register containing the imaginary parts – sothe function in [link] is effectively operating on vectors twice as long as the function in [link] . The benefit of operating in split format is obvious: the shuffle operations that wererequired in [link] are avoided because the real and imaginary parts can be implicitly swapped at the instruction level, rather thanby awkwardly manipulating SIMD registers at the data level of abstraction. Thus, [link] computes complex multiplication for vectors twice as long while using one less SSE instruction – not to mention otheradvantages such as reducing chains of dependent instructions. The only disadvantage to the split format approach is that twice as many registers areneeded to compute a given operation – this might preclude the use of a larger radix or force register paging for some kernels of computation.

Fast interleaved format complex multiplication

[link] is fast method of interleaved complex multiplication that may be used in situations where one of the operands can beunpacked prior to multiplication – in such cases the instruction count is reduced from 7 instructions to 4 instructions (cf. [link] ). This method of complex multiplication lends itself especially well to the conjugate-pair algorithm where the same twiddlefactor is used twice – by doubling the size of the twiddle factor LUT, the multiplication instruction count is reduced from 14 instructions to 8instructions. Furthermore, large chains of dependent instructions are reduced,and in practice the actual performance gain can be quite impressive.

Operand a in [link] has been replaced with two operands in [link] : re and im – these operands have been unpacked, as was done in lines 3 and 5 of [link] . Furthermore, line 8 of [link] is also avoided by performing the selective negation during unpacking.

  static inline __m128
  MUL_UNPACKED_INTERLEAVED(__m128 re, __m128 im, __m128 b) {    re = _mm_mul_ps(re, b);
    im = _mm_mul_ps(im, b);    im = _mm_shuffle_ps(im,im,_MM_SHUFFLE(2,3,0,1));
    return _mm_add_ps(re, im);  }
SSE multiplication with partially unpacked interleaved data

Vectorized loops

The performance of the FFTs in the previous sections can be increased by explicitly vectorizing the loops. The Macbook Air 4,2 used to compile and runthe previous examples has a CPU that implements SSE and AVX, but for the purposes of simplicity, SSE intrinsics are used in thefollowing examples. The loop of the radix-2 implementation is used as an example in [link] .

<< Chapter < Page Page > Chapter >>

Read also:

Get Jobilize Job Search Mobile App in your pocket Now!

100% Free Mobile Applications
Receive real-time job alerts and never miss the right job again

Source: OpenStax, Computing the fast fourier transform on simd microprocessors. OpenStax CNX. Jul 15, 2012 Download for free at http://cnx.org/content/col11438/1.2

Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Computing the fast fourier transform on simd microprocessors' conversation and receive update notifications?

Ask

©flickr: Gareth	Professional Etiquette MCQ By Abby Sharp Start Quiz
©flickr: Tim	Hazard Quiz By Royalle Moore Start Quiz
	College physics for ap® courses By OpenStax Read Online Course
	My OCA Mock By Mike Wolf Start Exam
	5 BOD Reproductive System quiz By Brooke Delaney Start Quiz
	11 AP 11 Muscular System Essay By OpenStax Start Flashcards
	6 Arts Society: Theater 6 By Jonathan Long Start Quiz
	2 Sociology 02 Sociological Research MCQ By OpenStax Start Quiz
	3 Lec:3 Cohort Studies By Janet Forrester Start Quiz
	6 AP 06 Skeletal System Essay By OpenStax Start Flashcards