<< Chapter < Page Chapter >> Page >

[link] is a C++ implementation of the vectorize_ks function. For each CNodeBfly node, the function searches forward for another CNodeBfly that does not have a register dependence. Once found, the registers of the latter node are added to the former node, and the latter node erased. Finally, at line 19, the registers of the vectorized CNodeBfly node are merged using a perfect shuffle, which is then recursively applied on each half of the list. The effect is a merge that works for any power of 2 vector length.

  void CSplitRadix::vectorize_ks() {     vector<CNodeHardCoded *>::iterator i;     for(i=ns.begin(); i != ns.end();++i) {      if(!(*i)->type().compare(``blockbfly'')) {         vector<CNodeHardCoded *>::iterator j = i+1, pj = i;         int count = 1;        while(j != ns.end() && count < VL) {           if(!(*j)->type().compare(``blockbfly'') && !register_dependence(*i, *j)) {             (*i)->rs.insert( (*i)->rs.end(), (*j)->rs.begin(), (*j)->rs.end());             ns.erase(j);            count++;             j = pj+1;          }else {             pj = j; ++j;          }         }        (*i)->merge_rs();       }    }   }
Body node vectorization
  CNodeLoad *   CSplitRadix::find_parallel_load(vector<CNodeHardCoded*>::iterator i){     CNodeLoad *b = (CNodeLoad *)(*i);    for(int k=0;k<((N>2)?4:2);k++) {       vector<CNodeHardCoded *>::iterator j = i+1;       while(j != ns.end()) {        if(!(*j)->type().compare(``blockload'')) {           CNodeLoad *b2 = (CNodeLoad *)(*j);          if(b2->iaddrs[k] > b->iaddrs[0] && b2->iaddrs[k] < b->iaddrs[0]+VL) {            ns.erase(j);             return b2;          }           ++j;        }       }    }     return NULL;  }   void CSplitRadix::merge_loads(CNodeLoad *b1, CNodeLoad *b2) {    for(int i=0;i<b1->size;i++) {       for(int j=0;j<b2->iaddrs.size();j++) {         if(b2->iaddrs[j] > b1->iaddrs[i] && b2->iaddrs[j] < b1->iaddrs[i]+VL) {          b1->iaddrs.push_back(b2->iaddrs[j]);          b1->rs.push_back(b2->rs[j]);          if(rmap[b2->rs[j]] == b2) rmap[b2->rs[j]] = b1;        }       }    }     b1->types.push_back(b2->types[0]);  }   void CSplitRadix::vectorize_loads() {    vector<CNodeHardCoded *>::iterator i;     for(i=ns.begin(); i != ns.end();++i) {      if(!(*i)->type().compare(``blockload'')) {         while(CNodeLoad *b2 = find_parallel_load(i))          merge_loads((CNodeLoad *)(*i), b2);       }    }   }
Leaf node vectorization

If vectorize_loads and vectorize_ks are invoked with V L = 2 on the topological ordering of nodes in [link] , the result is the vectorized node list shown in [link] . As in "Emitting code" , emitting code is a fairly simple process, and [link] is the code emitted from the node list in [link] . There are only a few differences to note about the emitted code when V L > 1 .

  1. The register identifiers in line 4 of [link] consist of a list of two integers delimited with an underscore. The integers listed in each register's name are the VL-1 registers that were subsumed to create the larger register (cf. VL-1 code in [link] );
  2. The leaf primitives (lines 6 and 7 in [link] ) have a list of underscore delimited integers in the name, where each integer corresponds to the type of sub-transform to be computed on that position in the vector registers. For example, the L_4_4 primitive is named to indicate a size-4 leaf operation on the lower and upper halves of the vector registers, while the L_2_4 primitive performs two size-2 leaf operations on the lower half of the registers and a size-4 leaf operation on the upper halves;
  3. The body node primitives ( K_N ) and store primitives ( S_4 ) are unchanged because they perform the same operation on each element of the vector registers. This is as a result of the register transposes that were previously performed on the outputs of the leaf primitives.

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Computing the fast fourier transform on simd microprocessors. OpenStax CNX. Jul 15, 2012 Download for free at http://cnx.org/content/col11438/1.2
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Computing the fast fourier transform on simd microprocessors' conversation and receive update notifications?

Ask