<< Chapter < Page | Chapter >> Page > |
There are other modifications of the basic structure of the Type-1 index map DFT algorithm. One is to use the same indexstructure and conversion of the short DFT's to convolution as the PFA but to use some other method for the high-speed convolution.Table look-up of partial products based on distributed arithmetic to eliminate all multiplications [link] looks promising for certain very specific applications, perhaps for specialized VLSIimplementation. Another possibility is to calculate the short convolutions using number-theoretic transforms [link] , [link] , [link] . This would also require special hardware. Direct calculation of short convolutions is faster on certainpipelined processor such as the TMS-320 microprocessor [link] .
As for the Cooley-Tukey FFT's, the first evaluation of these algorithms will be on the number of multiplications and additionsrequired. The number of multiplications to compute the PFA in [link] is given by Multidimensional Index Mapping: Equation 3 . Using the notation that $T(N$ ) is the number of multiplications or additions necessary to calculate alength-N DFT, the total number for a four-factor PFA of length-N, where $N={N}_{1}{N}_{2}{N}_{3}{N}_{4}$ is
The count of multiplies and adds in [link] are calculated from (105) with the counts of the factors taken from Winograd’s Short DFT Algorithms: Table 1 . The list of lengths are those possible with modules in the program of length 2,3, 4, 5, 7, 8, 9 and 16 as is true for the PFA in [link] , [link] and the WFTA in [link] , [link] . A maximum of four relatively prime lengths can be used from this group giving 59 different lengths overthe range from 2 to 5040. The radix-2 or split-radix FFT allows 12 different lengths over the same range. If modules of length 11 and13 from [link] are added, the maximum length becomes 720720 and the number of different lengths becomes 239. Adding modules for17, 19 and 25 from [link] gives a maximum length of 1163962800 and a very large and dense number of possible lengths.The length of the code for the longer modules becomes excessive and should not be included unless needed.
The number of multiplications necessary for the WFTA is simply the product of those necessary for the required modules,including multiplications by unity. The total number may contain some unity multipliers but it is difficult to remove them in apractical program. [link] contains both the total number (MULTS) and the number with the unity multiplies removed (RMULTS).
Calculating the number of additions for the WFTA is more complicated than for the PFA because of the expansion of the datamoving through the algorithm. For example the number of additions, TA, for the length-15 example in [link] is given by
where ${N}_{1}=3$ , ${N}_{2}=5$ , $T{M}_{1}$ = the number of multiplies for the length-3 module and hence the expansion factor. As mentionedearlier there is an optimum ordering to minimize additions. The ordering used to calculate [link] is the ordering used in [link] , [link] which is optimal in most cases and close to optimal in the others.
Length | PFA | PFA | WFTA | WFTA | WFTA |
N | Mults | Adds | Mults | RMults | Adds |
10 | 20 | 88 | 24 | 20 | 88 |
12 | 16 | 96 | 24 | 16 | 96 |
14 | 32 | 172 | 36 | 32 | 172 |
15 | 50 | 162 | 36 | 34 | 162 |
18 | 40 | 204 | 44 | 40 | 208 |
20 | 40 | 216 | 48 | 40 | 216 |
21 | 76 | 300 | 54 | 52 | 300 |
24 | 44 | 252 | 48 | 36 | 252 |
28 | 64 | 400 | 72 | 64 | 400 |
30 | 100 | 384 | 72 | 68 | 384 |
35 | 150 | 598 | 108 | 106 | 666 |
36 | 80 | 480 | 88 | 80 | 488 |
40 | 100 | 532 | 96 | 84 | 532 |
42 | 152 | 684 | 108 | 104 | 684 |
45 | 190 | 726 | 132 | 130 | 804 |
48 | 124 | 636 | 108 | 92 | 660 |
56 | 156 | 940 | 144 | 132 | 940 |
60 | 200 | 888 | 144 | 136 | 888 |
63 | 284 | 1236 | 198 | 196 | 1394 |
70 | 300 | 1336 | 216 | 212 | 1472 |
72 | 196 | 1140 | 176 | 164 | 1156 |
80 | 260 | 1284 | 216 | 200 | 1352 |
84 | 304 | 1536 | 216 | 208 | 1536 |
90 | 380 | 1632 | 264 | 260 | 1788 |
105 | 590 | 2214 | 324 | 322 | 2418 |
112 | 396 | 2188 | 324 | 308 | 2332 |
120 | 460 | 2076 | 288 | 276 | 2076 |
126 | 568 | 2724 | 396 | 392 | 3040 |
140 | 600 | 2952 | 432 | 424 | 3224 |
144 | 500 | 2676 | 396 | 380 | 2880 |
168 | 692 | 3492 | 432 | 420 | 3492 |
180 | 760 | 3624 | 528 | 520 | 3936 |
210 | 1180 | 4848 | 648 | 644 | 5256 |
240 | 1100 | 4812 | 648 | 632 | 5136 |
252 | 1136 | 5952 | 792 | 784 | 6584 |
280 | 1340 | 6604 | 864 | 852 | 7148 |
315 | 2050 | 8322 | 1188 | 1186 | 10336 |
336 | 1636 | 7908 | 972 | 956 | 8508 |
360 | 1700 | 8148 | 1056 | 1044 | 8772 |
420 | 2360 | 10536 | 1296 | 1288 | 11352 |
504 | 2524 | 13164 | 1584 | 1572 | 14428 |
560 | 3100 | 14748 | 1944 | 1928 | 17168 |
630 | 4100 | 17904 | 2376 | 2372 | 21932 |
720 | 3940 | 18276 | 2376 | 2360 | 21132 |
840 | 5140 | 23172 | 2592 | 2580 | 24804 |
1008 | 5804 | 29100 | 3564 | 3548 | 34416 |
1260 | 8200 | 38328 | 4752 | 4744 | 46384 |
1680 | 11540 | 50964 | 5832 | 5816 | 59064 |
2520 | 17660 | 82956 | 9504 | 9492 | 99068 |
5040 | 39100 | 179772 | 21384 | 21368 | 232668 |
from [link] we see that compared to the PFA or any of the Cooley-Tukey FFT's, the WFTA has significantly fewer multiplications. For the shorterlengths, the WFTA and the PFA have approximately the same number of additions; however for longer lengths, the PFA has fewer and theCooley-Tukey FFT's always have the fewest. If the total arithmetic, the number of multiplications plus the number of additions, iscompared, the split-radix FFT, PFA and WFTA all have about the same count. Special versions of the PFA and WFTA have been developed forreal data [link] , [link] .
The size of the Cooley-Tukey program is the smallest, the PFA next and the WFTA largest. The PFA requires the smallest numberof stored constants, the Cooley-Tukey or split-radix FFT next, and the WFTA requires the largest number. For a DFT of approximately1000, the PFA stores 28 constants, the FFT 2048 and the WFTA 3564. Both the FFT and PFA can be calculated in-place and the WFTA cannot.The PFA can be calculated in-order without an unscrambler. The radix-2 FFT can also, but it requires additional indexing overhead [link] . The indexing and data transfer overhead is greatest for the WFTA because the separate preweave and postweave sections eachrequire their indexing and pass through the complete data. The shorter modules in the PFA and WFTA and the butterflies in the radix2 and 4 FFT's are more efficient than the longer ones because intermediate calculations can be kept in cpu registers rathergeneral memory [link] . However, the shorter modules and radices require more passes through the data for a given approximatelength. A proper comparison will require actual programs to be compiled and run on a particular machine. There are many openquestions about the relationship of algorithms and hardware architecture.
Notification Switch
Would you like to follow the 'Fast fourier transforms' conversation and receive update notifications?