0.9 The prime factor and winograd fourier transform algorithms (Page 5/5)

Fast fourier transforms Page 5 / 5

There are other modifications of the basic structure of the Type-1 index map DFT algorithm. One is to use the same indexstructure and conversion of the short DFT's to convolution as the PFA but to use some other method for the high-speed convolution.Table look-up of partial products based on distributed arithmetic to eliminate all multiplications [link] looks promising for certain very specific applications, perhaps for specialized VLSIimplementation. Another possibility is to calculate the short convolutions using number-theoretic transforms [link] , [link] , [link] . This would also require special hardware. Direct calculation of short convolutions is faster on certainpipelined processor such as the TMS-320 microprocessor [link] .

Evaluation of the pfa and wfta

As for the Cooley-Tukey FFT's, the first evaluation of these algorithms will be on the number of multiplications and additionsrequired. The number of multiplications to compute the PFA in [link] is given by Multidimensional Index Mapping: Equation 3 . Using the notation that $T (N$ ) is the number of multiplications or additions necessary to calculate alength-N DFT, the total number for a four-factor PFA of length-N, where $N = N_{1} N_{2} N_{3} N_{4}$ is

T (N) = N_{1} N_{2} N_{3} T (N_{4}) + N_{2} N_{3} N_{4} T (N_{1}) + N_{3} N_{4} N_{1} T (N_{2}) + N_{4} N_{1} N_{2} T (N_{3})

The count of multiplies and adds in [link] are calculated from (105) with the counts of the factors taken from Winograd’s Short DFT Algorithms: Table 1 . The list of lengths are those possible with modules in the program of length 2,3, 4, 5, 7, 8, 9 and 16 as is true for the PFA in [link] , [link] and the WFTA in [link] , [link] . A maximum of four relatively prime lengths can be used from this group giving 59 different lengths overthe range from 2 to 5040. The radix-2 or split-radix FFT allows 12 different lengths over the same range. If modules of length 11 and13 from [link] are added, the maximum length becomes 720720 and the number of different lengths becomes 239. Adding modules for17, 19 and 25 from [link] gives a maximum length of 1163962800 and a very large and dense number of possible lengths.The length of the code for the longer modules becomes excessive and should not be included unless needed.

The number of multiplications necessary for the WFTA is simply the product of those necessary for the required modules,including multiplications by unity. The total number may contain some unity multipliers but it is difficult to remove them in apractical program. [link] contains both the total number (MULTS) and the number with the unity multiplies removed (RMULTS).

Calculating the number of additions for the WFTA is more complicated than for the PFA because of the expansion of the datamoving through the algorithm. For example the number of additions, TA, for the length-15 example in [link] is given by

T A (N) = N_{2} T A (N_{1}) + T M_{1} T A (N_{2})

where $N_{1} = 3$ , $N_{2} = 5$ , $T M_{1}$ = the number of multiplies for the length-3 module and hence the expansion factor. As mentionedearlier there is an optimum ordering to minimize additions. The ordering used to calculate [link] is the ordering used in [link] , [link] which is optimal in most cases and close to optimal in the others.

Number of Real Multiplications and Additions for Complex PFA and WFTA FFTs

Length	PFA	PFA	WFTA	WFTA	WFTA
N	Mults	Adds	Mults	RMults	Adds
10	20	88	24	20	88
12	16	96	24	16	96
14	32	172	36	32	172
15	50	162	36	34	162
18	40	204	44	40	208
20	40	216	48	40	216
21	76	300	54	52	300
24	44	252	48	36	252
28	64	400	72	64	400
30	100	384	72	68	384
35	150	598	108	106	666
36	80	480	88	80	488
40	100	532	96	84	532
42	152	684	108	104	684
45	190	726	132	130	804
48	124	636	108	92	660
56	156	940	144	132	940
60	200	888	144	136	888
63	284	1236	198	196	1394
70	300	1336	216	212	1472
72	196	1140	176	164	1156
80	260	1284	216	200	1352
84	304	1536	216	208	1536
90	380	1632	264	260	1788
105	590	2214	324	322	2418
112	396	2188	324	308	2332
120	460	2076	288	276	2076
126	568	2724	396	392	3040
140	600	2952	432	424	3224
144	500	2676	396	380	2880
168	692	3492	432	420	3492
180	760	3624	528	520	3936
210	1180	4848	648	644	5256
240	1100	4812	648	632	5136
252	1136	5952	792	784	6584
280	1340	6604	864	852	7148
315	2050	8322	1188	1186	10336
336	1636	7908	972	956	8508
360	1700	8148	1056	1044	8772
420	2360	10536	1296	1288	11352
504	2524	13164	1584	1572	14428
560	3100	14748	1944	1928	17168
630	4100	17904	2376	2372	21932
720	3940	18276	2376	2360	21132
840	5140	23172	2592	2580	24804
1008	5804	29100	3564	3548	34416
1260	8200	38328	4752	4744	46384
1680	11540	50964	5832	5816	59064
2520	17660	82956	9504	9492	99068
5040	39100	179772	21384	21368	232668

from [link] we see that compared to the PFA or any of the Cooley-Tukey FFT's, the WFTA has significantly fewer multiplications. For the shorterlengths, the WFTA and the PFA have approximately the same number of additions; however for longer lengths, the PFA has fewer and theCooley-Tukey FFT's always have the fewest. If the total arithmetic, the number of multiplications plus the number of additions, iscompared, the split-radix FFT, PFA and WFTA all have about the same count. Special versions of the PFA and WFTA have been developed forreal data [link] , [link] .

The size of the Cooley-Tukey program is the smallest, the PFA next and the WFTA largest. The PFA requires the smallest numberof stored constants, the Cooley-Tukey or split-radix FFT next, and the WFTA requires the largest number. For a DFT of approximately1000, the PFA stores 28 constants, the FFT 2048 and the WFTA 3564. Both the FFT and PFA can be calculated in-place and the WFTA cannot.The PFA can be calculated in-order without an unscrambler. The radix-2 FFT can also, but it requires additional indexing overhead [link] . The indexing and data transfer overhead is greatest for the WFTA because the separate preweave and postweave sections eachrequire their indexing and pass through the complete data. The shorter modules in the PFA and WFTA and the butterflies in the radix2 and 4 FFT's are more efficient than the longer ones because intermediate calculations can be kept in cpu registers rathergeneral memory [link] . However, the shorter modules and radices require more passes through the data for a given approximatelength. A proper comparison will require actual programs to be compiled and run on a particular machine. There are many openquestions about the relationship of algorithms and hardware architecture.

<< Chapter < Page Page > Chapter >>

Read also:

Get Jobilize Job Search Mobile App in your pocket Now!

100% Free Mobile Applications
Receive real-time job alerts and never miss the right job again

Source: OpenStax, Fast fourier transforms. OpenStax CNX. Nov 18, 2012 Download for free at http://cnx.org/content/col10550/1.22

Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Fast fourier transforms' conversation and receive update notifications?

Ask

	Gastrointestinal Pathophysiology 2006 By Tamsin Knox Start Exam
	6 Arts Society: Theater 6 By Jonathan Long Start Quiz
	1 AP 01 Human Body Anatomy Physiology Essay By OpenStax Start Flashcards
	Respiratory System Drugs By CB Biern Start Quiz
	4 Arts Society: Theater 4 By Jonathan Long Start Quiz
©flickr: Jonathan	Power By Megan Earhart Start Quiz
	8 AP 08 Appendicular Skeleton MCQ By OpenStax Start Quiz
	13 Biology 13 Modern Understandings of Inheritance By OpenStax Start Quiz
	Time and Stress Management MCQ By Dionne Mahaffey Start Quiz
©flickr: Clive	Sports Psych. Research Methods By Dakota Bocan Start Flashcards