To write a program that computes the circular convolution of $h$ and $x$ using the bilinear form of Equation 24 in Bilinear Forms for Circular Convolution, we need subprograms that carry out the action of $P$, ${P}^{t}$, $R$, ${R}^{t}$, $A$ and ${B}^{t}$. We assume, as is usually done, that $h$ is fixed and known, so that $u={C}^{t}{R}^{-t}PJh$ can be pre-computed and stored. To compute these multiplicative constants $u$ we need additional subprograms that carry out the action of ${C}^{t}$ and ${R}^{-t}$, but the efficiency with which we compute $u$ is unimportant, since this is done beforehand and $u$ is stored.
In Prime Factor Permutations we discussed the permutation $P$, and a program for it, pfp(), appears in the appendix.
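To illustrate what the permutation accomplishes, the following sketch builds a prime factor index map for $N=9\cdot 5$ and checks that it converts a 45 point circular convolution into a $9\times 5$ two-dimensional circular convolution. It is written in Python rather than the Matlab of the appendix. The names `crt_index_map`, `cconv` and `cconv2d`, and the choice of the map $n\mapsto (n \bmod 9,\ n \bmod 5)$, are assumptions made for illustration; the pfp program in the appendix may order the samples differently.

```python
from math import gcd

def crt_index_map(n1, n2):
    """perm[k1*n2 + k2] is the n in 0..N-1 with n % n1 == k1 and n % n2 == k2."""
    assert gcd(n1, n2) == 1, "the factors must be relatively prime"
    N = n1 * n2
    perm = [0] * N
    for n in range(N):
        perm[(n % n1) * n2 + (n % n2)] = n
    return perm

def cconv(h, x):
    """Direct one-dimensional circular convolution."""
    N = len(x)
    return [sum(h[k] * x[(n - k) % N] for k in range(N)) for n in range(N)]

def cconv2d(h, x, n1, n2):
    """Direct 2-D circular convolution of n1-by-n2 arrays stored row-major."""
    y = [0] * (n1 * n2)
    for a in range(n1):
        for b in range(n2):
            y[a * n2 + b] = sum(
                h[c * n2 + d] * x[((a - c) % n1) * n2 + (b - d) % n2]
                for c in range(n1) for d in range(n2))
    return y

n1, n2 = 9, 5
perm = crt_index_map(n1, n2)
x = [(3 * n + 1) % 7 for n in range(n1 * n2)]   # arbitrary test data
h = [(n * n) % 5 for n in range(n1 * n2)]

y = cconv(h, x)
permuted_1d = [y[i] for i in perm]              # P (h * x)
two_dim = cconv2d([h[i] for i in perm],         # (P h) ** (P x)
                  [x[i] for i in perm], n1, n2)
```

With this map, permuting the output of the one-dimensional convolution gives the same values as convolving the permuted inputs in two dimensions.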
The reduction operations $R$, ${R}^{t}$ and ${R}^{-t}$ were described in Reduction Operations, and programs for these reduction operations, KRED() etc., also appear in the appendix.
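For a prime length $p$ the reduction step can be sketched directly, assuming that it computes the residues of $X(s)$ modulo the cyclotomic factors of $s^p-1$. The residue modulo $s-1$ is the sum of the samples, and the residue modulo $\Phi_p(s)=1+s+\cdots+s^{p-1}$ is obtained by subtracting the last sample from each of the others. The Python sketch below covers only this single-prime case. The names `reduce_prime` and `polymod` are hypothetical and are not the KRED program of the appendix, which handles the general case.

```python
def reduce_prime(x):
    """Residues of X(s) = sum x[i] s^i modulo s - 1 and modulo Phi_p(s).

    len(x) = p is assumed prime. Returns the single residue modulo s - 1
    followed by the p - 1 coefficients of the residue modulo Phi_p(s).
    Only additions and subtractions are used.
    """
    last = x[-1]
    return [sum(x)] + [xi - last for xi in x[:-1]]

def polymod(x, m):
    """Remainder of x modulo a monic polynomial m (lowest degree first)."""
    r = list(x)
    dm = len(m) - 1
    for i in range(len(r) - 1, dm - 1, -1):
        c = r[i]
        for j in range(dm + 1):
            r[i - dm + j] -= c * m[j]
    return r[:dm]

x = [4, -1, 7, 2, 5]                       # p = 5
r = reduce_prime(x)
# Cross-check against explicit polynomial division
check = polymod(x, [-1, 1]) + polymod(x, [1, 1, 1, 1, 1])
```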
To carry out the operation of $A$ and ${B}^{t}$ we need to be able to carry out the action of ${A}_{{d}_{1}}\otimes \cdots \otimes {A}_{{d}_{k}}$, which was discussed in Implementing Kronecker Products Efficiently. Note that since $A$ and ${B}^{t}$ are block diagonal, each diagonal block can be applied separately. However, since the blocks are rectangular, care is needed to ensure that the correct indexing is used.
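The indexing issue can be made concrete with a small sketch. Because the blocks are rectangular, the read position in the input advances by the number of columns of a block while the write position in the output advances by its number of rows. The Python function `apply_block_diag` and the $3\times 2$ matrix used for ${D}_{2}$ below are assumptions chosen for illustration, not the programs or matrices of the appendix.

```python
def apply_block_diag(blocks, x):
    """Multiply x by the block-diagonal matrix with the given diagonal blocks.

    Each block is a list of rows and may be rectangular, so the input
    offset advances by the block's column count and the output grows by
    its row count.
    """
    y, col = [], 0
    for blk in blocks:
        ncols = len(blk[0])
        seg = x[col:col + ncols]
        y.extend(sum(a * s for a, s in zip(row, seg)) for row in blk)
        col += ncols
    assert col == len(x), "block widths must add up to len(x)"
    return y

D2 = [[1, 0], [1, 1], [0, 1]]     # assumed 3-by-2 block
blocks = [[[1]], D2, D2]          # 1 (+) D2 (+) D2 maps length 5 to length 7
y = apply_block_diag(blocks, [10, 1, 2, 3, 4])
```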
To facilitate the discussion of the programs we generate, it is useful to consider an example. Take as an example the 45 point circular convolution algorithm listed in the appendix. From Equation 19 in Bilinear Forms for Circular Convolution we find that we need to compute $x={P}_{9,5}x$ and $x={R}_{9,5}x$. These are the first two commands in the program.
We noted above that bilinear forms for linear convolution, $({D}_{d},{E}_{d},{F}_{d})$, can be used for these cyclotomic convolutions. Specifically, we can take ${A}_{{p}^{i}}={D}_{\phi \left({p}^{i}\right)}$, ${B}_{{p}^{i}}={E}_{\phi \left({p}^{i}\right)}$ and ${C}_{{p}^{i}}={G}_{{p}^{i}}{F}_{\phi \left({p}^{i}\right)}$, and substitute these into Equation 20 in Bilinear Forms for Circular Convolution. In our approach this is what we have done. The bilinear forms for linear convolution given in the appendix satisfy ${D}_{4}={D}_{2}\otimes {D}_{2}$, ${D}_{6}={D}_{2}\otimes {D}_{3}$ and ${E}_{d}={D}_{d}$, so for the 45 point example the diagonal blocks of $A$ and $B$ coincide and reduce to Kronecker products of ${D}_{2}$ and ${D}_{3}$.
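For the 45 point example these substitutions can be written out explicitly. The expressions below are a sketch that assumes $A$ has one diagonal block for each divisor of 45, taken in the order 1, 3, 9, 5, 15, 45, with $\phi(3)=2$, $\phi(9)=6$ and $\phi(5)=4$. The ordering is an assumption, and Equation 20 should be consulted for the authoritative form.

$$A = 1 \oplus D_2 \oplus D_6 \oplus D_4 \oplus (D_2\otimes D_4) \oplus (D_6\otimes D_4)$$

Using $D_4=D_2\otimes D_2$ and $D_6=D_2\otimes D_3$, this becomes

$$A = 1 \oplus D_2 \oplus (D_2\otimes D_3) \oplus (D_2\otimes D_2) \oplus (D_2\otimes D_2\otimes D_2) \oplus (D_2\otimes D_3\otimes D_2\otimes D_2)$$

and, because $E_d=D_d$, $B=A$.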
From the discussion above, we found that Kronecker products like ${D}_{2}\otimes {D}_{2}\otimes {D}_{2}$ appearing in these expressions are best carried out by factoring the product into factors of the form ${I}_{a}\otimes {D}_{2}\otimes {I}_{b}$. Therefore we need programs to carry out $({I}_{a}\otimes {D}_{2}\otimes {I}_{b})x$ and $({I}_{a}\otimes {D}_{3}\otimes {I}_{b})x$. These functions are called ID2I(a,b,x) and ID3I(a,b,x) and are listed in the appendix. The transposed form, $({I}_{a}\otimes {D}_{2}^{t}\otimes {I}_{b})x$, is called ID2tI(a,b,x).
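A minimal sketch of such a function is given below, in Python rather than the Matlab of the appendix. The name `apply_IDI`, its argument order and the $3\times 2$ matrix standing in for ${D}_{2}$ are assumptions; the appendix programs ID2I and ID3I should be taken as the reference. The sketch applies $({I}_{a}\otimes D\otimes {I}_{b})$ by gathering input samples spaced $b$ apart, and then checks that three such stages reproduce $({D}_{2}\otimes {D}_{2}\otimes {D}_{2})x$. The order of the stages used here is only one of several valid orders.

```python
def apply_IDI(D, a, b, x):
    """Compute (I_a kron D kron I_b) x without forming the Kronecker product.

    D is an r-by-c matrix (list of rows); x has length a*c*b and the
    result has length a*r*b.
    """
    r, c = len(D), len(D[0])
    assert len(x) == a * c * b
    y = [0] * (a * r * b)
    for i in range(a):                # which copy of (D kron I_b)
        for j in range(b):            # position inside I_b
            # gather the c inputs of this small product, spaced b apart
            seg = [x[i * c * b + k * b + j] for k in range(c)]
            for m in range(r):        # scatter the r outputs, spaced b apart
                y[i * r * b + m * b + j] = sum(D[m][k] * seg[k] for k in range(c))
    return y

def kron(A, B):
    """Explicit Kronecker product, used here only for checking."""
    return [[p * q for p in ra for q in rb] for ra in A for rb in B]

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

D2 = [[1, 0], [1, 1], [0, 1]]         # assumed 3-by-2 matrix for D_2
x = [1, 2, 3, 4, 5, 6, 7, 8]

# (D2 kron D2 kron D2) = (D2 kron I_9)(I_2 kron D2 kron I_3)(I_4 kron D2)
v = apply_IDI(D2, 4, 1, x)            # length 8  -> 12
v = apply_IDI(D2, 2, 3, v)            # length 12 -> 18
v = apply_IDI(D2, 1, 9, v)            # length 18 -> 27

direct = matvec(kron(kron(D2, D2), D2), x)
```

The staged computation never forms the $27\times 8$ matrix; each stage touches the data only through small products with ${D}_{2}$.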
To compute the multiplicative constants we need ${C}^{t}$. Since ${C}_{{p}^{i}}={G}_{{p}^{i}}{F}_{\phi \left({p}^{i}\right)}$, each such block transposes to ${C}_{{p}^{i}}^{t}={F}_{\phi \left({p}^{i}\right)}^{t}{G}_{{p}^{i}}^{t}$.
The Matlab function KFt carries out the operation ${F}_{{d}_{1}}\otimes \cdots \otimes {F}_{{d}_{K}}$. The Matlab function Kcrot implements the operation ${G}_{{p}_{1}^{{e}_{1}}}\otimes \cdots \otimes {G}_{{p}_{K}^{{e}_{K}}}$. They are both listed in the appendix.
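For the 45 point example, the way Kronecker products of the $F$ and $G$ matrices enter can be seen by writing ${C}^{t}$ block by block. The expression below is a sketch: it assumes one diagonal block for each divisor of 45, taken in the order 1, 3, 9, 5, 15, 45, and it uses the identity $(G\otimes G')(F\otimes F')=GF\otimes G'F'$ together with ${C}_{{p}^{i}}={G}_{{p}^{i}}{F}_{\phi ({p}^{i})}$.

$$C^t = 1 \oplus F_2^t G_3^t \oplus F_6^t G_9^t \oplus F_4^t G_5^t \oplus (F_2^t\otimes F_4^t)(G_3^t\otimes G_5^t) \oplus (F_6^t\otimes F_4^t)(G_9^t\otimes G_5^t)$$

Under this assumption each block is applied in two steps, a Kronecker product of $G$ factors followed by a Kronecker product of $F$ factors.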
By recognizing that the convolution algorithms for different lengths share many of the same computations, it is possible to write a set of programs that take advantage of this. The programs we have generated call functions from a relatively small set. Each program calls these functions with different arguments, in differing orders, and a different number of times. By organizing the program structure in a modular way, we are able to generate relatively compact code for a wide variety of lengths.
In the appendix we have listed code for the following functions, from which we create circular convolution algorithms. In the next section we generate FFT programs using this same set of functions.
pfp implements the permutation of Prime Factor Permutations. Its transpose is implemented by pfpt.
KRED implements the reduction operations of Reduction Operations. Its transpose is implemented by tKRED. Its inverse transpose is implemented by itKRED, and this function is used only for computing the multiplicative constants.
ID2I and ID3I are Matlab functions for the operations $I\otimes {D}_{2}\otimes I$ and $I\otimes {D}_{3}\otimes I$. These linear convolution operations are also described in the appendix 'Bilinear Forms for Linear Convolution'.
ID2tI and ID3tI implement the transposes, $I\otimes {D}_{2}^{t}\otimes I$ and $I\otimes {D}_{3}^{t}\otimes I$.
[link] lists operation counts for some of the circular convolution algorithms we have generated. The operation counts do not include any arithmetic operations on index variables or loop counters. They include only the arithmetic operations that involve the data sequence $x$ in the convolution of $x$ and $h$.
The table in [link] for the split nesting algorithm gives very similar arithmetic operation counts. For all lengths not divisible by 9, the algorithms we have developed use the same number of multiplications and the same number or fewer additions. For lengths which are divisible by 9, the algorithms described in [link] require fewer additions than do ours. This is because the algorithms whose operation counts are tabulated in the table in [link] use a special ${\Phi}_{9}\left(s\right)$ convolution algorithm. It should be noted, however, that the efficient ${\Phi}_{9}\left(s\right)$ convolution algorithm of [link] is not constructed from smaller algorithms using the Kronecker product, as is ours. As we have discussed above, the use of the Kronecker product facilitates adaptation to special computer architectures and yields a very compact program with function calls to a small set of functions.
N | muls | adds | N | muls | adds | N | muls | adds | N | muls | adds
2 | 2 | 4 | 24 | 56 | 244 | 80 | 410 | 1546 | 240 | 1640 | 6508
3 | 4 | 11 | 27 | 94 | 485 | 84 | 320 | 1712 | 252 | 1520 | 7920
4 | 5 | 15 | 28 | 80 | 416 | 90 | 380 | 1858 | 270 | 1880 | 9074
5 | 10 | 31 | 30 | 80 | 386 | 105 | 640 | 2881 | 280 | 2240 | 9516
6 | 8 | 34 | 35 | 160 | 707 | 108 | 470 | 2546 | 315 | 3040 | 13383
7 | 16 | 71 | 36 | 95 | 493 | 112 | 656 | 2756 | 336 | 2624 | 11132
8 | 14 | 46 | 40 | 140 | 568 | 120 | 560 | 2444 | 360 | 2660 | 11392
9 | 19 | 82 | 42 | 128 | 718 | 126 | 608 | 3378 | 378 | 3008 | 16438
10 | 20 | 82 | 45 | 190 | 839 | 135 | 940 | 4267 | 420 | 3200 | 14704
12 | 20 | 92 | 48 | 164 | 656 | 140 | 800 | 3728 | 432 | 3854 | 16430
14 | 32 | 170 | 54 | 188 | 1078 | 144 | 779 | 3277 | 504 | 4256 | 19740
15 | 40 | 163 | 56 | 224 | 1052 | 168 | 896 | 4276 | 540 | 4700 | 21508
16 | 41 | 135 | 60 | 200 | 952 | 180 | 950 | 4466 | 560 | 6560 | 25412
18 | 38 | 200 | 63 | 304 | 1563 | 189 | 1504 | 7841 | 630 | 6080 | 28026
20 | 50 | 214 | 70 | 320 | 1554 | 210 | 1280 | 6182 | 720 | 7790 | 30374
21 | 64 | 317 | 72 | 266 | 1250 | 216 | 1316 | 6328 | 756 | 7520 | 38144 |
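The multiplication counts in the table can be cross-checked with a short Python sketch. It assumes that there is one diagonal block for each divisor of $N$, that the block for a prime power ${p}^{i}$ uses ${D}_{\phi ({p}^{i})}$, that each ${D}_{d}$ is a Kronecker product of copies of ${D}_{2}$ and ${D}_{3}$ having 3 and 5 rows respectively, and that ${D}_{1}=1$. These row counts are assumptions that are not stated in this section, and the function names are hypothetical.

```python
def factorize(n):
    """Prime factorization of n as a dict {prime: exponent}."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def rows_of_D(d):
    """Rows of D_d, assuming D_d is built from D_2 (3 rows) and D_3 (5 rows)."""
    rows = 1
    while d % 2 == 0:
        d //= 2
        rows *= 3
    while d % 3 == 0:
        d //= 3
        rows *= 5
    if d != 1:
        raise ValueError("this sketch only covers d of the form 2^a 3^b")
    return rows

def mult_count(n):
    """Number of multiplications, i.e. the number of rows of A."""
    total = 1
    for p, e in factorize(n).items():
        # 1 for the s - 1 factor, then one block for each of p, p^2, ..., p^e
        per_prime = 1
        for i in range(1, e + 1):
            phi = (p - 1) * p ** (i - 1)
            per_prime *= 1                     # no-op, kept for readability
            per_prime += rows_of_D(phi)
        total *= per_prime
    return total

counts = {n: mult_count(n) for n in (3, 9, 27, 45, 720, 756)}
```

Under these assumptions the count for $N=45$ is $19\cdot 10=190$, and the sketch reproduces the muls column for every length in the table.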
It is possible to make further improvements to the operation counts given in [link] [link], [link]. Specifically, algorithms for prime power cyclotomic convolution based on the polynomial transform, although more complicated, will give improvements for the longer lengths listed [link], [link]. These improvements can be easily included in the code generating program we have developed.