<< Chapter < Page Chapter >> Page >
Here the "polyphase quadrature" filterbank used in the MPEG audio standards is described in great detail. It has the following practical features: real-valued sub-band outputs, near-perfect reconstruction, and polyphase implementation; and is based on cancellation of adjacent sub-band interference.
  • Though the uniformly modulated filterbank in Figure 4 from "Uniformly-Modulated Filterbanks" was shown to have the fast implementation in Figure 5 from "Uniformly-Modulated Filterbanks" , the sub-band outputs are complex-valued for real-valuedinput, hence inconvenient (at first glance In the structure in Figure 4 from "Uniformly-Modulated Filterbanks" , it would be reasonable to replace the standard DFT with a real-valued DFT (defined inthe notes on transform coding), requiring N log 2 N real-multiplies when N is a power of 2. Though it is not clear to the author why such a structure was notadopted in the MPEG standards, the cosine modulated filterbank derived in this section has equivalent performance and, withits polyphase/DCT implementation, equivalent implementation cost. ) for sub-band coding of real-valued data. In this section we propose a closely related filterbank with thefollowing properties.
    1. Real-valued sub-band outputs (assuming real-valued inputs),
    2. Near-perfect reconstruction,
    3. Polyphase/fast-transform implementation.
    This turns out to be the filterbank specified in the MPEG-1 and 2 (layers 1-3) audio compression standards (see IS0/IEC 13818-3).

Filter design

  • Real-valued Sub-band Outputs: Recall the generic filterbank structure of Figure 1 from "Uniformly-Modulated Filterbanks" . For the sub-band outputs to be real-valued (for real-valued input),we require that the impulse responses of { H i ( z ) } and { K i ( z ) } are real-valued. We can insure this by allocating the N (symmetric) frequency band pairs shown in [link] . The positive and negative halves of each band pair are centered at ω i = ( 2 i + 1 ) π 2 N radians.
    This figure is a cartesian graph, with horizontal axis ω, and an unlabeled vertical axis. The graph consists of eight colored, connected rectangles of identical dimension, beginning. The rectangles all have one side drawn on the base of the graph. The leftmost rectangle's left side is located at a ω-value of -π, and the rightmost rectangles right side is located at a ω-value of π. The midpoint along the horizontal axis of each rectangle is labeled from left to right as -ω_3, -ω_2, -ω_1, -ω_0, ω_0, ω_1, ω_2, and ω_3. The rectangles from left to right are colored green, red, blue, grey, grey, blue, red, green. Above the left side of the graph is a horizontal arrow pointing in both directions, labeled π/N. Above the right side of the graph is the equation ω_i = [(2i + 1)π]/2N. This figure is a cartesian graph, with horizontal axis ω, and an unlabeled vertical axis. The graph consists of eight colored, connected rectangles of identical dimension, beginning. The rectangles all have one side drawn on the base of the graph. The leftmost rectangle's left side is located at a ω-value of -π, and the rightmost rectangles right side is located at a ω-value of π. The midpoint along the horizontal axis of each rectangle is labeled from left to right as -ω_3, -ω_2, -ω_1, -ω_0, ω_0, ω_1, ω_2, and ω_3. The rectangles from left to right are colored green, red, blue, grey, grey, blue, red, green. Above the left side of the graph is a horizontal arrow pointing in both directions, labeled π/N. Above the right side of the graph is the equation ω_i = [(2i + 1)π]/2N.
    Frequency band pairs for the polyphase quadrature filterbank ( N = 4 ).
    We can consider each filter H i ( z ) as some combination of symmetric positive-frequency and negative-frequency components
    H i ( z ) = a i F i ( z ) + b i G i ( z )
    as shown in [link] .
    this figure is a graph with horizontal axis ω and vertical axis H_i(ω). The horizontal axis establishes wide boundaries of -π and π. Below the graph is a horizontal arrow pointing in both directions, labeled (2i + 1)π/N. There are two shaded trapezoids located on the graph, with their base drawn on the horizontal axis. The trapezoid on the left is labeled G_i(ω), and the midpoint of its base is labeled with a horizontal value -ω_i. The trapezoid on the right is labeled F_i(ω), and the midpoint of its base is labeled with a horizontal value ω_i. Above both trapezoids are horizontal arrows pointing in both directions, labeled π/N. this figure is a graph with horizontal axis ω and vertical axis H_i(ω). The horizontal axis establishes wide boundaries of -π and π. Below the graph is a horizontal arrow pointing in both directions, labeled (2i + 1)π/N. There are two shaded trapezoids located on the graph, with their base drawn on the horizontal axis. The trapezoid on the left is labeled G_i(ω), and the midpoint of its base is labeled with a horizontal value -ω_i. The trapezoid on the right is labeled F_i(ω), and the midpoint of its base is labeled with a horizontal value ω_i. Above both trapezoids are horizontal arrows pointing in both directions, labeled π/N.
    Positive- and negative-frequency decomposition of H i ( ω ) . Note K i ( ω ) will have a similar, if not identical, frequency response.
    When b i = a i * and the pairs { F i ( z ) , G i ( z ) } are modulated versions of the same prototype filter H ( z ) , we can show that H i ( z ) must be real-valued:
    H i ( z ) = a i H ( e - j π 2 i + 1 2 N z ) F i ( z ) + a i * H ( e j π 2 i + 1 2 N z ) G i ( z ) = a i n h n e j π 2 i + 1 2 N n z - n + a i * n h n e - j π 2 i + 1 2 N n z - n = Re ( a i ) n h n z - n e j π 2 i + 1 2 N n + e - j π 2 i + 1 2 N n + j Im ( a i ) n h n z - n e j π 2 i + 1 2 N n - e - j π 2 i + 1 2 N n = Re ( a i ) n h n z - n · 2 cos π 2 i + 1 2 N n + j Im ( a i ) n h n z - n · 2 j sin π 2 i + 1 2 N n = 2 n Re ( a i ) cos π 2 i + 1 2 N n - Im ( a i ) sin π 2 i + 1 2 N n h n z - n
  • Aliasing Cancellation: Recall again the generic filterbank in Figure 1 from "Uniformly-Modulated Filterbanks" . Here we determine conditions on real-valued { H i ( z ) } and { K i ( z ) } which lead to near-perfect reconstruction. It will be insightful to derive an expression for the input to the i t h reconstruction filter, { y i ( n ) } . The downsample-upsample-cascade equation Equation 14 from "Fundamentals of Multirate Signal Processing" (fourth equation) implies that
    Y i ( z ) = 1 N p = 0 N - 1 X i e - j 2 π N p z = 1 N p = 0 N - 1 H i e - j 2 π N p z X e - j 2 π N p z = 1 N p = 0 N - 1 a i F i e - j 2 π N p z + a i * G i e - j 2 π N p z X e - j 2 π N p z = 1 N a i F i ( z ) + a i * G i ( z ) X ( z ) desired + 1 N p = 1 N - 1 a i F i e - j 2 π N p z + a i * G i e - j 2 π N p z X e - j 2 π N p z undesired images .
    Thus the input to the i t h reconstruction filter is corrupted by unwanted spectral images, and the reconstruction filter's job is theremoval of these images. The reconstruction filter K i ( z ) will have a bandpass frequency response similar (or identical) to that of H i ( z ) illustrated in [link] . Due to the practical design considerations, neither K i ( z ) nor H i ( z ) will be perfect bandpass filters, but we will assume that the only significant out-of-band energy passed by these filters willoccur in the frequency range just outside of their passbands. (Note the limited “spillover” in [link] .) Under these assumptions, the only undesired images in Y i ( ω ) that will not be completely attenuated by K i ( ω ) are the images adjacent to F i ( ω ) and G i ( ω ) . Which indices p in [link] (third equation) are responsible for these adjacent images? [link] (third equation) implies that index p = shifts the frequency response up by 2 π / N radians. Since the passband centers of F i ( z ) and G i ( z ) are ( 2 i + 1 ) π / N radians apart, the passband of G i e - j 2 π N p z will reside directly to the left of the passband of F i ( z ) when p = i . Similarly, the passband of G i e - j 2 π N p z will reside directly to the right of the passband of F i ( z ) when p = i + 1 . See [link] for an illustration. Using the same reasoning, the passband of F i e - j 2 π N p z will reside directly to the right of the passband of G i ( z ) when p = - i and directly to the left when p = - ( i + 1 ) . The only exceptions to this rule occur when i = 0 , in which case the images to the right of G i ( z ) and to the left of F i ( z ) are desired, and when i = N - 1 , in which case the images to the left of G i ( z ) and to the right of F i ( z ) are desired.
    This figure is identical to the preceding figure, except that there are now four dashed trapezoids, one on each side of each of the shaded trapezoids. The bases slightly overlap on both sides with the shaded trapezoids. The trapezoids to the left and right of G_i(z) are labeled F_i(e^(-j(2π/n)p)z), and the trapezoids to the left and right of F_i(z) are labeled G_i(e^(-j(2π/n)p)z). Above the dashed trapezoids are small captions that read from left to right, p = -(i + 1), p = -i, p = i, and p = i + 1. This figure is identical to the preceding figure, except that there are now four dashed trapezoids, one on each side of each of the shaded trapezoids. The bases slightly overlap on both sides with the shaded trapezoids. The trapezoids to the left and right of G_i(z) are labeled F_i(e^(-j(2π/n)p)z), and the trapezoids to the left and right of F_i(z) are labeled G_i(e^(-j(2π/n)p)z). Above the dashed trapezoids are small captions that read from left to right, p = -(i + 1), p = -i, p = i, and p = i + 1.
    Spectral images of Y i ( ω ) not completely attenuated by K i ( ω ) .
    Based on the arguments above, we can write { u i ( n ) } , the output of the i t h reconstruction filter, as follows:
    U i ( z ) = K i ( z ) Y i ( z ) = 1 N K i ( z ) a i F i ( z ) X ( z ) + a i * G i ( z ) X ( z ) desired + 1 N K i ( z ) a i F i e j 2 π N i z X e j 2 π N i z + a i * G i e - j 2 π N i z X e - j 2 π N i z aliasing from inner undesired images when 1 i N - 1 + 1 N K i ( z ) a i F i e j 2 π N ( i + 1 ) z X e j 2 π N ( i + 1 ) z + a i * G i e - j 2 π N ( i + 1 ) z X e - j 2 π N ( i + 1 ) z aliasing from outer undesired images when 0 i N - 2 .
    The previous equation shows that U i ( z ) is corrupted by the portions of the undesired images not completely removed by the reconstructionfilter K i ( z ) . In the filterbank context, this undesired behavior is referred to asaliasing. But notice that aliasing contributions to the signal U ( z ) = i U i ( z ) will vanish if the inner aliasing components in U i ( z ) cancel the outer aliasing components in U i - 1 ( z ) . This happens when
    K i ( z ) a i F i e j 2 π N i z X e j 2 π N i z + a i * G i e - j 2 π N i z X e - j 2 π N i z = - K i - 1 ( z ) a i - 1 F i - 1 e j 2 π N i z X e j 2 π N i z + a i - 1 * G i - 1 e - j 2 π N i z X e - j 2 π N i z .
    which occurs under satisfaction of the two conditions below.
    a i K i ( z ) F i e j 2 π N i z = - a i - 1 K i - 1 ( z ) F i - 1 e j 2 π N i z a i * K i ( z ) G i e - j 2 π N i z = - a i - 1 * K i - 1 ( z ) G i - 1 e - j 2 π N i z .
    We assume from this point on that the real-valued filters { H i ( z ) } and { K i ( z ) } are constructed using modulated versions of a lowpass prototype filter H ( z ) . (This assumption is required for the existence of a polyphasefilterbank implementation.)
    H i ( z ) = a i F i ( z ) + a i * G i ( z ) K i ( z ) = c i F i ( z ) + c i * G i ( z ) where { F i ( z ) = H e - j π 2 N ( 2 i + 1 ) z G i ( z ) = H e j π 2 N ( 2 i + 1 ) z
    Then condition [link] (upper equation) becomes
    a i c i H e - j π 2 N ( 2 i + 1 ) z H e j π 2 N ( 2 i - 1 ) z + a i c i * H e j π 2 N ( 2 i + 1 ) z H e j π 2 N ( 2 i - 1 ) z = - a i - 1 c i - 1 H e - j π 2 N ( 2 i - 1 ) z H e j π 2 N ( 2 i + 1 ) z - a i - 1 c i - 1 * H e j π 2 N ( 2 i - 1 ) z H e j π 2 N ( 2 i + 1 ) z .
    Lets take a closer look at the products H e - j π 2 N ( 2 i + 1 ) z H e j π 2 N ( 2 i - 1 ) z in the previous equation. As illustrated in [link] , these products equal zero when 1 i N / 2 since their passbands do not overlap. Setting these products to zero in [link] (bottom equation) yields the condition
    a i c i * = - a i - 1 c i - 1 * for 1 i N - 1 ,
    which can also be shown to satisfy [link] (bottom equation).
    This figure is a cartesian graph with horizontal axis ω. There are three identical shaded trapezoids of similar shape to those trapezoids in the previous figure. Two of the trapezoids are located in the second quadrant, and the other is above the first and third trapezoids is the title H(e^(-j(π/2N)(2i + 1))z), and above the second is the title  H(e^(j(π/2N)(2i + 1))z). The midpoint of the bases of these trapezoids are measured as follows: the leftmost's horizontal position is (π/2N)(2i + 1) - 2π, the second trapezoid's midpoint is (-π/(2N))(2i - 1), and the rightmost is (π/2N)(2i + 1). Below these trapezoids are two horizontal lines with arrows pointing in either direction. The first line is labeled (2(N - i) -1)π/N, and the second is labeled (2i - 1)π/N. This figure is a cartesian graph with horizontal axis ω. There are three identical shaded trapezoids of similar shape to those trapezoids in the previous figure. Two of the trapezoids are located in the second quadrant, and the other is above the first and third trapezoids is the title H(e^(-j(π/2N)(2i + 1))z), and above the second is the title  H(e^(j(π/2N)(2i + 1))z). The midpoint of the bases of these trapezoids are measured as follows: the leftmost's horizontal position is (π/2N)(2i + 1) - 2π, the second trapezoid's midpoint is (-π/(2N))(2i - 1), and the rightmost is (π/2N)(2i + 1). Below these trapezoids are two horizontal lines with arrows pointing in either direction. The first line is labeled (2(N - i) -1)π/N, and the second is labeled (2i - 1)π/N.
    Illustration of vanishing terms in [link] (lower equation).
    Next we concern ourselves with the requirements on a 0 and c 0 . Assuming [link] is satisfied, we know that inner aliasing in U i ( z ) cancels outer aliasing in U i - 1 ( z ) for 1 i N - 1 . Hence, from [link] (fourth equation) and [link] (lower equation),
    U ( z ) = i = 0 N - 1 U i ( z ) = 1 N i = 0 N - 1 K i ( z ) H i ( z ) X ( z ) = 1 N i = 0 N - 1 c i H e - j π 2 N ( 2 i + 1 ) z + c i * H e j π 2 N ( 2 i + 1 ) z · a i H e - j π 2 N ( 2 i + 1 ) z + a i * H e j π 2 N ( 2 i + 1 ) z X ( z )
    Noting that the passbands of H e - j π 2 N ( 2 i + 1 ) z and H e j π 2 N ( 2 i + 1 ) z do not overlap for 1 i N - 2 , we have
    U ( z ) = 1 N [ ( a 0 c 0 * + a 0 * c 0 ) H e - j π 2 N z H e j π 2 N z + ( a N - 1 c N - 1 * + a N - 1 * c N - 1 ) H e - j π 2 N ( 2 N - 1 ) z H e j π 2 N ( 2 N - 1 ) z + i = 0 N - 1 a i c i H 2 e - j π 2 N ( 2 i + 1 ) z + a i * c i * H 2 e j π 2 N ( 2 i + 1 ) z ] X ( z ) .
    The first two terms in [link] (third equation) represent aliasing components that prevent flat overall response at ω = 0 and ω = π , respectively. These aliasing terms vanish when
    a 0 c 0 * = - a 0 * c 0 a N - 1 c N - 1 * = - a N - 1 * c N - 1
    What remains is
    U ( z ) = 1 N i = 0 N - 1 a i c i H 2 e - j π 2 N ( 2 i + 1 ) z + a i * c i * H 2 e j π 2 N ( 2 i + 1 ) z X ( z ) .
  • Phase Distortion: Perfect reconstruction requires that the analysis/synthesis system hasno phase distortion. To guarantee the absence of phase distortion, we require that thecomposite system
    Q ( z ) : = U ( z ) X ( z ) = 1 N i = 0 N - 1 a i c i H 2 e - j π 2 N ( 2 i + 1 ) z + a i * c i * H 2 e j π 2 N ( 2 i + 1 ) z
    has a linear phase response. (Recall that a linear phase response is equivalent to a pure delayin the time domain.) This linear-phase constraint will provide the final condition used tospecify the constants { a i } and { c i } . We start by examining the impulse response of Q ( z ) . Using a technique analogous to [link] (fifth equation), we can write
    Q ( z ) = 2 N n = 0 2 M - 2 i = 0 N - 1 Re ( a i c i ) cos π 2 i + 1 2 N n - Im ( a i c i ) sin π 2 i + 1 2 N n k h k h n - k z - n
    Above, we have used the property that multiplication in the z -domain implies convolution in the time domain.For Q ( z ) to be linear phase, it's impulse response must be symmetric. Let us assume that the prototype filter H ( z ) is linear phase, so that { h n } is symmetric. Thus k h m h n - k is symmetric about n = M - 1 , and thus for linear phase Q ( z ) , we require that the quantity
    i = 0 N - 1 Re ( a i c i ) cos π 2 i + 1 2 N n - Im ( a i c i ) sin π 2 i + 1 2 N n
    is symmetric about n = M - 1 , i.e.,
    i = 0 N - 1 Re ( a i c i ) cos π 2 i + 1 2 N ( M - 1 + n ) - Im ( a i c i ) sin π 2 i + 1 2 N ( M - 1 + n ) = i = 0 N - 1 Re ( a i c i ) cos π 2 i + 1 2 N ( M - 1 - n ) - Im ( a i c i ) sin π 2 i + 1 2 N ( M - 1 - n )
    for n = 0 , , M - 1 . Using trigonometric identities, it can be shown that the condition aboveis equivalent to
    0 = i = 0 N - 1 sin π 2 i + 1 2 N n Re ( a i c i ) sin π 2 i + 1 2 N ( M - 1 ) + Im ( a i c i ) cos π 2 i + 1 2 N ( M - 1 ) ,
    which is satisfied when
    Im ( a i c i ) Re ( a i c i ) = - sin π 2 i + 1 2 N ( M - 1 ) cos π 2 i + 1 2 N ( M - 1 ) = tan - π 2 i + 1 2 N ( M - 1 ) .
    Restricting | a i | = | c i | = 1 , the previous equation requires that
    a i c i = e - j π 2 i + 1 2 N ( M - 1 ) .
    It can be easily verified that the following { a i } and { c i } satisfy conditions [link] , [link] , and [link] :
    a i = e - j π M + N - 1 4 N ( 2 i + 1 ) c i = e - j π M - N - 1 4 N ( 2 i + 1 ) .
    Plugging these into the expression for H i ( z ) we find that
    H i ( z ) = a i H e - j π 2 i + 1 2 N z + a i * H e j π 2 i + 1 2 N z = n = 0 M - 1 a i e j π 2 i + 1 2 N n + a i * e - j π 2 i + 1 2 N n h n z - n = n = 0 M - 1 e j π 2 i + 1 2 N ( n - M + N - 1 2 ) + e - j π 2 i + 1 2 N ( n - M + N - 1 2 ) h n z - n = n = 0 M - 1 2 cos π 2 i + 1 2 N n - M + N - 1 2 h n impulse response of H i ( z ) z - n .
    Repeating this procedure for K i ( z ) yields
    K i ( z ) = n = 0 M - 1 2 cos π 2 i + 1 2 N n - M - N - 1 2 h n impulse response of K i ( z ) z - n .
    At this point we make a few comments on the design of the lowpass prototype H ( z ) . The perfect H ( z ) would be an ideal linear-phase lowpass filter with cutoff at ω = π / 2 N , as illustrated in [link] . Such a filter would perfectly separate the subbands as well as yieldflat composite magnitude response, as per [link] . Unfortunately, however, this perfect filter is not realizable with afinite number of filter coefficients. So, what we really want is a finite-length FIR filter having goodfrequency selectivity, nearly-flat composite response, and linear phase. The length-512 prototype filter specified in the MPEG standards issuch a filter, as evidenced by the responses in [link] . Unfortunately, the standards do not describe how this filter was designed,and a thorough discussion of multirate filter design is outside the scope of this course. For more on prototype filter design, we point the interested reader to page 358 of Vaidyanathan or Crochiere&Rabiner.
    This figure is a graph with horizontal axis ω, ranging in value from -π to π, and vertical axis |H(π)|. There is one dashed rectangle with its base sitting on the horizontal axis, and  with its width measured from horizontal position -π/2N to π/2N. The height is not measured or labeled. There is also a solid curve that, in a calmly wavelike distortion follows the horizontal axis to the bottom-left vertex of the rectangle. The curve then sharply increases to follow the boundary of the dashed rectangle, until at the bottom-right vertex it flattens to continue following the horizontal axis to the edge of the graph. This figure is a graph with horizontal axis ω, ranging in value from -π to π, and vertical axis |H(π)|. There is one dashed rectangle with its base sitting on the horizontal axis, and  with its width measured from horizontal position -π/2N to π/2N. The height is not measured or labeled. There is also a solid curve that, in a calmly wavelike distortion follows the horizontal axis to the bottom-left vertex of the rectangle. The curve then sharply increases to follow the boundary of the dashed rectangle, until at the bottom-right vertex it flattens to continue following the horizontal axis to the edge of the graph.
    Ideal (dashed) and typical (solid) prototype-filter magnitude responses for the cosine-modulated filterbank. Note bandwidth relative to [link] .
    This figure is comprised of two cartesian graphs. Both graphs show waves plotted with radians on the horizontal axis and magnitude [dB] on the vertical axis. The first graph is titled prototype filter. The vertical values on this graph range from -120 to 20, and the horizontal values from 0 to 3. The graph begins at nearly a vertical value of 20, immediately falling into a series of nonuniform waves of varying amplitudes and wavelengths in no distinct pattern. There are perhaps one hundred of these waves, never reaching a vertical value again higher than -80, and continuing to the right side of the graph. The second graph is titled composite system. The vertical values range from -2 x 10^-4 to 8 x 10^-4. The horizontal values range from 0 to 3. The waves in this graph follow a rigid, predictable pattern. They have extremely short wavelengths and there are perhaps 150 waves occurring across the page. The waves are centered around a vertical value of 3, and follow a repeating amplitude pattern of 3.2, 3.1, 3.1, 3.2, 2.5. This figure is comprised of two cartesian graphs. Both graphs show waves plotted with radians on the horizontal axis and magnitude [dB] on the vertical axis. The first graph is titled prototype filter. The vertical values on this graph range from -120 to 20, and the horizontal values from 0 to 3. The graph begins at nearly a vertical value of 20, immediately falling into a series of nonuniform waves of varying amplitudes and wavelengths in no distinct pattern. There are perhaps one hundred of these waves, never reaching a vertical value again higher than -80, and continuing to the right side of the graph. The second graph is titled composite system. The vertical values range from -2 x 10^-4 to 8 x 10^-4. The horizontal values range from 0 to 3. The waves in this graph follow a rigid, predictable pattern. They have extremely short wavelengths and there are perhaps 150 waves occurring across the page. The waves are centered around a vertical value of 3, and follow a repeating amplitude pattern of 3.2, 3.1, 3.1, 3.2, 2.5.
    Magnitude response of | H ( ω ) | of MPEG prototype filter and the resulting composite response | Q ( ω ) | , where N = 32 and M = 16 N = 512 .
    To conclude, [link] (fourth equation) and [link] give impulse response expressions for a set of real-valued filters that comprise a near-perfectlyreconstructing filterbank (under suitable selection of { h i } ). This is commonly referred to The MPEG standards refer to this filterbank as a “polyphase quadrature” filterbank (PQF), the name given to the technique byan early technical paper: Rothweiler ICASSP 83 as a “cosine-modulated filterbank” because all filters are based on cosine modulations of areal-valued linear-phase lowpass prototype H ( z ) . The near-perfect reconstruction property follows from the frequency-domaincancellation of adjacent-spectrum aliasing and the lack of phase distortion.
    It should be noted that our derivation of the cosine modulated filterbank is similar to that in Rothweiler ICASSP 83 except for the treatmentsof phase distortion. See Chapter 8 of Vaidyanathan for a more comprehensiveview of cosine-modulated filterbanks.
  • Polyphase Implementations: Recall the uniformly modulated filterbank in Figure 4 from "Uniformly-Modulated Filterbanks" , whose combined modulator-filter coefficients can be constructed using products of the terms h n and e j π N i n . Figure 5 from "Uniformly-Modulated Filterbanks" shows a computationally-efficient polyphase/DFT implementation of the analysis filter which requires only M multiplies and one N -dimensional DFT computation for calculation of N subband outputs.We might wonder: Is there a similar polyphase/fast-transform implementation of the cosine-modulated filterbank derived inthis section? From [link] (fourth equation), we see that the impulse responses of { H i ( z ) } are products of the terms h n and cos π 2 i + 1 2 N n - M + N - 1 2 for n = 0 , , M - 1 . Note that the inverse-DCT matrix C n t can be specified via components with form similar to the cosine term in [link] (fourth equation):
    C N t i , n = 2 N α n cos π ( 2 i + 1 ) 2 N n ; i , n = 0 N - 1 . for α 0 = 1 / 2 , α n 0 = 1 .
    Thus it may not be surprising that there exist polyphase/DCT implementations of the cosine-modulated filterbank.Indeed, one such implementation is specified in the MPEG-2 audio compression standard (see ISO/IEC 13818-3).This particular implementation is the focus of the next section.

Mpeg filterbank implementation

  • Since MPEG audio compression standards are so well-known and widespread, a detailed look at the MPEG filterbank implementation is warranted.The cosine-modulated, or polyphase-quadrature filterbank described in the previous section is used in MPEG Layers 1-3.(The MPEG hierarchy will be described in a later chapter.) This section discusses the specific implementation suggested by theMPEG-2 standard (see ISO/IEC 13818-3).
  • The MPEG standard specifies 512 prototype filter coefficients, the first of which is zero.To adapt the MPEG filter to our cosine-modulated-filterbank framework, we append a zero-valued 513 th coefficient so that the resulting MPEG prototype filter becomes symmetric and hence linear phase.Since the standard specifies N = 32 frequency bands, we have
    M = 513 = 16 N + 1 .
    Plugging this value of M into the filter expressions [link] (fourth equation) and [link] , the 2 π -periodicity of the cosine implies that they may be rewritten as follows.
    H i ( z ) = n = 0 16 N - 1 2 cos π 2 i + 1 2 N n - N 2 h n impulse response of H i ( z ) z - n K i ( z ) = n = 0 16 N - 1 2 cos π 2 i + 1 2 N n + N 2 h n impulse response of K i ( z ) z - n .
  • Encoding: Here we derive the encoder filterbank implementation suggested in theMPEG-2 standard (see ISO/IEC 13818-3). Using x i ( n ) to denote the output of the i t h analysis filter, we have
    x i ( n ) = k = 0 16 N - 1 2 cos π 2 i + 1 2 N ( k - N 2 h k ] x ( n - k ) .
    The relationship between x i ( n ) and its downsampled version s i ( m ) is given by
    s i ( m ) = x i ( m N ) ,
    so that the downsampled analysis output s i ( m ) can be written as
    s i ( m ) = n = 0 16 N - 1 2 cos π 2 i + 1 2 N ( n - N 2 h n ] x ( m N - n ) .
    Using the substitution n = k N + for 0 N - 1 ,
    s i ( m ) = 2 k = 0 15 = 0 N - 1 cos π 2 i + 1 2 N k N + - N 2 repeats every 4 increments of k sign changes every 2 increments of k h k N + x ( m - k ) N - = k = 0 15 = 0 N - 1 cos π 2 i + 1 2 N k 2 N + - N 2 repeats every 2 increments of k 2 ( - 1 ) k / 2 h k N + analysis window x ( m - k ) N -
    [link] illustrates this process.
    This is a complex flowchart with general downward movement of a number of rows of labeled, connected rectangles, all pointing down at a row of circles containing plus signs, which point down at a large box titled cosine matrix transformation. The columns of these connected rectangles are labeled across from k = 0 to k = 16. The first row of  boxes contain labels across from x(mN),... to x(m-16)N +1. The first set of arrows pointing down to the next set of rectangles are all labeled with a * sign. The second row of boxes are labeled across from h_0, ..., h_N-1 to h_15N, ... , h_16N-1. This is followed by another set of arrows with asterisks. The next row of  rectangles contain a series of the number 2 or the number -2. Below this are arrows labeled =. Below these arrows are more boxes that contain 13 hash marks inside. From different hash marks are longer arrows that point at the different circles containing plus signs, which in turn all point at the large cosine matrix transformation box. The positions at which these circles point at the box are labeled across from j = 0 to j = 2N-1. On the right side of the box are a series of equations from top to bottom, i = 0 to i = N - 1. To the right of each of these equations is an arrow pointing to the right at the variables from top to bottom s_0(m) to s_N-1(m). This is a complex flowchart with general downward movement of a number of rows of labeled, connected rectangles, all pointing down at a row of circles containing plus signs, which point down at a large box titled cosine matrix transformation. The columns of these connected rectangles are labeled across from k = 0 to k = 16. The first row of  boxes contain labels across from x(mN),... to x(m-16)N +1. The first set of arrows pointing down to the next set of rectangles are all labeled with a * sign. The second row of boxes are labeled across from h_0, ..., h_N-1 to h_15N, ... , h_16N-1. This is followed by another set of arrows with asterisks. The next row of  rectangles contain a series of the number 2 or the number -2. Below this are arrows labeled =. Below these arrows are more boxes that contain 13 hash marks inside. From different hash marks are longer arrows that point at the different circles containing plus signs, which in turn all point at the large cosine matrix transformation box. The positions at which these circles point at the box are labeled across from j = 0 to j = 2N-1. On the right side of the box are a series of equations from top to bottom, i = 0 to i = N - 1. To the right of each of these equations is an arrow pointing to the right at the variables from top to bottom s_0(m) to s_N-1(m).
    MPEG encoder filterbank implementation suggested in ISO/IEC 13818-3.
  • Decoding: Here we derive the dencoder filterbank implementation suggested in theMPEG-2 standard (see ISO/IEC 13818-3). Using y i ( n ) to denote the output of the i t h upsampler,
    u i ( n ) = k = 0 16 N - 1 2 cos π 2 i + 1 2 N k + N 2 h k y i ( n - k ) .
    The input to the upsampler s i ( m ) is related to the output y i ( n ) by
    y i ( n ) = s i ( n / N ) when n / N Z 0 else ,
    so that
    u i ( n ) = { k : n - k N Z } 2 cos π 2 i + 1 2 N k + N 2 h k s i n - k N .
    Lets write n = m N + for 0 N - 1 and k = p N + q for 0 q N - 1 . Then due to the restricted ranges of and q ,
    n - k N = m - p + - q N Z = q .
    Using these substitutions in the previous equation for u i ( n ) ,
    u i ( m N + ) = 2 p = 0 15 cos π 2 i + 1 2 N p N + + N 2 h p N + s i ( m - p ) .
    Summing u i ( m N + ) over i to create u ( m N + ) ,
    u ( m N + ) = 2 i = 0 N - 1 p = 0 15 cos π 2 i + 1 2 N p N + + N 2 ) repeats every 4 increments of p sign changes every 2 increments of p h p N + s i ( m - p ) = p = 0 15 2 ( - 1 ) p / 2 h p N + synthesis window i = 0 N - 1 cos π 2 i + 1 2 N p 2 N + + N 2 ) = cos π 2 i + 1 2 N + N 2 p even cos π 2 i + 1 2 N + N + N 2 p odd s i ( m - p )
    If we define
    v j ( m ) = i = 0 N - 1 cos π 2 i + 1 2 N j + N 2 s i ( m ) for 0 j 2 N - 1 ,
    (note the range of j !) then we can rewrite
    u ( m N + ) = p = 0 , 2 , , 14 ( - 1 ) p / 2 h p N + v ( m - p ) + p = 1 , 3 , , 15 ( - 1 ) p / 2 h p N + v + N ( m - p ) .
    [link] illustrates the construction of u ( m N + ) using the notation
    v ( m ) = v 0 ( m ) v 2 N - 1 ( m ) .
    This is another complex flowchart that generally moves in the reverse direction of the encoder flowchart in figure 25. The chart begins with the large cosine matrix transformation box, containing the same labels, then points down at a row of connected boxes labeled from v(m) to v(m - 15). Below these boxes are a series of shaded, but unlabeled boxes. Below these are the arrows with asterisks, which point at boxes  containing the h_subscript labels. Below these are more asterisk arrows, which point at the boxes with the series of 2's or -2's. Below these are the equal sign arrows which point down at the boxes with the hash marks. From the hash marks are arrows that point at each circle containing a plus sign, and from each circle there is an arrow pointing down at a final single rectangle containing six hash marks (8 including the sides of the rectangle) numbered from 0 to N. This is another complex flowchart that generally moves in the reverse direction of the encoder flowchart in figure 25. The chart begins with the large cosine matrix transformation box, containing the same labels, then points down at a row of connected boxes labeled from v(m) to v(m - 15). Below these boxes are a series of shaded, but unlabeled boxes. Below these are the arrows with asterisks, which point at boxes  containing the h_subscript labels. Below these are more asterisk arrows, which point at the boxes with the series of 2's or -2's. Below these are the equal sign arrows which point down at the boxes with the hash marks. From the hash marks are arrows that point at each circle containing a plus sign, and from each circle there is an arrow pointing down at a final single rectangle containing six hash marks (8 including the sides of the rectangle) numbered from 0 to N.
    MPEG decoder filterbank implementation suggested in ISO/IEC 13818-3.
  • DCT Implementation of Cosine Matrixing: As seen in [link] and [link] , the filterbank implementations suggested by the MPEG standard requirea cosine matrix operation that, if implemented using straightforward arithmetic, requires 32 × 64 = 2048 multiply/adds at both the encoder and decoder.Note, however, that the cosine transformations in [link] and [link] do bear a great deal of similarity to the DCT:
    y k = 2 N α k n = 0 N - 1 x n cos π 2 n + 1 2 N k ; k = 0 N - 1 , for α 0 = 1 / 2 , α k 0 = 1 , x n = 2 N k = 0 N - 1 α k y k cos π 2 n + 1 2 N k ; n = 0 N - 1 ,
    which we know has a fast algorithm: Lee's 32 × 32 fast-DCT, for example, requires only 80 multiplications and 209 additions (see B.G.Lee TASSP Dec 84).So how do we implement the matrix operation using the fast-DCT? A technique has been described clearly in Konstantinides SPL 1994,the results of which are summarized below. At the encoder, the matrix operation can be written
    s i ( m ) = j = 0 2 N - 1 cos π 2 i + 1 2 N j - N 2 w j ( m ) for i = 0 , , N - 1 ,
    where { w 0 ( m ) , , w 2 N - 1 ( m ) is created from { x ( m ) , , x ( m - 16 N + 1 ) } by windowing, shifting, and adding. (See [link] .) We can write
    s i ( m ) = j = 0 N - 1 cos π 2 i + 1 2 N j w ¯ j ( m ) ; i = 0 , , N - 1 ,
    where, for N = 32 , { w ¯ j ( m ) } is the following manipulation of { w j ( m ) } :
    w ¯ j ( m ) : = w 16 ( m ) j = 0 w 16 + j ( m ) + w 16 - j ( m ) j = 1 , 2 , , 16 w 16 + j ( m ) - w 80 - j ( m ) j = 17 , 18 , , 31 .
    Compare [link] to the inverse DCT in [link] (lower equation). At the decoder, the matrix operation can be written
    v j ( m ) = i = 0 N - 1 cos π 2 i + 1 2 N j + N 2 s i ( m ) for j = 0 , , 2 N - 1 ,
    where { v 0 ( m ) , , v 2 N - 1 ( m ) } are windowed, shifted, and added to compute { u ( m ) } . (See [link] .) It is shown in Konstantinides SPL 1994 that, for N = 32 , { v j ( m ) } can be calculated by first computing { v ¯ j ( m ) } :
    v ¯ j ( m ) = i = 0 N - 1 cos π 2 i + 1 2 N j s i ( m ) ; j = 0 , , N - 1
    and rearranging the outputs according to
    v j ( m ) : = v ¯ j + 16 ( m ) j = 0 , 1 , , 15 , 0 j = 16 , - v ¯ 48 - j ( m ) j = 17 , 18 , , 47 , - v ¯ j - 48 ( m ) j = 48 , 49 , , 63 .
    Compare [link] to the DCT in [link] (upper equation).

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, An introduction to source-coding: quantization, dpcm, transform coding, and sub-band coding. OpenStax CNX. Sep 25, 2009 Download for free at http://cnx.org/content/col11121/1.2
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'An introduction to source-coding: quantization, dpcm, transform coding, and sub-band coding' conversation and receive update notifications?

Ask