
Suppose $f$ is piecewise Lipschitz and $f_k$ is a piecewise constant approximation of $f$. In an interval that contains a discontinuity,

$$|f(t) - f_k(t)| \approx \Delta$$

where $\Delta$ is a constant equal to the average of $f$ on the right and left sides of the discontinuity in this interval. Consequently,

$$\|f - f_k\|_{L_2}^2 = O(k^{-1})$$

where $k^{-1}$ is the width of the interval. Notice this rate is quite slow.
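The slow $O(k^{-1})$ rate can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes a hypothetical test function `step` with a single jump at $1/\pi$ (chosen so the jump never lands on an interval boundary) and approximates the integral by a midpoint Riemann sum.

```python
import math

def step(t, jump=1.0 / math.pi):
    """Hypothetical test function: 0 before the jump, 1 after it."""
    return 0.0 if t < jump else 1.0

def uniform_pc_sq_error(f, k, grid=64):
    """Approximate squared L2 error of a piecewise constant fit on k equal intervals.

    Each interval is sampled at `grid` midpoints. The fit on an interval is the
    sample mean, and the integral is approximated by a midpoint Riemann sum.
    """
    total = 0.0
    for i in range(k):
        ts = [(i + (s + 0.5) / grid) / k for s in range(grid)]
        vals = [f(t) for t in ts]
        mean = sum(vals) / grid
        total += sum((v - mean) ** 2 for v in vals) / (grid * k)
    return total

# The error comes entirely from the one interval that contains the jump,
# so it is at most (1/4) * (1/k).
errors = {k: uniform_pc_sq_error(step, k) for k in (10, 100, 1000)}
for k, err in errors.items():
    print(k, err, k * err)
```

In this sketch the product `k * err` stays bounded as `k` grows, which is the $O(k^{-1})$ behavior described above.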

This problem naturally suggests the following remedy: use very small intervals near discontinuities and larger intervals in smooth regions. Specifically, suppose we use intervals of width $k^{-2\alpha}$ to contain the discontinuities and intervals of width $k^{-1}$ elsewhere. Then the corresponding piecewise polynomial approximation $\tilde{f}_k$ satisfies

$$\|f - \tilde{f}_k\|_{L_2}^2 = O(k^{-2\alpha}).$$

We can accomplish this need for "adaptive resolution" or "multiresolution" using recursive partitions and trees.

Recursive dyadic partitions

We discussed this idea already in our examination of classification trees. Here is the basic idea again, graphically.

[Figure: Complete and pruned RDPs along with their corresponding tree structures.]

Consider a function $f \in B^{\alpha}(C_{\alpha})$ that contains no more than $m$ points of discontinuity, and is $H^{\alpha}(C_{\alpha})$ away from these points.

Lemma

Consider a complete RDP with $n$ intervals. Then there exists an associated pruned RDP with $O(k \log n)$ intervals such that an associated piecewise degree-$\alpha$ polynomial approximation $\tilde{f}_k$ has a squared approximation error of $O(\max(k^{-2\alpha}, n^{-1}))$.

Assume $n > k > m$. Divide $[0,1]$ into $k$ intervals. If $f$ is smooth on a particular interval $I$, then

$$|f(t) - \tilde{f}_k(t)|^2 = O(k^{-2\alpha}) \quad \forall\, t \in I.$$

In intervals that contain a discontinuity, recursively subdivide into two until the discontinuity is contained in an interval of width $n^{-1}$. This process results in at most $\log_2 n$ additional subintervals per discontinuity, and the squared approximation error is $O(k^{-2\alpha})$ on all of them except the $m$ intervals of width $n^{-1}$ containing the discontinuities, where the error is $O(1)$ at each point.

Thus the smooth intervals contribute $O(k^{-2\alpha})$ to the squared $L_2$ error, the $m$ intervals containing discontinuities contribute $O(m\, n^{-1})$, and overall

$$\|f - \tilde{f}_k\|_{L_2}^2 = O(k^{-2\alpha} + n^{-1}) = O(\max(k^{-2\alpha}, n^{-1}))$$

and there are at most $k + m \log_2 n$ intervals in the partition. Since $k > m$, we can upper bound the number of intervals by $2k \log_2 n$.

Note that if the initial complete RDP has $n \approx k^{2\alpha}$ intervals, then the squared error is $O(k^{-2\alpha})$.

Thus, we only incur a factor of $2\alpha \log k$ additional leaves and achieve the same overall approximation error as in the $H^{\alpha}(C_{\alpha})$ case. We will see that this is a small price to pay in order to handle not only smooth functions, but also piecewise smooth functions.
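The pruning argument in the lemma can be made concrete by building the partition. This is a sketch under stated assumptions rather than the construction used in the notes: the helper `pruned_rdp` and the jump locations 0.3 and 0.7 are hypothetical, and `k` and `n` are taken to be powers of two so that every interval is dyadic.

```python
import math

def pruned_rdp(jumps, k, n):
    """Return the leaves of a pruned RDP of [0, 1) as sorted (left, right) pairs.

    Starts from k equal intervals and keeps bisecting any interval that
    contains a jump until its width reaches 1/n. Intervals are half-open.
    Assumes k and n are powers of two with n > k.
    """
    leaves = []
    stack = [(i / k, (i + 1) / k) for i in range(k)]
    while stack:
        a, b = stack.pop()
        has_jump = any(a <= x < b for x in jumps)
        if has_jump and (b - a) > 1.0 / n:
            mid = (a + b) / 2
            stack.extend([(a, mid), (mid, b)])
        else:
            leaves.append((a, b))
    return sorted(leaves)

jumps = [0.3, 0.7]          # m = 2 hypothetical discontinuities
k, n = 8, 256
leaves = pruned_rdp(jumps, k, n)
bound = k + len(jumps) * math.log2(n)   # the k + m*log2(n) bound from the proof
print(len(leaves), "leaves; bound", bound)
```

Each bisection adds exactly one leaf, so each discontinuity costs $\log_2(n/k)$ extra leaves here, which sits below the $m \log_2 n$ allowance used in the proof.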

Wavelet approximations

Let $f \in L^2([0,1])$; that is, $\int_0^1 f^2(t)\, dt < \infty$.

A wavelet approximation is a series of the form

$$f = c_0 + \sum_{j \ge 0} \sum_{k=1}^{2^j} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}$$

where $c_0$ is a constant ($c_0 = \int_0^1 f(t)\, dt$),

$$\langle f, \psi_{j,k} \rangle = \int_0^1 f(t)\, \psi_{j,k}(t)\, dt$$

and the basis functions $\psi_{j,k}$ are orthonormal, oscillatory signals, each with an associated scale $2^{-j}$ and position $k 2^{-j}$. $\psi_{j,k}$ is called the wavelet at scale $2^{-j}$ and position $k 2^{-j}$.

Haar wavelets

$$\psi_{j,k}(t) = 2^{j/2} \left( \mathbf{1}\{ t \in [2^{-j}(k-1),\, 2^{-j}(k-1/2)] \} - \mathbf{1}\{ t \in [2^{-j}(k-1/2),\, 2^{-j}k] \} \right)$$

[Figure: Haar wavelet.]

$$\int_0^1 \psi_{j,k}(t)\, dt = 0$$

$$\int_0^1 \psi_{j,k}^2(t)\, dt = \int_{(k-1)2^{-j}}^{k 2^{-j}} 2^{j}\, dt = 1$$

$$\int_0^1 \psi_{j,k}(t)\, \psi_{l,m}(t)\, dt = \delta_{j,l}\, \delta_{k,m}$$

If $f$ is constant on $[2^{-j}(k-1),\, 2^{-j}k]$, then

$$\int_0^1 f(t)\, \psi_{j,k}(t)\, dt = 0.$$
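The zero-mean and orthonormality properties of the Haar system can be verified numerically. The sketch below is illustrative only: the function names `haar` and `inner` are hypothetical, the subintervals are taken half-open so that every point is assigned to exactly one half, and the integrals are midpoint Riemann sums on a grid fine enough to align with the dyadic breakpoints for the scales tested.

```python
def haar(j, k, t):
    """Haar wavelet psi_{j,k}(t) for k = 1..2^j, with half-open subintervals."""
    left = (k - 1) / 2**j
    mid = (k - 0.5) / 2**j
    right = k / 2**j
    if left <= t < mid:
        return 2 ** (j / 2)
    if mid <= t < right:
        return -(2 ** (j / 2))
    return 0.0

def inner(g, h, grid=1024):
    """Midpoint Riemann sum approximating the integral of g*h over [0, 1]."""
    pts = [(i + 0.5) / grid for i in range(grid)]
    return sum(g(t) * h(t) for t in pts) / grid

# All wavelets at scales j = 0, 1, 2, 3.
index = [(j, k) for j in range(4) for k in range(1, 2**j + 1)]
psi = {jk: (lambda t, jk=jk: haar(jk[0], jk[1], t)) for jk in index}

gram = {(a, b): inner(psi[a], psi[b]) for a in index for b in index}
means = {a: inner(psi[a], lambda t: 1.0) for a in index}
```

Here `gram[(a, b)]` should be close to 1 when `a == b` and close to 0 otherwise, and every entry of `means` should be close to 0.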

Suppose $f$ is piecewise constant with at most $m$ discontinuities. Let

$$f_J = c_0 + \sum_{j=0}^{J-1} \sum_{k=1}^{2^j} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}.$$

Then $f_J$ has at most $mJ$ non-zero wavelet coefficients; i.e., $\langle f, \psi_{j,k} \rangle = 0$ for all but $mJ$ terms, since at most one Haar wavelet at each scale senses each point of discontinuity. Said another way, all but at most $m$ of the wavelets at each scale have support over constant regions of $f$.

$f_J$ itself will be piecewise constant, with discontinuities possibly occurring only at the end points of the intervals $[2^{-J}(k-1),\, 2^{-J}k]$. Therefore, in this case

$$\|f - f_J\|_{L_2}^2 = O(2^{-J}).$$
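The $mJ$ count can be checked on a small example. This sketch assumes a hypothetical piecewise constant $f$ taking the values 1, 3, 2 with jumps at 0.3 and 0.7 (so $m = 2$); the helper names are illustrative. Coefficients are computed from the running integral of $f$ rather than by numerical quadrature, so only floating-point rounding separates the zero coefficients from exact zero.

```python
JUMPS = (0.3, 0.7)          # hypothetical discontinuity locations, m = 2
LEVELS = (1.0, 3.0, 2.0)    # value of f on each of the three pieces

def F(t):
    """Integral of the piecewise constant f over [0, t]."""
    total, prev = 0.0, 0.0
    for x, v in zip(JUMPS + (1.0,), LEVELS):
        upper = min(t, x)
        if upper > prev:
            total += v * (upper - prev)
        prev = x
    return total

def haar_coeff(j, k):
    """<f, psi_{j,k}>: integral over the left half minus the right half, scaled."""
    left, mid, right = (k - 1) / 2**j, (k - 0.5) / 2**j, k / 2**j
    return 2 ** (j / 2) * ((F(mid) - F(left)) - (F(right) - F(mid)))

J, m = 8, len(JUMPS)
nonzero = [(j, k) for j in range(J) for k in range(1, 2**j + 1)
           if abs(haar_coeff(j, k)) > 1e-9]
per_scale = [sum(1 for (j, _) in nonzero if j == s) for s in range(J)]
print(len(nonzero), "non-zero coefficients out of", 2**J - 1, "; bound m*J =", m * J)
```

At the coarsest scale a single wavelet covers both jumps, which is why the count can fall slightly below $mJ$.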

Daubechies wavelets are the extension of the Haar wavelet idea. Haar wavelets have one "vanishing moment":

$$\int_0^1 \psi_{j,k}(t)\, dt = 0.$$

Daubechies wavelets are "smoother" basis functions with extra vanishing moments. The Daubechies-$N$ wavelet has $N$ vanishing moments:

$$\int_0^1 t^{l}\, \psi_{j,k}(t)\, dt = 0 \quad \text{for } l = 0, 1, \dots, N-1.$$

The Daubechies-1 wavelet is just the Haar case.
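A discrete counterpart of the vanishing-moment property can be checked directly on filter taps. The sketch below uses the standard 4-tap Daubechies filter, which has two vanishing moments; identifying it with the $N = 2$ member of the family in the indexing above is an assumption about naming, since the notes do not list filter coefficients. The wavelet taps are obtained from the scaling taps by the usual alternating flip.

```python
import math

s3 = math.sqrt(3.0)
# Scaling (low-pass) taps of the 4-tap Daubechies filter.
h = [c / (4 * math.sqrt(2.0)) for c in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]
# Wavelet (high-pass) taps via the alternating flip g[n] = (-1)^n * h[3 - n].
g = [(-1) ** n * h[3 - n] for n in range(4)]

def moment(taps, l):
    """Discrete moment: sum over n of n^l * taps[n]."""
    return sum((n ** l) * c for n, c in enumerate(taps))

moments = [moment(g, l) for l in range(3)]
print(moments)  # moments 0 and 1 vanish up to rounding; moment 2 does not
```

Only the moments of order $l = 0$ and $l = 1$ vanish for this filter, so it annihilates locally constant and locally linear sequences but not quadratics.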

If $f$ is a piecewise degree-$N$ polynomial with at most $m$ pieces, then using the Daubechies-$N$ wavelet system,

$$\|f - f_J\|_{L_2}^2 = O(2^{-J})$$

and

$$f_J(t) = c_0 + \sum_{j=0}^{J-1} \sum_{k=1}^{2^j} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}(t)$$

has at most $O(mJ)$ non-zero wavelet coefficients. $f_J$ is called the Discrete Wavelet Transform (DWT) approximation of $f$. The key idea is the same as we saw with trees.

Sampled data

We can also use DWTs to analyze and represent discrete, sampled functions. Suppose

$$\underline{f} = [f(1/n),\, f(2/n),\, \dots,\, f(n/n)];$$

then we can write $\underline{f}$ as

$$\underline{f} = c_0 + \sum_{j=0}^{\log_2 n - 1} \sum_{k=1}^{2^j} \langle \underline{f}, \underline{\psi}_{j,k} \rangle\, \underline{\psi}_{j,k}$$

where

$$\underline{\psi}_{j,k} = [\psi_{j,k}(1),\, \psi_{j,k}(2),\, \dots,\, \psi_{j,k}(n)]$$

is a discrete time analog of the continuous time wavelets we considered before. In particular,

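$$\sum_{i=1}^{n} i^{l}\, \psi_{j,k}(i) = 0, \quad l = 0, 1, \dots, N-1$$

for the Daubechies-$N$ discrete wavelets. The inner product is the usual one for vectors:

$$\langle \underline{f}, \underline{\psi}_{j,k} \rangle = \underline{f}^{T} \underline{\psi}_{j,k}.$$

Thus, we also have an analogous approximation result: if $\underline{f}$ consists of samples from a piecewise degree-$N$ polynomial function with a finite number $m$ of discontinuities, then $\underline{f}$ has $O(mJ)$ non-zero wavelet coefficients (here $J = \log_2 n$, the number of scales in the expansion).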
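For the Haar case the sampled-data expansion can be computed with the familiar pairwise average and difference pyramid. The sketch below is one minimal way to do this under stated assumptions: `haar_dwt` is a hypothetical helper, the input length must be a power of two, and the sampled function (values 1, 3, 2 with jumps at 0.3 and 0.7) is an illustrative choice, not one from the notes.

```python
import math

def haar_dwt(x):
    """Orthonormal discrete Haar transform of a list whose length is a power of 2.

    Returns (scaling, details). `scaling` equals sum(x) / sqrt(len(x)), and
    details[j] holds the 2^j detail coefficients at scale 2^-j.
    """
    approx = list(x)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # First half positive, second half negative, matching the Haar wavelet.
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    details.reverse()  # coarsest scale first
    return approx[0], details

def f(t):
    """Hypothetical piecewise constant signal with m = 2 jumps."""
    return 1.0 if t < 0.3 else (3.0 if t < 0.7 else 2.0)

n, m = 256, 2
samples = [f(i / n) for i in range(1, n + 1)]
scaling, details = haar_dwt(samples)
num_nonzero = sum(1 for level in details for d in level if abs(d) > 1e-12)
print(num_nonzero, "non-zero detail coefficients; bound m*log2(n) =", m * int(math.log2(n)))
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared samples, which gives a quick sanity check on any implementation.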

Approximating functions with wavelets

Suppose $f \in B^{\alpha}(C_{\alpha})$ and has a finite number of discontinuities. Let $f_p$ denote a piecewise degree-$N$ ($N = \alpha$) polynomial approximation to $f$ with $O(k)$ pieces, obtained from a uniform partition into $k$ equal-length intervals followed by additional splits at the points of discontinuity.

Then

$$|f(t) - f_p(t)|^2 = O(k^{-2\alpha}) \quad \forall\, t \in [0,1]$$

$$|f(i/n) - f_p(i/n)|^2 = O(k^{-2\alpha}), \quad i = 1, \dots, n$$

$$\frac{1}{n} \|\underline{f} - \underline{f}_p\|_{L_2}^2 = O(k^{-2\alpha})$$

and $\underline{f}_p$ has $O(k \log_2 n)$ non-zero wavelet coefficients according to our previous analysis.

Wavelets in 2-d

Suppose f is a 2-D image that is piecewise polynomial:

A pruned RDP of $k$ squares decorated with polyfits gives

$$\|f - f_k\|_{L_2}^2 = O(k^{-1}).$$

Let $\underline{f} = [f(i/k,\, j/k)]_{i,j=1}^{k}$ denote the samples of $f$ on a $k \times k$ grid, and define the pixelized approximation

$$f_n(t) = \sum_{i,j=1}^{k} f(i/k,\, j/k)\; \mathbf{1}\{ t \in [(i-1)/k,\, i/k) \times [(j-1)/k,\, j/k) \};$$

then

$$\|f - f_n\|_{L_2}^2 = O(k^{-1}).$$

There is $O(1)$ error on $k$ of the $k^2$ pixels, and near zero error elsewhere. The DWT of $\underline{f}$ has $O(k)$ non-zero wavelet coefficients, $O(2^{j})$ at scale $2^{-j}$, $j = 0, 1, \dots, \log n$.
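The claim that only on the order of $k$ of the $k^2$ pixels carry $O(1)$ error can be illustrated by counting the pixels that an edge passes through. This is a rough sketch under stated assumptions: the image `f` is a hypothetical piecewise constant example with a smooth sinusoidal boundary, and a pixel is flagged when its four corner values disagree, which is a simple proxy that can miss a pixel the curve enters and leaves between corners.

```python
import math

def f(x, y):
    """Hypothetical image: 1 above a smooth boundary curve, 0 below it."""
    return 1.0 if y > 0.5 + 0.25 * math.sin(2 * math.pi * x) else 0.0

def boundary_pixels(k):
    """Count pixels of the k-by-k grid whose four corner values are not all equal."""
    count = 0
    for i in range(k):
        for j in range(k):
            corners = {f(a / k, b / k) for a in (i, i + 1) for b in (j, j + 1)}
            if len(corners) > 1:
                count += 1
    return count

counts = {k: boundary_pixels(k) for k in (16, 32, 64)}
for k, c in counts.items():
    # c / k^2 approximates the area of the flagged pixels, which bounds the
    # squared L2 error of the pixelized approximation for this 0/1 image.
    print(k, c, c / k**2)
```

In this example the count grows linearly in `k`, so the fraction of flagged pixels, and with it the squared error, decays like $k^{-1}$.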


Source:  OpenStax, Statistical learning theory. OpenStax CNX. Apr 10, 2009 Download for free at http://cnx.org/content/col10532/1.3