Recall that a semi-distance $d(\cdot, \cdot)$ satisfies:

  • $d(f, g) = d(g, f) \geq 0$ (symmetry)
  • $d(f, f) = 0$
  • $d(f, g) \leq d(h, f) + d(h, g)$ (triangle inequality)

E.g., with $f, g : \mathbb{R}^d \to \mathbb{R}$, $d(f, g) \triangleq \|f - g\|_2$.
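As a quick sanity check, this semi-distance can be approximated numerically and its defining properties verified. The sketch below is purely illustrative: the grid discretization and the particular functions are our own choices, not from the text.

```python
import numpy as np

# Approximate d(f, g) = ||f - g||_2 for functions on [0, 1] by discretizing
# on a uniform grid (hypothetical illustration; grid and functions are ours).
x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]

def d(f, g):
    """L2 semi-distance between two functions, approximated by a Riemann sum."""
    return np.sqrt(np.sum((f(x) - g(x)) ** 2) * dx)

f = lambda t: np.sin(2 * np.pi * t)
g = lambda t: t ** 2
h = lambda t: np.cos(2 * np.pi * t)

assert abs(d(f, g) - d(g, f)) < 1e-12        # symmetry
assert d(f, f) == 0.0                        # d(f, f) = 0
assert d(f, g) <= d(h, f) + d(h, g) + 1e-12  # triangle inequality
```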

Lemma

Suppose $d(\cdot, \cdot)$ is a semi-distance. Also suppose that we have constructed $f_0, \dots, f_M$ such that $d(f_j, f_k) \geq 2 s_n$ for all $j \neq k$. Take any estimator $\hat{f}_n$ and define the test $\Psi^*(\hat{f}_n) : \mathcal{Z} \to \{0, \dots, M\}$ as

$$\Psi^*(\hat{f}_n) = \arg\min_{j} d(\hat{f}_n, f_j)$$

Then $\Psi^*(\hat{f}_n) \neq j$ implies $d(\hat{f}_n, f_j) \geq s_n$.

Suppose $\Psi^*(\hat{f}_n) \neq j$. Then there exists $k \neq j$ with $d(\hat{f}_n, f_k) \leq d(\hat{f}_n, f_j)$. Now

$$2 s_n \leq d(f_j, f_k) \leq d(\hat{f}_n, f_j) + d(\hat{f}_n, f_k) \leq 2\, d(\hat{f}_n, f_j)$$
$$\Rightarrow\ d(\hat{f}_n, f_j) \geq s_n .$$
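The minimum-distance test of the lemma is straightforward to implement. Below is a minimal sketch (the bump functions playing the role of $f_0, \dots, f_M$, the value of $s_n$, and the noisy estimate standing in for $\hat{f}_n$ are our own illustrative assumptions) that computes $\Psi^*(\hat{f}_n) = \arg\min_j d(\hat{f}_n, f_j)$ and verifies the lemma's conclusion: for every index $j$ the test does not select, $d(\hat{f}_n, f_j) \geq s_n$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 501)
dx = x[1] - x[0]

def d(u, v):
    # L2 semi-distance between two functions sampled on the grid x
    return np.sqrt(np.sum((u - v) ** 2) * dx)

# Hypothetical hypotheses f_0, ..., f_M: bumps at different locations, with
# s_n chosen so that d(f_j, f_k) >= 2 * s_n for all j != k by construction.
centers = [0.2, 0.5, 0.8]
fs = [np.exp(-((x - c) ** 2) / 0.002) for c in centers]
s_n = 0.5 * min(d(fs[j], fs[k]) for j in range(3) for k in range(3) if j != k)

# A made-up noisy estimate of f_1 plays the role of \hat f_n.
f_hat = fs[1] + 0.1 * rng.standard_normal(x.size)

# Minimum-distance test Psi*(f_hat) = argmin_j d(f_hat, f_j).
psi_star = int(np.argmin([d(f_hat, fj) for fj in fs]))

# Lemma: for every j that the test does NOT select, d(f_hat, f_j) >= s_n.
for j, fj in enumerate(fs):
    if j != psi_star:
        assert d(f_hat, fj) >= s_n
```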

The previous lemma implies that

$$P_{f_j}\big(d(\hat{f}_n, f_j) \geq s_n\big) \geq P_{f_j}\big(\Psi^*(\hat{f}_n) \neq j\big)$$

Therefore,

$$\inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} P_f\big(d(\hat{f}_n, f) \geq s_n\big) \;\geq\; \inf_{\hat{f}_n} \max_{f \in \{f_0, \dots, f_M\}} P_f\big(d(\hat{f}_n, f) \geq s_n\big)$$
$$\;\geq\; \inf_{\hat{f}_n} \max_{j \in \{0, \dots, M\}} P_{f_j}\big(\Psi^*(\hat{f}_n) \neq j\big) \;\geq\; \inf_{\hat{h}_n} \max_{j \in \{0, \dots, M\}} P_j\big(\hat{h}_n \neq j\big) \;\triangleq\; P_{e,M}$$

The third inequality follows since we replace the class of tests of the form $\Psi^*(\hat{f}_n)$ by the larger class of ALL possible tests $\hat{h}_n$, and hence the infimum taken over the larger class is smaller.

Our goal throughout will be to find lower bounds for $P_{e,M}$.

So we need to construct $f_0, \dots, f_M$ such that $d(f_j, f_k) \geq 2 s_n$ for all $j \neq k$ and $P_{e,M} \geq c > 0$. Observe that this requires a careful construction, since the first condition demands that $f_j$ and $f_k$ be far from each other, while the second requires that they be close enough that it is hard to distinguish them based on a given sample of data, so that the probability of error $P_{e,M}$ is bounded away from 0.

We now try to lower bound the probability of error $P_{e,M}$. We first consider the case $M = 1$, corresponding to binary hypothesis testing.

$M = 1$: Let $P_0$ and $P_1$ denote the two probability measures, i.e., the distributions of the data under models 0 and 1. Clearly, if $P_0$ and $P_1$ are very “close”, then it is hard to distinguish the two hypotheses, and so $P_{e,1}$ is large.

A natural measure of the distance between probability measures is the total variation, defined as:

$$V(P_0, P_1) = \sup_{A} |P_0(A) - P_1(A)| = \sup_{A} \left| \int_A \big(p_0(Z) - p_1(Z)\big)\, d\nu(Z) \right|$$

where $p_0$ and $p_1$ are the densities of $P_0$ and $P_1$ with respect to a common dominating measure $\nu$, and the supremum is over measurable subsets $A$ of the domain. We will lower bound the probability of error $P_{e,1}$ using the total variation distance. But first, we establish the following lemma.

Lemma

Scheffé's lemma

$$V(P_0, P_1) = \frac{1}{2} \int |p_0(Z) - p_1(Z)| \, d\nu(Z) = \frac{1}{2} \int |p_0 - p_1| = 1 - \int \min(p_0, p_1)$$

Recall the definition of the total variation distance:

$$V(P_0, P_1) = \sup_A \left| \int_A (p_0 - p_1) \right|$$

Observe that the set $A$ maximizing the right-hand side is given by either $\{Z \in \mathcal{Z} : p_0(Z) \geq p_1(Z)\}$ or $\{Z \in \mathcal{Z} : p_1(Z) \geq p_0(Z)\}$.

Let us pick $A_0 = \{Z \in \mathcal{Z} : p_0(Z) \geq p_1(Z)\}$. Then

$$V(P_0, P_1) = \int_{A_0} (p_0 - p_1) = -\int_{A_0^c} (p_0 - p_1) = \frac{1}{2} \int |p_0 - p_1| ,$$
where the second equality holds because $\int (p_0 - p_1)\, d\nu = 0$, and the last because $p_0 - p_1 \geq 0$ on $A_0$ and $p_0 - p_1 \leq 0$ on $A_0^c$.

For the second part, notice that

$$p_0(Z) - \min\big(p_0(Z), p_1(Z)\big) = \begin{cases} 0 & \text{if } p_0(Z) \leq p_1(Z) \\ p_0(Z) - p_1(Z) & \text{if } p_0(Z) \geq p_1(Z) \end{cases}$$

Now consider

$$1 - \int \min(p_0, p_1) = \int \Big( p_0(Z) - \min\big(p_0(Z), p_1(Z)\big) \Big)\, d\nu(Z) = \int_{A_0} \big(p_0(Z) - p_1(Z)\big)\, d\nu(Z) = V(P_0, P_1)$$
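Scheffé's lemma is easy to sanity-check numerically on a finite sample space, where the supremum over sets can be taken by enumerating all subsets. The sketch below (the two distributions on a 4-point space are arbitrary choices for illustration, not from the text) confirms that the brute-force supremum, the half-$L_1$ form, and the $1 - \int \min$ form coincide.

```python
from itertools import combinations

# Two hypothetical distributions on a 4-point sample space (values chosen arbitrarily).
p0 = [0.40, 0.30, 0.20, 0.10]
p1 = [0.25, 0.25, 0.25, 0.25]
points = range(len(p0))

# Brute force: V(P0, P1) = sup_A |P0(A) - P1(A)| over every subset A.
tv_sup = max(
    abs(sum(p0[i] for i in A) - sum(p1[i] for i in A))
    for r in range(len(p0) + 1)
    for A in combinations(points, r)
)

# Scheffe's lemma: the half-L1 form and the 1 - integral-of-min form.
tv_half_l1 = 0.5 * sum(abs(a - b) for a, b in zip(p0, p1))
tv_one_minus_min = 1.0 - sum(min(a, b) for a, b in zip(p0, p1))

# All three agree (here they equal 0.20, attained by A = {i : p0[i] >= p1[i]}).
assert abs(tv_sup - tv_half_l1) < 1e-12
assert abs(tv_sup - tv_one_minus_min) < 1e-12
print(tv_sup)  # 0.20
```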

We are now ready to tackle the lower bound on $P_{e,1}$. In this case, we consider all tests $\hat{h}_n(Z) : \mathcal{Z} \to \{0, 1\}$. Equivalently, we can define $\hat{h}_n(Z) = \mathbf{1}_A(Z)$, where $A$ is any subset of the domain.

$$P_{e,1} = \inf_{\hat{h}_n} \max_{j \in \{0, 1\}} P_j(\hat{h}_n \neq j) \geq \inf_{\hat{h}_n} \frac{1}{2} \Big[ P_0(\hat{h}_n \neq 0) + P_1(\hat{h}_n \neq 1) \Big]$$
$$= \frac{1}{2} \inf_A \Big[ P_0\big(\mathbf{1}_A(Z) \neq 0\big) + P_1\big(\mathbf{1}_A(Z) \neq 1\big) \Big] = \frac{1}{2} \inf_A \Big[ P_0(A) + P_1(A^c) \Big]$$
$$= \frac{1}{2} \inf_A \Big[ 1 - \big(P_1(A) - P_0(A)\big) \Big] = \frac{1}{2} \big( 1 - V(P_0, P_1) \big)$$

So if $P_0$ is close to $P_1$, then $V(P_0, P_1)$ is small and the probability of error $P_{e,1}$ is large.
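Continuing the same illustrative 4-point example (our own choice of $p_0$, $p_1$), the sketch below checks that the average error $\tfrac{1}{2}\big[P_0(\hat{h} = 1) + P_1(\hat{h} = 0)\big]$ of the likelihood-ratio test $\hat{h}(Z) = \mathbf{1}\{p_1(Z) \geq p_0(Z)\}$ equals $\tfrac{1}{2}\big(1 - V(P_0, P_1)\big)$, i.e., the infimum over $A$ in the display above is attained by $A = \{p_1 \geq p_0\}$.

```python
# Average error of the likelihood-ratio test h(Z) = 1{p1(Z) >= p0(Z)} on the
# same hypothetical 4-point example used above.
p0 = [0.40, 0.30, 0.20, 0.10]
p1 = [0.25, 0.25, 0.25, 0.25]

tv = 0.5 * sum(abs(a - b) for a, b in zip(p0, p1))  # total variation, via Scheffe

err_under_p0 = sum(a for a, b in zip(p0, p1) if b >= a)  # P0(decide 1)
err_under_p1 = sum(b for a, b in zip(p0, p1) if b < a)   # P1(decide 0)
avg_error = 0.5 * (err_under_p0 + err_under_p1)

# The average error matches (1 - V(P0, P1)) / 2 exactly.
assert abs(avg_error - 0.5 * (1.0 - tv)) < 1e-12
print(avg_error)  # 0.40 = (1 - 0.20) / 2
```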

This is interesting, but unfortunately, it is hard to work with total variation, especially for multivariate distributions. Bounds involving the Kullback-Leibler divergence are much more convenient.

Source:  OpenStax, Statistical learning theory. OpenStax CNX. Apr 10, 2009 Download for free at http://cnx.org/content/col10532/1.3