
Lower performance bounds

In other modules, estimators and predictors are analyzed in order to obtain upper bounds on their performance. These bounds are of the form

$$\sup_{f \in \mathcal{F}} \mathbb{E}\big[ d(\hat{f}_n, f) \big] \le C \, n^{-\gamma},$$

where $\gamma > 0$. We would like to know if these bounds are tight, in the sense that there is no other estimator that is significantly better. To answer this, we need lower bounds of the form

$$\inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} \mathbb{E}\big[ d(\hat{f}_n, f) \big] \ge c \, n^{-\gamma}.$$

We assume we have the following ingredients:

  • A class of models $\mathcal{F} \subseteq \mathcal{S}$. $\mathcal{F}$ is a class of models containing the "true" model and is a subset of some bigger class $\mathcal{S}$. E.g., $\mathcal{F}$ could be the class of Lipschitz density functions, or the class of distributions $P_{XY}$ satisfying the box-counting condition.
  • An observation model $P_f$, indexed by $f \in \mathcal{F}$. $P_f$ denotes the distribution of the data under model $f$. E.g., in regression and classification this is the distribution of $Z = (X_1, Y_1, \ldots, X_n, Y_n)$. We will assume that $P_f$ is a probability measure on the measurable space $(\mathcal{Z}, \mathcal{B})$.
  • A performance metric $d(\cdot, \cdot) \ge 0$. If you have a model estimate $\hat{f}_n$, then the performance of that estimate relative to the true model $f$ is $d(\hat{f}_n, f)$. E.g. (see the numerical sketch after this list):
    Regression: $d(\hat{f}_n, f) = \| \hat{f}_n - f \|_2 = \left( \int \big( \hat{f}_n(x) - f(x) \big)^2 \, dx \right)^{1/2}$
    Classification: $d(\hat{f}_n, f) = R(\hat{G}_n) - R^* = \int_{\hat{G}_n \Delta G^*} | 2\eta(x) - 1 | \, dP_X(x)$
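To make the regression metric concrete, here is a minimal Python sketch that approximates $\| \hat{f}_n - f \|_2$ on $[0, 1]$ by a Riemann sum. The particular functions f_true and f_hat are hypothetical stand-ins, not from the source.

```python
import numpy as np

def f_true(x):
    # Hypothetical "true" regression function.
    return np.sin(2 * np.pi * x)

def f_hat(x):
    # Hypothetical estimate: the truth plus a small systematic error.
    return np.sin(2 * np.pi * x) + 0.1 * x

def l2_distance(g, h, grid_size=100_000):
    # Approximate the L2 distance on [0, 1] by averaging over a fine grid.
    x = np.linspace(0.0, 1.0, grid_size)
    return np.sqrt(np.mean((g(x) - h(x)) ** 2))

# d(f_hat, f_true) for this toy pair: sqrt of int of (0.1 x)^2 dx ~ 0.0577
print(l2_distance(f_hat, f_true))
```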

As before, we are interested in the risk of a learning rule, in particular the maximal risk given as:

$$\sup_{f \in \mathcal{F}} \mathbb{E}_f\big[ d(\hat{f}_n, f) \big] = \sup_{f \in \mathcal{F}} \int d\big( \hat{f}_n(Z), f \big) \, dP_f(Z),$$

where $\hat{f}_n$ is a function of the observations $Z$ and $\mathbb{E}_f$ denotes the expectation with respect to $P_f$.

The main goal is to get results of the form

$$R_n^* \triangleq \inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} \mathbb{E}_f\big[ d(\hat{f}_n, f) \big] \ge c \, s_n,$$

where $c > 0$ and $s_n \to 0$ as $n \to \infty$. The inf is taken over all estimators, i.e., all measurable functions $\hat{f}_n : \mathcal{Z} \to \mathcal{S}$.
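For orientation, here is a classical instance of such a result, quoted (not derived) from standard minimax theory for nonparametric regression, with $d = \| \cdot \|_2$ as above: for a class of Lipschitz functions, the optimal rate is polynomial in $n$,

$$\mathcal{F} = \{ f : | f(x) - f(y) | \le L \, | x - y | \} \quad \Longrightarrow \quad R_n^* \asymp n^{-1/3}, \quad \text{i.e., } s_n = n^{-1/3}.$$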

Suppose we have shown that

$$\liminf_{n \to \infty} s_n^{-1} R_n^* \ge c > 0 \qquad \text{(a lower bound)}$$

and also that, for a particular estimator $\bar{f}_n$,

$$\limsup_{n \to \infty} s_n^{-1} \sup_{f \in \mathcal{F}} \mathbb{E}_f\big[ d(\bar{f}_n, f) \big] \le C,$$

which implies (since the minimax risk is an infimum over all estimators, including $\bar{f}_n$)

$$\limsup_{n \to \infty} s_n^{-1} R_n^* \le C.$$

We then say that $s_n$ is the optimal rate of convergence for this problem and that $\bar{f}_n$ attains that rate.
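Combining the two displays above gives the sandwich

$$c \le \liminf_{n \to \infty} s_n^{-1} R_n^* \le \limsup_{n \to \infty} s_n^{-1} R_n^* \le C,$$

so $R_n^*$ and $s_n$ agree up to constant factors: no estimator can improve on $\bar{f}_n$ by more than a constant. This is exactly the notion of equivalent rates defined next.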

Two rates of convergence $\Psi_n$ and $\Psi_n'$ are equivalent, written $\Psi_n \asymp \Psi_n'$, if and only if

$$0 < \liminf_{n \to \infty} \frac{\Psi_n}{\Psi_n'} \le \limsup_{n \to \infty} \frac{\Psi_n}{\Psi_n'} < \infty.$$

For example, $n^{-1/2}$ and $(2n)^{-1/2}$ are equivalent, but $n^{-1/2}$ and $n^{-1/2} \log n$ are not.

General reduction scheme

Instead of directly bounding the expected performance, we are going to prove stronger probability bounds of the form

$$\inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} P_f\big( d(\hat{f}_n, f) \ge s_n \big) \ge c > 0.$$

These bounds can be readily converted to expected performance bounds using Markov's inequality:

$$P_f\big( d(\hat{f}_n, f) \ge s_n \big) \le \frac{\mathbb{E}_f\big[ d(\hat{f}_n, f) \big]}{s_n}.$$

Therefore it follows:

$$\inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} \mathbb{E}_f\big[ d(\hat{f}_n, f) \big] \ge \inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} s_n \, P_f\big( d(\hat{f}_n, f) \ge s_n \big) \ge c \, s_n.$$
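As a quick numerical sanity check of this conversion, the following Python sketch uses a toy mean-estimation problem; the Gaussian model, the sample-mean estimator, and the choice $s_n = n^{-1/2}$ are illustrative assumptions, not part of the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: observe n i.i.d. N(f, 1) samples, estimate f by the sample
# mean, and measure performance by d(f_hat, f) = |f_hat - f|.
n, f, trials = 100, 0.3, 100_000
s_n = n ** -0.5

f_hat = f + rng.standard_normal((trials, n)).mean(axis=1)
d = np.abs(f_hat - f)

expected_risk = d.mean()                # Monte Carlo E_f[ d(f_hat, f) ]
markov_bound = s_n * (d >= s_n).mean()  # s_n * P_f( d(f_hat, f) >= s_n )

# Markov's inequality guarantees expected_risk >= markov_bound:
# here roughly 0.080 >= 0.032.
print(expected_risk, markov_bound)
```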

First reduction step

Reduce the original problem to an easier one by replacing the larger class $\mathcal{F}$ with a smaller finite class $\{ f_0, \ldots, f_M \} \subseteq \mathcal{F}$. Observe that

$$\inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} P_f\big( d(\hat{f}_n, f) \ge s_n \big) \ge \inf_{\hat{f}_n} \max_{f \in \{ f_0, \ldots, f_M \}} P_f\big( d(\hat{f}_n, f) \ge s_n \big).$$

The key idea is to choose a finite collection of models that makes the resulting problem as hard as the original; otherwise the lower bound will not be tight.

Second reduction step

Next, we reduce the problem to a hypothesis test. Ideally, we would like to have something like

$$\inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} P_f\big( d(\hat{f}_n, f) \ge s_n \big) \ge \inf_{\hat{h}_n} \max_{j \in \{ 0, \ldots, M \}} P_{f_j}\big( \hat{h}_n(Z) \ne j \big).$$

The inf on the right is over all measurable test functions

$$\hat{h}_n : \mathcal{Z} \to \{ 0, \ldots, M \},$$

and $P_{f_j}\big( \hat{h}_n(Z) \ne j \big)$ denotes the probability that, after observing the data, the test infers the wrong hypothesis.
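To see why this testing error can stay bounded away from zero, consider the simplest case $M = 1$ with two hypotheses whose separation shrinks at the $n^{-1/2}$ rate. The Monte Carlo sketch below is an illustrative construction (Gaussian shifts and a likelihood ratio test), not one taken from the source; it shows that even the optimal test errs with constant probability, which is what keeps minimax lower bounds nonzero.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypotheses: Z_1, ..., Z_n i.i.d. N(theta_j, 1), with theta_0 = 0
# and theta_1 = 2 / sqrt(n) (an illustrative choice of separation).
# For two simple Gaussian hypotheses, the likelihood ratio test reduces
# to thresholding the sample mean at (theta_0 + theta_1) / 2.
n, trials = 400, 200_000
theta = np.array([0.0, 2.0 / np.sqrt(n)])
threshold = theta.mean()

errors = []
for j in (0, 1):
    # Sample means of n observations under hypothesis j.
    z_bar = theta[j] + rng.standard_normal(trials) / np.sqrt(n)
    decision = (z_bar > threshold).astype(int)
    errors.append((decision != j).mean())

# Both error probabilities hover near Phi(-1) ~ 0.159 regardless of n,
# because the separation shrinks at the same 1/sqrt(n) rate.
print(errors)
```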

This ideal reduction might not always hold or be easy to show, but in certain scenarios it can be done. Suppose $d(\cdot, \cdot)$ is a semi-distance, i.e., it satisfies symmetry, $d(f, g) = d(g, f)$; the triangle inequality, $d(f, g) \le d(f, h) + d(h, g)$; and $d(f, f) = 0$.

Source: OpenStax, Statistical learning theory. OpenStax CNX, Apr 10, 2009. Download for free at http://cnx.org/content/col10532/1.3