
Lower performance bounds

In other modules, estimators and predictors are analyzed in order to obtain upper bounds on their performance. These bounds are of the form

$$\sup_{f \in \mathcal{F}} \mathbb{E}\big[ d(\hat{f}_n, f) \big] \le C \, n^{-\gamma},$$

where $\gamma > 0$. We would like to know if these bounds are tight, in the sense that there is no other estimator that is significantly better. To answer this, we need lower bounds of the form

$$\inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} \mathbb{E}\big[ d(\hat{f}_n, f) \big] \ge c \, n^{-\gamma}.$$

We assume we have the following ingredients:

  • A class of models $\mathcal{F} \subseteq \mathcal{S}$. $\mathcal{F}$ is a class of models containing the "true" model and is a subset of some bigger class $\mathcal{S}$. E.g., $\mathcal{F}$ could be the class of Lipschitz density functions, or the class of distributions $P_{XY}$ satisfying the box-counting condition.
  • An observation model $P_f$, indexed by $f \in \mathcal{F}$. $P_f$ denotes the distribution of the data under model $f$. E.g., in regression and classification this is the distribution of $Z = (X_1, Y_1, \ldots, X_n, Y_n)$. We will assume that $P_f$ is a probability measure on the measurable space $(\mathcal{Z}, \mathcal{B})$.
  • A performance metric $d(\cdot, \cdot) \ge 0$. If you have a model estimate $\hat{f}_n$, then the performance of that estimate relative to the true model $f$ is $d(\hat{f}_n, f)$. E.g. (see the numerical sketch after this list):
    Regression: $d(\hat{f}_n, f) = \| \hat{f}_n - f \|_2 = \left( \int \big( \hat{f}_n(x) - f(x) \big)^2 \, dx \right)^{1/2}$
    Classification: $d(\hat{f}_n, f) = R(\hat{G}_n) - R^* = \int_{\hat{G}_n \Delta G^*} | 2\eta(x) - 1 | \, dP_X(x)$
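To make the regression metric concrete, here is a minimal Python sketch that approximates $\| \hat{f}_n - f \|_2$ on $[0, 1]$ by a Riemann sum. The particular functions f_true and f_hat are hypothetical stand-ins, not from the source.

```python
import numpy as np

def f_true(x):
    # Hypothetical "true" regression function.
    return np.sin(2 * np.pi * x)

def f_hat(x):
    # Hypothetical estimate: the truth plus a small systematic error.
    return np.sin(2 * np.pi * x) + 0.1 * x

def l2_distance(g, h, grid_size=100_000):
    # Approximate the L2 distance on [0, 1] by averaging over a fine grid.
    x = np.linspace(0.0, 1.0, grid_size)
    return np.sqrt(np.mean((g(x) - h(x)) ** 2))

# d(f_hat, f_true) for this toy pair: sqrt of int of (0.1 x)^2 dx ~ 0.0577
print(l2_distance(f_hat, f_true))
```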

As before, we are interested in the risk of a learning rule, in particular the maximal risk given as:

$$\sup_{f \in \mathcal{F}} \mathbb{E}_f\big[ d(\hat{f}_n, f) \big] = \sup_{f \in \mathcal{F}} \int d\big( \hat{f}_n(Z), f \big) \, dP_f(Z),$$

where $\hat{f}_n$ is a function of the observations $Z$ and $\mathbb{E}_f$ denotes the expectation with respect to $P_f$.

The main goal is to get results of the form

$$R_n^* \triangleq \inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} \mathbb{E}_f\big[ d(\hat{f}_n, f) \big] \ge c \, s_n,$$

where $c > 0$ and $s_n \to 0$ as $n \to \infty$. The inf is taken over all estimators, i.e., all measurable functions $\hat{f}_n : \mathcal{Z} \to \mathcal{S}$.
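For orientation, here is a classical instance of such a result, quoted (not derived) from standard minimax theory for nonparametric regression, with $d = \| \cdot \|_2$ as above: for a class of Lipschitz functions, the optimal rate is polynomial in $n$,

$$\mathcal{F} = \{ f : | f(x) - f(y) | \le L \, | x - y | \} \quad \Longrightarrow \quad R_n^* \asymp n^{-1/3}, \quad \text{i.e., } s_n = n^{-1/3}.$$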

Suppose we have shown that

$$\liminf_{n \to \infty} s_n^{-1} R_n^* \ge c > 0 \qquad \text{(a lower bound)}$$

and also that, for a particular estimator $\bar{f}_n$,

$$\limsup_{n \to \infty} s_n^{-1} \sup_{f \in \mathcal{F}} \mathbb{E}_f\big[ d(\bar{f}_n, f) \big] \le C,$$

which implies (since the minimax risk is an infimum over all estimators, including $\bar{f}_n$)

$$\limsup_{n \to \infty} s_n^{-1} R_n^* \le C.$$

We then say that $s_n$ is the optimal rate of convergence for this problem and that $\bar{f}_n$ attains that rate.
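Combining the two displays above gives the sandwich

$$c \le \liminf_{n \to \infty} s_n^{-1} R_n^* \le \limsup_{n \to \infty} s_n^{-1} R_n^* \le C,$$

so $R_n^*$ and $s_n$ agree up to constant factors: no estimator can improve on $\bar{f}_n$ by more than a constant. This is exactly the notion of equivalent rates defined next.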

Two rates of convergence $\Psi_n$ and $\Psi_n'$ are equivalent, written $\Psi_n \asymp \Psi_n'$, if and only if

$$0 < \liminf_{n \to \infty} \frac{\Psi_n}{\Psi_n'} \le \limsup_{n \to \infty} \frac{\Psi_n}{\Psi_n'} < \infty.$$

For example, $n^{-1/2}$ and $(2n)^{-1/2}$ are equivalent, but $n^{-1/2}$ and $n^{-1/2} \log n$ are not.

General reduction scheme

Instead of directly bounding the expected performance, we are going to prove stronger probability bounds of the form

$$\inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} P_f\big( d(\hat{f}_n, f) \ge s_n \big) \ge c > 0.$$

These bounds can be readily converted to expected performance bounds using Markov's inequality:

$$P_f\big( d(\hat{f}_n, f) \ge s_n \big) \le \frac{\mathbb{E}_f\big[ d(\hat{f}_n, f) \big]}{s_n}.$$

Therefore it follows:

$$\inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} \mathbb{E}_f\big[ d(\hat{f}_n, f) \big] \ge \inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} s_n \, P_f\big( d(\hat{f}_n, f) \ge s_n \big) \ge c \, s_n.$$
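As a quick numerical sanity check of this conversion, the following Python sketch uses a toy mean-estimation problem; the Gaussian model, the sample-mean estimator, and the choice $s_n = n^{-1/2}$ are illustrative assumptions, not part of the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: observe n i.i.d. N(f, 1) samples, estimate f by the sample
# mean, and measure performance by d(f_hat, f) = |f_hat - f|.
n, f, trials = 100, 0.3, 100_000
s_n = n ** -0.5

f_hat = f + rng.standard_normal((trials, n)).mean(axis=1)
d = np.abs(f_hat - f)

expected_risk = d.mean()                # Monte Carlo E_f[ d(f_hat, f) ]
markov_bound = s_n * (d >= s_n).mean()  # s_n * P_f( d(f_hat, f) >= s_n )

# Markov's inequality guarantees expected_risk >= markov_bound:
# here roughly 0.080 >= 0.032.
print(expected_risk, markov_bound)
```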

First reduction step

Reduce the original problem to an easier one by replacing the larger class $\mathcal{F}$ with a smaller finite class $\{ f_0, \ldots, f_M \} \subseteq \mathcal{F}$. Observe that

$$\inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} P_f\big( d(\hat{f}_n, f) \ge s_n \big) \ge \inf_{\hat{f}_n} \max_{f \in \{ f_0, \ldots, f_M \}} P_f\big( d(\hat{f}_n, f) \ge s_n \big).$$

The key idea is to choose a finite collection of models that makes the resulting problem as hard as the original; otherwise the lower bound will not be tight.

Second reduction step

Next, we reduce the problem to a hypothesis test. Ideally, we would like to have something like

$$\inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} P_f\big( d(\hat{f}_n, f) \ge s_n \big) \ge \inf_{\hat{h}_n} \max_{j \in \{ 0, \ldots, M \}} P_{f_j}\big( \hat{h}_n(Z) \ne j \big).$$

The inf on the right is over all measurable test functions

$$\hat{h}_n : \mathcal{Z} \to \{ 0, \ldots, M \},$$

and $P_{f_j}\big( \hat{h}_n(Z) \ne j \big)$ denotes the probability that, after observing the data, the test infers the wrong hypothesis.
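To see why this testing error can stay bounded away from zero, consider the simplest case $M = 1$ with two hypotheses whose separation shrinks at the $n^{-1/2}$ rate. The Monte Carlo sketch below is an illustrative construction (Gaussian shifts and a likelihood ratio test), not one taken from the source; it shows that even the optimal test errs with constant probability, which is what keeps minimax lower bounds nonzero.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypotheses: Z_1, ..., Z_n i.i.d. N(theta_j, 1), with theta_0 = 0
# and theta_1 = 2 / sqrt(n) (an illustrative choice of separation).
# For two simple Gaussian hypotheses, the likelihood ratio test reduces
# to thresholding the sample mean at (theta_0 + theta_1) / 2.
n, trials = 400, 200_000
theta = np.array([0.0, 2.0 / np.sqrt(n)])
threshold = theta.mean()

errors = []
for j in (0, 1):
    # Sample means of n observations under hypothesis j.
    z_bar = theta[j] + rng.standard_normal(trials) / np.sqrt(n)
    decision = (z_bar > threshold).astype(int)
    errors.append((decision != j).mean())

# Both error probabilities hover near Phi(-1) ~ 0.159 regardless of n,
# because the separation shrinks at the same 1/sqrt(n) rate.
print(errors)
```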

This ideal reduction might not always hold or be easy to show, but in certain scenarios it can be done. Suppose $d(\cdot, \cdot)$ is a semi-distance, i.e., it satisfies symmetry, $d(f, g) = d(g, f)$; the triangle inequality, $d(f, g) \le d(f, h) + d(h, g)$; and $d(f, f) = 0$.

Source: OpenStax, Statistical learning theory. OpenStax CNX, Apr 10, 2009. Download for free at http://cnx.org/content/col10532/1.3