Recall that a semi-distance $d(\cdot, \cdot)$ satisfies:

  • $d(f, g) = d(g, f) \geq 0$ (symmetry)
  • $d(f, f) = 0$
  • $d(f, g) \leq d(h, f) + d(h, g)$ (triangle inequality)

E.g., with $f, g : \mathbb{R}^d \to \mathbb{R}$, $d(f, g) \triangleq \|f - g\|_2$.
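As a quick sanity check, this semi-distance can be approximated numerically and its defining properties verified. The sketch below is purely illustrative: the grid discretization and the particular functions are our own choices, not from the text.

```python
import numpy as np

# Approximate d(f, g) = ||f - g||_2 for functions on [0, 1] by discretizing
# on a uniform grid (hypothetical illustration; grid and functions are ours).
x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]

def d(f, g):
    """L2 semi-distance between two functions, approximated by a Riemann sum."""
    return np.sqrt(np.sum((f(x) - g(x)) ** 2) * dx)

f = lambda t: np.sin(2 * np.pi * t)
g = lambda t: t ** 2
h = lambda t: np.cos(2 * np.pi * t)

assert abs(d(f, g) - d(g, f)) < 1e-12        # symmetry
assert d(f, f) == 0.0                        # d(f, f) = 0
assert d(f, g) <= d(h, f) + d(h, g) + 1e-12  # triangle inequality
```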

Lemma

Suppose $d(\cdot, \cdot)$ is a semi-distance. Also suppose that we have constructed $f_0, \dots, f_M$ such that $d(f_j, f_k) \geq 2 s_n$ for all $j \neq k$. Take any estimator $\hat{f}_n$ and define the test $\Psi^*(\hat{f}_n) : \mathcal{Z} \to \{0, \dots, M\}$ as

$$\Psi^*(\hat{f}_n) = \arg\min_{j} d(\hat{f}_n, f_j)$$

Then $\Psi^*(\hat{f}_n) \neq j$ implies $d(\hat{f}_n, f_j) \geq s_n$.

Suppose $\Psi^*(\hat{f}_n) \neq j$. Then there exists $k \neq j$ with $d(\hat{f}_n, f_k) \leq d(\hat{f}_n, f_j)$. Now

$$2 s_n \leq d(f_j, f_k) \leq d(\hat{f}_n, f_j) + d(\hat{f}_n, f_k) \leq 2\, d(\hat{f}_n, f_j)$$
$$\Rightarrow\ d(\hat{f}_n, f_j) \geq s_n .$$
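The minimum-distance test of the lemma is straightforward to implement. Below is a minimal sketch (the bump functions playing the role of $f_0, \dots, f_M$, the value of $s_n$, and the noisy estimate standing in for $\hat{f}_n$ are our own illustrative assumptions) that computes $\Psi^*(\hat{f}_n) = \arg\min_j d(\hat{f}_n, f_j)$ and verifies the lemma's conclusion: for every index $j$ the test does not select, $d(\hat{f}_n, f_j) \geq s_n$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 501)
dx = x[1] - x[0]

def d(u, v):
    # L2 semi-distance between two functions sampled on the grid x
    return np.sqrt(np.sum((u - v) ** 2) * dx)

# Hypothetical hypotheses f_0, ..., f_M: bumps at different locations, with
# s_n chosen so that d(f_j, f_k) >= 2 * s_n for all j != k by construction.
centers = [0.2, 0.5, 0.8]
fs = [np.exp(-((x - c) ** 2) / 0.002) for c in centers]
s_n = 0.5 * min(d(fs[j], fs[k]) for j in range(3) for k in range(3) if j != k)

# A made-up noisy estimate of f_1 plays the role of \hat f_n.
f_hat = fs[1] + 0.1 * rng.standard_normal(x.size)

# Minimum-distance test Psi*(f_hat) = argmin_j d(f_hat, f_j).
psi_star = int(np.argmin([d(f_hat, fj) for fj in fs]))

# Lemma: for every j that the test does NOT select, d(f_hat, f_j) >= s_n.
for j, fj in enumerate(fs):
    if j != psi_star:
        assert d(f_hat, fj) >= s_n
```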

The previous lemma implies that

$$P_{f_j}\big(d(\hat{f}_n, f_j) \geq s_n\big) \geq P_{f_j}\big(\Psi^*(\hat{f}_n) \neq j\big)$$

Therefore,

$$\inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} P_f\big(d(\hat{f}_n, f) \geq s_n\big) \;\geq\; \inf_{\hat{f}_n} \max_{f \in \{f_0, \dots, f_M\}} P_f\big(d(\hat{f}_n, f) \geq s_n\big)$$
$$\;\geq\; \inf_{\hat{f}_n} \max_{j \in \{0, \dots, M\}} P_{f_j}\big(\Psi^*(\hat{f}_n) \neq j\big) \;\geq\; \inf_{\hat{h}_n} \max_{j \in \{0, \dots, M\}} P_j\big(\hat{h}_n \neq j\big) \;\triangleq\; P_{e,M}$$

The third inequality follows since we replace the class of tests of the form $\Psi^*(\hat{f}_n)$ by the larger class of ALL possible tests $\hat{h}_n$, and hence the infimum taken over the larger class is smaller.

Our goal throughout will be to find lower bounds for $P_{e,M}$.

So we need to construct $f_0, \dots, f_M$ such that $d(f_j, f_k) \geq 2 s_n$ for all $j \neq k$ and $P_{e,M} \geq c > 0$. Observe that this requires a careful construction, since the first condition demands that $f_j$ and $f_k$ be far from each other, while the second requires that they be close enough that it is hard to distinguish them based on a given sample of data, so that the probability of error $P_{e,M}$ is bounded away from 0.

We now try to lower bound the probability of error $P_{e,M}$. We first consider the case $M = 1$, corresponding to binary hypothesis testing.

$M = 1$: Let $P_0$ and $P_1$ denote the two probability measures, i.e., the distributions of the data under models 0 and 1. Clearly, if $P_0$ and $P_1$ are very “close”, then it is hard to distinguish the two hypotheses, and so $P_{e,1}$ is large.

A natural measure of the distance between probability measures is the total variation, defined as:

$$V(P_0, P_1) = \sup_{A} |P_0(A) - P_1(A)| = \sup_{A} \left| \int_A \big(p_0(Z) - p_1(Z)\big)\, d\nu(Z) \right|$$

where $p_0$ and $p_1$ are the densities of $P_0$ and $P_1$ with respect to a common dominating measure $\nu$, and the supremum is over measurable subsets $A$ of the domain. We will lower bound the probability of error $P_{e,1}$ using the total variation distance. But first, we establish the following lemma.

Lemma

Scheffé's lemma

$$V(P_0, P_1) = \frac{1}{2} \int |p_0(Z) - p_1(Z)| \, d\nu(Z) = \frac{1}{2} \int |p_0 - p_1| = 1 - \int \min(p_0, p_1)$$

Recall the definition of the total variation distance:

$$V(P_0, P_1) = \sup_A \left| \int_A (p_0 - p_1) \right|$$

Observe that the set $A$ maximizing the right-hand side is given by either $\{Z \in \mathcal{Z} : p_0(Z) \geq p_1(Z)\}$ or $\{Z \in \mathcal{Z} : p_1(Z) \geq p_0(Z)\}$.

Let us pick $A_0 = \{Z \in \mathcal{Z} : p_0(Z) \geq p_1(Z)\}$. Then

$$V(P_0, P_1) = \int_{A_0} (p_0 - p_1) = -\int_{A_0^c} (p_0 - p_1) = \frac{1}{2} \int |p_0 - p_1| ,$$
where the second equality holds because $\int (p_0 - p_1)\, d\nu = 0$, and the last because $p_0 - p_1 \geq 0$ on $A_0$ and $p_0 - p_1 \leq 0$ on $A_0^c$.

For the second part, notice that

$$p_0(Z) - \min\big(p_0(Z), p_1(Z)\big) = \begin{cases} 0 & \text{if } p_0(Z) \leq p_1(Z) \\ p_0(Z) - p_1(Z) & \text{if } p_0(Z) \geq p_1(Z) \end{cases}$$

Now consider

$$1 - \int \min(p_0, p_1) = \int \Big( p_0(Z) - \min\big(p_0(Z), p_1(Z)\big) \Big)\, d\nu(Z) = \int_{A_0} \big(p_0(Z) - p_1(Z)\big)\, d\nu(Z) = V(P_0, P_1)$$
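Scheffé's lemma is easy to sanity-check numerically on a finite sample space, where the supremum over sets can be taken by enumerating all subsets. The sketch below (the two distributions on a 4-point space are arbitrary choices for illustration, not from the text) confirms that the brute-force supremum, the half-$L_1$ form, and the $1 - \int \min$ form coincide.

```python
from itertools import combinations

# Two hypothetical distributions on a 4-point sample space (values chosen arbitrarily).
p0 = [0.40, 0.30, 0.20, 0.10]
p1 = [0.25, 0.25, 0.25, 0.25]
points = range(len(p0))

# Brute force: V(P0, P1) = sup_A |P0(A) - P1(A)| over every subset A.
tv_sup = max(
    abs(sum(p0[i] for i in A) - sum(p1[i] for i in A))
    for r in range(len(p0) + 1)
    for A in combinations(points, r)
)

# Scheffe's lemma: the half-L1 form and the 1 - integral-of-min form.
tv_half_l1 = 0.5 * sum(abs(a - b) for a, b in zip(p0, p1))
tv_one_minus_min = 1.0 - sum(min(a, b) for a, b in zip(p0, p1))

# All three agree (here they equal 0.20, attained by A = {i : p0[i] >= p1[i]}).
assert abs(tv_sup - tv_half_l1) < 1e-12
assert abs(tv_sup - tv_one_minus_min) < 1e-12
print(tv_sup)  # 0.20
```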

We are now ready to tackle the lower bound on $P_{e,1}$. In this case, we consider all tests $\hat{h}_n(Z) : \mathcal{Z} \to \{0, 1\}$. Equivalently, we can define $\hat{h}_n(Z) = \mathbf{1}_A(Z)$, where $A$ is any subset of the domain.

$$P_{e,1} = \inf_{\hat{h}_n} \max_{j \in \{0, 1\}} P_j(\hat{h}_n \neq j) \geq \inf_{\hat{h}_n} \frac{1}{2} \Big[ P_0(\hat{h}_n \neq 0) + P_1(\hat{h}_n \neq 1) \Big]$$
$$= \frac{1}{2} \inf_A \Big[ P_0\big(\mathbf{1}_A(Z) \neq 0\big) + P_1\big(\mathbf{1}_A(Z) \neq 1\big) \Big] = \frac{1}{2} \inf_A \Big[ P_0(A) + P_1(A^c) \Big]$$
$$= \frac{1}{2} \inf_A \Big[ 1 - \big(P_1(A) - P_0(A)\big) \Big] = \frac{1}{2} \big( 1 - V(P_0, P_1) \big)$$

So if $P_0$ is close to $P_1$, then $V(P_0, P_1)$ is small and the probability of error $P_{e,1}$ is large.
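Continuing the same illustrative 4-point example (our own choice of $p_0$, $p_1$), the sketch below checks that the average error $\tfrac{1}{2}\big[P_0(\hat{h} = 1) + P_1(\hat{h} = 0)\big]$ of the likelihood-ratio test $\hat{h}(Z) = \mathbf{1}\{p_1(Z) \geq p_0(Z)\}$ equals $\tfrac{1}{2}\big(1 - V(P_0, P_1)\big)$, i.e., the infimum over $A$ in the display above is attained by $A = \{p_1 \geq p_0\}$.

```python
# Average error of the likelihood-ratio test h(Z) = 1{p1(Z) >= p0(Z)} on the
# same hypothetical 4-point example used above.
p0 = [0.40, 0.30, 0.20, 0.10]
p1 = [0.25, 0.25, 0.25, 0.25]

tv = 0.5 * sum(abs(a - b) for a, b in zip(p0, p1))  # total variation, via Scheffe

err_under_p0 = sum(a for a, b in zip(p0, p1) if b >= a)  # P0(decide 1)
err_under_p1 = sum(b for a, b in zip(p0, p1) if b < a)   # P1(decide 0)
avg_error = 0.5 * (err_under_p0 + err_under_p1)

# The average error matches (1 - V(P0, P1)) / 2 exactly.
assert abs(avg_error - 0.5 * (1.0 - tv)) < 1e-12
print(avg_error)  # 0.40 = (1 - 0.20) / 2
```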

This is interesting, but unfortunately, it is hard to work with total variation, especially for multivariate distributions. Bounds involving the Kullback-Leibler divergence are much more convenient.

Source:  OpenStax, Statistical learning theory. OpenStax CNX. Apr 10, 2009 Download for free at http://cnx.org/content/col10532/1.3