0.7 Chernoff's bound and Hoeffding's inequality


Take $Z = | \hat{R}_{n} (f) - R (f) |$ and $t = ϵ$:

$$
\begin{aligned}
P (| \hat{R}_{n} (f) - R (f) | \geq ϵ) & \leq \frac{E [| \hat{R}_{n} (f) - R (f) |^{2}]}{ϵ^{2}} \\
& = \frac{\mathrm{var} (\hat{R}_{n} (f))}{ϵ^{2}} \\
& = \frac{\sum_{i = 1}^{n} \mathrm{var} (L_{i} / n)}{ϵ^{2}} \\
& = \frac{\mathrm{var} (ℓ (f (X), Y))}{n ϵ^{2}} \\
& = \frac{σ_{L}^{2}}{n ϵ^{2}} .
\end{aligned}
$$

So the probability goes to zero at a rate of at least $n^{- 1}$. However, it turns out that this is an extremely loose bound. According to the Central Limit Theorem,

$$
\hat{R}_{n} (f) = \frac{1}{n} \sum_{i = 1}^{n} L_{i} \to N \left( R (f), \frac{σ_{L}^{2}}{n} \right) \quad \text{as } n \to \infty
$$

in distribution. This suggests that for large values of n,

$$
P (| \hat{R}_{n} (f) - R (f) | \geq ϵ) \approx O \left( e^{- \frac{n ϵ^{2}}{2 σ_{L}^{2}}} \right) .
$$

That is, the Gaussian tail probability is tending to zero exponentially fast.
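To get a feel for the gap between the two rates, the following minimal sketch tabulates the Chebyshev bound $σ_{L}^{2} / (n ϵ^{2})$ next to the exponential rate $e^{- n ϵ^{2} / (2 σ_{L}^{2})}$ suggested by the Central Limit Theorem. The function names and the values $ϵ = 0.1$ and $σ_{L}^{2} = 0.25$ are illustrative assumptions, not taken from the text, and the second quantity is an asymptotic rate rather than a proven bound.

```python
import math


def chebyshev_bound(n, eps, sigma2):
    """Chebyshev bound sigma^2 / (n eps^2), capped at 1."""
    return min(1.0, sigma2 / (n * eps ** 2))


def gaussian_rate(n, eps, sigma2):
    """Exponential rate exp(-n eps^2 / (2 sigma^2)) suggested by the CLT."""
    return math.exp(-n * eps ** 2 / (2.0 * sigma2))


# Illustrative values (assumptions): eps = 0.1, sigma^2 = 0.25.
eps, sigma2 = 0.1, 0.25
for n in (100, 1000, 10000):
    print(n, chebyshev_bound(n, eps, sigma2), gaussian_rate(n, eps, sigma2))
```

The polynomial bound shrinks by a factor of 10 each time $n$ grows tenfold, while the exponential rate collapses far faster.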

Chernoff's bound

Note that for any random variable $Z$ and any $t > 0$,

$$
P (Z \geq t) = P (e^{s Z} \geq e^{s t}) \leq \frac{E [e^{s Z}]}{e^{s t}}, \quad \forall s > 0,
$$

by Markov's inequality applied to the nonnegative random variable $e^{s Z}$.

Chernoff's bound is based on finding the value of $s$ that minimizes the upper bound. The bound is especially useful when $Z$ is a sum of independent random variables. For example, say

$$
Z = \sum_{i = 1}^{n} (ℓ (f (X_{i}), Y_{i}) - R (f)) = n (\hat{R}_{n} (f) - R (f)) ;
$$

then the bound becomes

$$
P \left( \sum_{i = 1}^{n} (L_{i} - E [L_{i}]) \geq t \right) \leq e^{- s t} E \left[ e^{s \sum_{i = 1}^{n} (L_{i} - E [L_{i}])} \right] = e^{- s t} \prod_{i = 1}^{n} E \left[ e^{s (L_{i} - E [L_{i}])} \right] ,
$$

where the last step follows from independence.

Thus, the problem of finding a tight bound boils down to finding a good bound for $E [e^{s (L_{i} - E [L_{i}])}]$. Chernoff ('52) first studied this situation for binary random variables. Hoeffding ('63) then derived a more general result for arbitrary bounded random variables.
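As a concrete illustration of minimizing over $s$, the sketch below evaluates the product bound for binary losses and searches a grid of $s$ values. It assumes $L_{i} \sim$ Bernoulli($p$), for which $E [e^{s (L_{i} - p)}] = (1 - p) e^{- s p} + p e^{s (1 - p)}$. The function names, the grid search, and the example values are assumptions made for illustration, not part of the original derivation.

```python
import math


def log_chernoff(s, t, n, p):
    """log of e^{-st} * (E[e^{s(L - p)}])^n for L ~ Bernoulli(p)."""
    log_mgf = math.log((1 - p) * math.exp(-s * p) + p * math.exp(s * (1 - p)))
    return -s * t + n * log_mgf


def best_chernoff(t, n, p, grid=2000, s_max=10.0):
    """Smallest value of the bound over a grid of s > 0 (a crude search)."""
    candidates = [s_max * k / grid for k in range(1, grid + 1)]
    candidates.append(4.0 * t / n)  # the value of s used later for Hoeffding
    return math.exp(min(log_chernoff(s, t, n, p) for s in candidates))


# Example values (assumptions): n = 100 losses, p = 0.1, deviation t = 10.
n, p, t = 100, 0.1, 10.0
print("optimized Chernoff bound:", best_chernoff(t, n, p))
print("Hoeffding-style bound:   ", math.exp(-2.0 * t * t / n))
```

Working in the log domain keeps the product of $n$ factors numerically stable.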

Hoeffding's inequality

Theorem (Hoeffding's inequality)

Let $Z_{1}, Z_{2}, \dots, Z_{n}$ be independent bounded random variables such that $Z_{i} \in [a_{i}, b_{i}]$ with probability 1. Let $S_{n} = \sum_{i = 1}^{n} Z_{i}$. Then for any $t > 0$, we have

$$
P (| S_{n} - E [S_{n}] | \geq t) \leq 2 e^{- \frac{2 t^{2}}{\sum_{i = 1}^{n} (b_{i} - a_{i})^{2}}} .
$$

The key to proving Hoeffding's inequality is the following upper bound: if $Z$ is a random variable with $E [Z] = 0$ and $a \leq Z \leq b,$ then

$$
E [e^{s Z}] \leq e^{\frac{s^{2} (b - a)^{2}}{8}} .
$$

This upper bound is derived as follows. By the convexity of the exponential function,

$$
e^{s z} \leq \frac{z - a}{b - a} e^{s b} + \frac{b - z}{b - a} e^{s a}, \quad \text{for } a \leq z \leq b .
$$

Thus,

$$
\begin{aligned}
E [e^{s Z}] & \leq E \left[ \frac{Z - a}{b - a} \right] e^{s b} + E \left[ \frac{b - Z}{b - a} \right] e^{s a} \\
& = \frac{b}{b - a} e^{s a} - \frac{a}{b - a} e^{s b}, \quad \text{since } E [Z] = 0 \\
& = (1 - θ + θ e^{s (b - a)}) e^{- θ s (b - a)}, \quad \text{where } θ = \frac{- a}{b - a} .
\end{aligned}
$$

Now let $u = s (b - a)$ and define

$$
φ (u) \equiv - θ u + \log (1 - θ + θ e^{u}) .
$$

Then we have

$$
E [e^{s Z}] \leq (1 - θ + θ e^{s (b - a)}) e^{- θ s (b - a)} = e^{φ (u)} .
$$

To bound $φ (u)$, expand it in a Taylor series with remainder:

$$
φ (u) = φ (0) + u φ^{'} (0) + \frac{u^{2}}{2} φ^{''} (v) \quad \text{for some } v \in [0, u] .
$$

Here $φ (0) = 0$, and

$$
\begin{aligned}
φ^{'} (u) & = - θ + \frac{θ e^{u}}{1 - θ + θ e^{u}} \quad \Rightarrow \quad φ^{'} (0) = 0, \\
φ^{''} (u) & = \frac{θ e^{u}}{1 - θ + θ e^{u}} - \frac{(θ e^{u})^{2}}{(1 - θ + θ e^{u})^{2}} \\
& = \frac{θ e^{u}}{1 - θ + θ e^{u}} \left( 1 - \frac{θ e^{u}}{1 - θ + θ e^{u}} \right) \\
& = ρ (1 - ρ) .
\end{aligned}
$$

Now, $φ^{''} (u)$ is maximized by

$$
ρ = \frac{θ e^{u}}{1 - θ + θ e^{u}} = \frac{1}{2} \quad \Rightarrow \quad φ^{''} (u) \leq \frac{1}{4} .
$$

So, since $φ (0) = φ^{'} (0) = 0$,

$$
φ (u) \leq \frac{u^{2}}{8} = \frac{s^{2} (b - a)^{2}}{8} \quad \Rightarrow \quad E [e^{s Z}] \leq e^{\frac{s^{2} (b - a)^{2}}{8}} .
$$
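The bound can be checked numerically on the two-point distribution that arises in the convexity step: $Z = b$ with probability $θ = - a / (b - a)$ and $Z = a$ otherwise, which has mean zero. The sketch below is only a sanity check under that assumption; the function names and the interval $[- 0.3, 0.7]$ are chosen for illustration.

```python
import math


def two_point_mgf(s, a, b):
    """E[e^{sZ}] for Z in {a, b} with P(Z = b) = -a / (b - a), so E[Z] = 0."""
    theta = -a / (b - a)
    return (1 - theta) * math.exp(s * a) + theta * math.exp(s * b)


def hoeffding_lemma_bound(s, a, b):
    """Upper bound exp(s^2 (b - a)^2 / 8) on the moment generating function."""
    return math.exp(s ** 2 * (b - a) ** 2 / 8.0)


# Illustrative interval (an assumption): a = -0.3, b = 0.7.
a, b = -0.3, 0.7
for s in (0.5, 1.0, 2.0, 4.0):
    print(s, two_point_mgf(s, a, b), hoeffding_lemma_bound(s, a, b))
```

In each printed row the first value should not exceed the second.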

Now, we can apply this upper bound to derive Hoeffding's inequality.

$$
\begin{aligned}
P (S_{n} - E [S_{n}] \geq t) & \leq e^{- s t} \prod_{i = 1}^{n} E \left[ e^{s (Z_{i} - E [Z_{i}])} \right] \\
& \leq e^{- s t} \prod_{i = 1}^{n} e^{\frac{s^{2} (b_{i} - a_{i})^{2}}{8}} \\
& = e^{- s t} e^{s^{2} \sum_{i = 1}^{n} \frac{(b_{i} - a_{i})^{2}}{8}} \\
& = e^{\frac{- 2 t^{2}}{\sum_{i = 1}^{n} (b_{i} - a_{i})^{2}}} ,
\end{aligned}
$$

by choosing $s = \frac{4 t}{\sum_{i = 1}^{n} (b_{i} - a_{i})^{2}}$.

Similarly, $P (E [S_{n}] - S_{n} \geq t) \leq e^{\frac{- 2 t^{2}}{\sum_{i = 1}^{n} (b_{i} - a_{i})^{2}}}$. Combining the two one-sided bounds with the union bound gives the factor of 2 in the theorem, which completes the proof of Hoeffding's inequality.
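The value of $s$ used in the proof comes from minimizing the exponent of the bound, which is a quadratic in $s$. A short worked version, using the shorthand $C$ (introduced here only for readability) for $\sum_{i = 1}^{n} (b_{i} - a_{i})^{2}$:

```latex
\begin{aligned}
g(s) &= -s t + \frac{s^{2} C}{8}, \qquad C = \sum_{i=1}^{n} (b_{i} - a_{i})^{2}, \\
g'(s) &= -t + \frac{s C}{4} = 0 \;\Longrightarrow\; s^{*} = \frac{4 t}{C}, \\
g(s^{*}) &= -\frac{4 t^{2}}{C} + \frac{16 t^{2}}{C^{2}} \cdot \frac{C}{8} = -\frac{2 t^{2}}{C}.
\end{aligned}
```

Since $g''(s) = C / 4 > 0$, the stationary point $s^{*}$ is the minimizer.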

Application

Let

$$
Z_{i} = 1_{f (X_{i}) \neq Y_{i}} - R (f),
$$

as in the classification problem. Then for a fixed $f$, it follows from Hoeffding's inequality (i.e., Chernoff's bound in this special case) that

$$
\begin{aligned}
P (| \hat{R}_{n} (f) - R (f) | \geq ϵ) & = P \left( \frac{1}{n} | S_{n} - E [S_{n}] | \geq ϵ \right) \\
& = P (| S_{n} - E [S_{n}] | \geq n ϵ) \\
& \leq 2 e^{- \frac{2 (n ϵ)^{2}}{n}} \\
& = 2 e^{- 2 n ϵ^{2}} ,
\end{aligned}
$$

where we used that each $Z_{i}$ lies in an interval of length $b_{i} - a_{i} = 1$, so $\sum_{i = 1}^{n} (b_{i} - a_{i})^{2} = n$.
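A small simulation can make the bound $2 e^{- 2 n ϵ^{2}}$ tangible. The sketch below assumes a fixed classifier whose 0/1 losses are i.i.d. Bernoulli($p$), so that $R (f) = p$, and compares the simulated frequency of large deviations with the bound. The function names, the seed, and the values $p = 0.3$, $n = 200$, $ϵ = 0.1$ are illustrative assumptions.

```python
import math
import random


def hoeffding_bound(n, eps):
    """Two-sided bound 2 exp(-2 n eps^2) for losses in [0, 1], capped at 1."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * eps ** 2))


def empirical_deviation_rate(p, n, eps, trials=2000, seed=0):
    """Fraction of trials in which |empirical risk - p| >= eps.

    Each trial draws n Bernoulli(p) 0/1 losses, so the true risk is p.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        errors = sum(1 for _ in range(n) if rng.random() < p)
        if abs(errors / n - p) >= eps:
            hits += 1
    return hits / trials


bound = hoeffding_bound(200, 0.1)
rate = empirical_deviation_rate(0.3, 200, 0.1)
print(f"Hoeffding bound: {bound:.4f}, simulated frequency: {rate:.4f}")
```

The simulated frequency is typically well below the bound, which holds for every distribution on $[0, 1]$ and is therefore conservative for any particular one.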

Now, we want a bound like this to hold uniformly for all $f \in F$. Assume that $F$ is a finite collection of models and let $| F |$ denote its cardinality. We would like to bound the probability that $\max_{f \in F} | \hat{R}_{n} (f) - R (f) | \geq ϵ$. Note that the event

$$
\left\{ \max_{f \in F} | \hat{R}_{n} (f) - R (f) | \geq ϵ \right\} \equiv \bigcup_{f \in F} \left\{ | \hat{R}_{n} (f) - R (f) | \geq ϵ \right\} .
$$

Therefore

$$
\begin{aligned}
P \left( \max_{f \in F} | \hat{R}_{n} (f) - R (f) | \geq ϵ \right) & = P \left( \bigcup_{f \in F} \left\{ | \hat{R}_{n} (f) - R (f) | \geq ϵ \right\} \right) \\
& \leq \sum_{f \in F} P (| \hat{R}_{n} (f) - R (f) | \geq ϵ), \quad \text{the ``union of events'' bound} \\
& \leq 2 | F | e^{- 2 n ϵ^{2}}, \quad \text{by Hoeffding's inequality.}
\end{aligned}
$$

Thus, we have shown that with probability at least $1 - 2 | F | e^{- 2 n ϵ^{2}}$, for all $f \in F$,

$$
| \hat{R}_{n} (f) - R (f) | < ϵ .
$$

And accordingly, we can be reasonably confident in selecting $f$ from $F$ based on the empirical risk function ${\hat{R}}_{n}$ .
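Setting $2 | F | e^{- 2 n ϵ^{2}} = δ$ and solving gives $ϵ = \sqrt{\log (2 | F | / δ) / (2 n)}$, which shows how the guarantee scales with the size of the model class. The following sketch computes this accuracy and the matching sample size; the function names and the example values ($| F | = 50$, $δ = 0.05$) are assumptions made for illustration.

```python
import math


def uniform_deviation(n, num_models, delta):
    """eps solving 2 |F| exp(-2 n eps^2) = delta."""
    return math.sqrt(math.log(2.0 * num_models / delta) / (2.0 * n))


def samples_needed(eps, num_models, delta):
    """Smallest integer n with 2 |F| exp(-2 n eps^2) <= delta."""
    return math.ceil(math.log(2.0 * num_models / delta) / (2.0 * eps ** 2))


# Example values (assumptions): 50 candidate models, confidence 95%.
print("eps for n = 1000:", uniform_deviation(1000, 50, 0.05))
print("n for eps = 0.05:", samples_needed(0.05, 50, 0.05))
```

Because $| F |$ enters only through a logarithm, doubling the number of models costs only a small increase in the required sample size.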

Source: OpenStax, Statistical learning theory. OpenStax CNX. Apr 10, 2009 Download for free at http://cnx.org/content/col10532/1.3
