$$\hat{\epsilon}(h_i) = \frac{1}{m} \sum_{j=1}^{m} Z_j.$$

Thus, $\hat{\epsilon}(h_i)$ is exactly the mean of the $m$ random variables $Z_j$ that are drawn iid from a Bernoulli distribution with mean $\epsilon(h_i)$. Hence, we can apply the Hoeffding inequality, and obtain

$$P\big(|\epsilon(h_i) - \hat{\epsilon}(h_i)| > \gamma\big) \le 2\exp(-2\gamma^2 m).$$
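To see the inequality above in action, here is a minimal simulation sketch (not part of the original notes); the true error rate `eps`, the margin `gamma`, and the sample size `m` below are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: compare the empirical frequency of large deviations of the
# training error from the true error with the Hoeffding bound 2*exp(-2*gamma^2*m).
# eps, gamma, m are illustrative choices, not values from the text.
rng = np.random.default_rng(0)
eps, gamma, m, trials = 0.3, 0.05, 500, 20_000

# Each trial draws m Bernoulli(eps) indicators Z_j and averages them.
Z = rng.random((trials, m)) < eps
eps_hat = Z.mean(axis=1)

empirical_prob = np.mean(np.abs(eps_hat - eps) > gamma)
hoeffding_bound = 2 * np.exp(-2 * gamma**2 * m)

print(f"P(|eps - eps_hat| > gamma) ~ {empirical_prob:.4f}")
print(f"Hoeffding bound            = {hoeffding_bound:.4f}")
```

The observed deviation frequency comes out well below the bound, which is what the (loose but distribution-free) Hoeffding inequality guarantees.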

This shows that, for our particular $h_i$, training error will be close to generalization error with high probability, assuming $m$ is large. But we don't just want to guarantee that $\epsilon(h_i)$ will be close to $\hat{\epsilon}(h_i)$ (with high probability) for just one particular $h_i$. We want to prove that this will be true simultaneously for all $h \in H$. To do so, let $A_i$ denote the event that $|\epsilon(h_i) - \hat{\epsilon}(h_i)| > \gamma$. We've already shown that, for any particular $A_i$, it holds true that $P(A_i) \le 2\exp(-2\gamma^2 m)$. Thus, using the union bound, we have that

$$P\big(\exists\, h \in H.\ |\epsilon(h_i) - \hat{\epsilon}(h_i)| > \gamma\big) = P(A_1 \cup \cdots \cup A_k) \le \sum_{i=1}^{k} P(A_i) \le \sum_{i=1}^{k} 2\exp(-2\gamma^2 m) = 2k\exp(-2\gamma^2 m)$$

If we subtract both sides from 1, we find that

$$P\big(\neg\exists\, h \in H.\ |\epsilon(h_i) - \hat{\epsilon}(h_i)| > \gamma\big) = P\big(\forall\, h \in H.\ |\epsilon(h_i) - \hat{\epsilon}(h_i)| \le \gamma\big) \ge 1 - 2k\exp(-2\gamma^2 m)$$

(The “$\neg$” symbol means “not.”) So, with probability at least $1 - 2k\exp(-2\gamma^2 m)$, we have that $\epsilon(h)$ will be within $\gamma$ of $\hat{\epsilon}(h)$ for all $h \in H$. This is called a uniform convergence result, because this is a bound that holds simultaneously for all (as opposed to just one) $h \in H$.
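The following sketch (with an assumed toy hypothesis class, not one from the notes) checks the uniform convergence statement numerically: how often does *some* one of $k$ hypotheses have training error more than $\gamma$ away from its true error, compared with the union bound $2k\exp(-2\gamma^2 m)$?

```python
import numpy as np

# Sketch of uniform convergence over a finite class: k hypotheses with assumed
# (hypothetical) true error rates; parameters chosen so the bound is non-vacuous.
rng = np.random.default_rng(1)
k, m, gamma, trials = 20, 1_500, 0.05, 2_000
true_errors = rng.uniform(0.1, 0.4, size=k)  # hypothetical generalization errors

violations = 0
for _ in range(trials):
    # Empirical error of each hypothesis on a fresh training set of size m.
    eps_hat = (rng.random((k, m)) < true_errors[:, None]).mean(axis=1)
    if np.any(np.abs(eps_hat - true_errors) > gamma):
        violations += 1

print(f"empirical P(some h deviates by > gamma) ~ {violations / trials:.4f}")
print(f"union bound 2k*exp(-2*gamma^2*m)        = {2 * k * np.exp(-2 * gamma**2 * m):.4f}")
```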

In the discussion above, what we did was, for particular values of $m$ and $\gamma$, give a bound on the probability that for some $h \in H$, $|\epsilon(h) - \hat{\epsilon}(h)| > \gamma$. There are three quantities of interest here: $m$, $\gamma$, and the probability of error; we can bound any one of them in terms of the other two.

For instance, we can ask the following question: Given $\gamma$ and some $\delta > 0$, how large must $m$ be before we can guarantee that with probability at least $1 - \delta$, training error will be within $\gamma$ of generalization error? By setting $\delta = 2k\exp(-2\gamma^2 m)$ and solving for $m$ [you should convince yourself this is the right thing to do!], we find that if

$$m \ge \frac{1}{2\gamma^2} \log\frac{2k}{\delta},$$

then with probability at least $1 - \delta$, we have that $|\epsilon(h) - \hat{\epsilon}(h)| \le \gamma$ for all $h \in H$. (Equivalently, this shows that the probability that $|\epsilon(h) - \hat{\epsilon}(h)| > \gamma$ for some $h \in H$ is at most $\delta$.) This bound tells us how many training examples we need in order to make a guarantee. The training set size $m$ that a certain method or algorithm requires in order to achieve a certain level of performance is also called the algorithm's sample complexity.

The key property of the bound above is that the number of training examples needed to make this guarantee is only logarithmic in $k$, the number of hypotheses in $H$. This will be important later.
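A short sketch of the sample-complexity formula makes this logarithmic dependence concrete; the values of $\gamma$, $\delta$, and $k$ below are illustrative choices, not values from the notes.

```python
import math

# Sample-complexity bound m >= (1 / (2*gamma^2)) * log(2k / delta).
def sample_complexity(k, gamma, delta):
    """Training set size sufficient for |eps(h) - eps_hat(h)| <= gamma to hold
    simultaneously for all k hypotheses with probability at least 1 - delta."""
    return math.ceil(math.log(2 * k / delta) / (2 * gamma**2))

gamma, delta = 0.05, 0.01
for k in [10, 1_000, 1_000_000]:
    print(f"k = {k:>9,d}  ->  m >= {sample_complexity(k, gamma, delta):,d}")
```

Growing $k$ from ten hypotheses to a million only roughly doubles the required training set size, since $m$ scales with $\log k$.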

Similarly, we can also hold $m$ and $\delta$ fixed and solve for $\gamma$ in the previous equation, and show [again, convince yourself that this is right!] that with probability $1 - \delta$, we have that for all $h \in H$,

$$|\hat{\epsilon}(h) - \epsilon(h)| \le \sqrt{\frac{1}{2m}\log\frac{2k}{\delta}}.$$

Now, let's assume that uniform convergence holds, i.e., that $|\epsilon(h) - \hat{\epsilon}(h)| \le \gamma$ for all $h \in H$. What can we prove about the generalization of our learning algorithm that picked $\hat{h} = \arg\min_{h \in H} \hat{\epsilon}(h)$?

Define $h^* = \arg\min_{h \in H} \epsilon(h)$ to be the best possible hypothesis in $H$. Note that $h^*$ is the best that we could possibly do given that we are using $H$, so it makes sense to compare our performance to that of $h^*$. We have:

$$\begin{aligned}
\epsilon(\hat{h}) &\le \hat{\epsilon}(\hat{h}) + \gamma \\
&\le \hat{\epsilon}(h^*) + \gamma \\
&\le \epsilon(h^*) + 2\gamma
\end{aligned}$$

The first line used the fact that $|\epsilon(\hat{h}) - \hat{\epsilon}(\hat{h})| \le \gamma$ (by our uniform convergence assumption). The second used the fact that $\hat{h}$ was chosen to minimize $\hat{\epsilon}(h)$, and hence $\hat{\epsilon}(\hat{h}) \le \hat{\epsilon}(h)$ for all $h$, and in particular $\hat{\epsilon}(\hat{h}) \le \hat{\epsilon}(h^*)$. The third line used the uniform convergence assumption again, to show that $\hat{\epsilon}(h^*) \le \epsilon(h^*) + \gamma$. So, what we've shown is the following: if uniform convergence occurs, then the generalization error of $\hat{h}$ is at most $2\gamma$ worse than that of the best possible hypothesis in $H$!
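The chain of inequalities can be checked end to end with a small simulation sketch (an assumed toy setup, not an experiment from the notes): pick the empirical risk minimizer among $k$ hypotheses and verify that its true error stays within $2\gamma$ of the best hypothesis in the class.

```python
import numpy as np

# Toy check of eps(h_hat) <= eps(h*) + 2*gamma under uniform convergence.
# The hypothesis class is modeled abstractly by assumed true error rates.
rng = np.random.default_rng(2)
k, m, delta = 50, 2_000, 0.05
true_errors = rng.uniform(0.2, 0.5, size=k)       # hypothetical generalization errors
gamma = np.sqrt(np.log(2 * k / delta) / (2 * m))  # uniform convergence margin

# Empirical errors of all k hypotheses on one training set of size m.
eps_hat = (rng.random((k, m)) < true_errors[:, None]).mean(axis=1)

h_hat = np.argmin(eps_hat)        # empirical risk minimizer
h_star = np.argmin(true_errors)   # best hypothesis in the class

print(f"eps(h_hat)        = {true_errors[h_hat]:.3f}")
print(f"eps(h*) + 2*gamma = {true_errors[h_star] + 2 * gamma:.3f}")
```

With probability at least $1 - \delta$ over the draw of the training set, the first number is no larger than the second, which is exactly the $2\gamma$ guarantee derived above.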
