$$R(\hat{f}_n) \le R(f^*) + \epsilon.$$

If an inequality of this form holds, then we say that $\hat{f}_n$ is a Probably Approximately Correct (PAC) decision rule [link].

To show that the empirical risk minimizer is a PAC decision rule, we first must understand how closely the empirical risk matches the true risk. First, let us consider the empirical and true risk of a fixed decision rule $f$. Assume that the loss function is bounded between 0 and 1 (possibly after a suitable normalization). Then the empirical risk is the average of $n$ independent random variables, each bounded between 0 and 1. Hoeffding's inequality bounds the deviations of such random averages from their mean values [link]. In this case, the mean value is the true risk of $f$, and Hoeffding's inequality states that

$$P\big(|\hat{R}(f) - R(f)| > \epsilon\big) \le 2e^{-2n\epsilon^2}.$$

Another equivalent statement is that the inequality $|\hat{R}(f) - R(f)| \le \epsilon$ holds with probability at least $1 - 2e^{-2n\epsilon^2}$. Thus, the two risks are probably close together, and the greater the number of training examples $n$, the closer they are.
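As a quick numerical check (a minimal sketch, not part of the original text), the following Python snippet simulates the empirical risk of one fixed rule whose 0/1 loss on each example is an independent Bernoulli variable with mean $p = R(f)$; the values of $p$, $n$, and $\epsilon$ below are illustrative assumptions. The observed frequency of deviations larger than $\epsilon$ should not exceed the Hoeffding bound $2e^{-2n\epsilon^2}$.

    import numpy as np

    rng = np.random.default_rng(0)

    p = 0.3         # assumed true risk R(f) of the fixed rule f
    n = 500         # number of training examples
    eps = 0.05      # deviation threshold
    trials = 20000  # number of independent training sets to simulate

    # Empirical risk on each simulated training set: the average of n
    # independent 0/1 losses, each Bernoulli(p).
    emp_risk = rng.binomial(1, p, size=(trials, n)).mean(axis=1)

    # Fraction of trials with |R_hat(f) - R(f)| > eps versus Hoeffding's bound.
    freq = np.mean(np.abs(emp_risk - p) > eps)
    bound = 2 * np.exp(-2 * n * eps**2)
    print(f"observed deviation frequency: {freq:.4f}")
    print(f"Hoeffding bound:              {bound:.4f}")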

Now we would like a similar condition to hold for all $f \in \mathcal{F}$, since ERM optimizes over the entire collection $\mathcal{F}$. Suppose that $\mathcal{F}$ is a finite collection of decision rules, and let $|\mathcal{F}|$ denote the number of rules in $\mathcal{F}$. The probability that the difference between the true and empirical risks of one or more of the decision rules exceeds $\epsilon$ is bounded by the sum of the probabilities of the individual events of the form $|\hat{R}(f) - R(f)| > \epsilon$, the so-called union of events bound. Therefore, with probability at least $1 - 2|\mathcal{F}|e^{-2n\epsilon^2}$ we have that

$$|\hat{R}(f) - R(f)| \le \epsilon$$

for all $f \in \mathcal{F}$. Equivalently, setting $\delta = 2|\mathcal{F}|e^{-2n\epsilon^2}$, we have that with probability at least $1 - \delta$, for all $f \in \mathcal{F}$,

$$|\hat{R}(f) - R(f)| \le \sqrt{\frac{\log|\mathcal{F}| + \log(2/\delta)}{2n}}.$$
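The expression on the right-hand side is obtained simply by solving the definition of $\delta$ for $\epsilon$:

$$\delta = 2|\mathcal{F}|e^{-2n\epsilon^2} \;\Longleftrightarrow\; 2n\epsilon^2 = \log\frac{2|\mathcal{F}|}{\delta} \;\Longleftrightarrow\; \epsilon = \sqrt{\frac{\log|\mathcal{F}| + \log(2/\delta)}{2n}}.$$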

Notice that the two risks are uniformly close together, and that the closeness indicated by the bound improves as $n$ increases and degrades as the number of decision rules in $\mathcal{F}$ increases. In fact, the bound scales with $\log|\mathcal{F}|$, and so it is reasonable to interpret the logarithm of the number of decision rules under consideration as a measure of the complexity of the class.
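This scaling is easy to see numerically. The helpers below (an illustrative sketch; the class size, confidence level, and sample sizes are arbitrary choices, not values from the text) evaluate the uniform deviation bound and invert it to give the number of training examples needed for a target accuracy.

    import math

    def deviation_bound(n, num_rules, delta):
        """Uniform bound on |R_hat(f) - R(f)| over a finite class of size
        num_rules, holding with probability at least 1 - delta."""
        return math.sqrt((math.log(num_rules) + math.log(2 / delta)) / (2 * n))

    def samples_needed(eps, num_rules, delta):
        """Smallest n for which the uniform deviation bound is at most eps."""
        return math.ceil((math.log(num_rules) + math.log(2 / delta)) / (2 * eps**2))

    # The bound shrinks like 1/sqrt(n) but grows only logarithmically in |F|.
    for n in (100, 1000, 10000):
        print(n, round(deviation_bound(n, num_rules=1000, delta=0.05), 4))
    print("n needed for eps = 0.05:", samples_needed(0.05, num_rules=1000, delta=0.05))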

Now, using this bound, we can show that $\hat{f}_n$ is a PAC decision rule as follows. Note that with probability at least $1 - \delta$,

$$R(\hat{f}_n) \;\le\; \hat{R}(\hat{f}_n) + \sqrt{\frac{\log|\mathcal{F}| + \log(2/\delta)}{2n}} \;\le\; \hat{R}(f^*) + \sqrt{\frac{\log|\mathcal{F}| + \log(2/\delta)}{2n}} \;\le\; R(f^*) + 2\sqrt{\frac{\log|\mathcal{F}| + \log(2/\delta)}{2n}},$$

where the first inequality follows since the true and empirical risks are close for all $f \in \mathcal{F}$, and in particular for $\hat{f}_n$; the second inequality holds since, by definition, $\hat{f}_n$ minimizes the empirical risk; and the third inequality holds, again, since the empirical risk is close to the true risk for all $f$, in this case for $f^*$ in particular. So, we have shown that $\hat{f}_n$ is PAC.
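The following self-contained sketch (not from the original text; the data distribution and the finite class of threshold rules are hypothetical choices) carries out empirical risk minimization over a finite class and reports the resulting risks alongside the PAC guarantee derived above.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical finite class: threshold rules f_t(x) = 1{x >= t}, with the
    # thresholds t restricted to a finite grid so that |F| is finite.
    thresholds = np.linspace(-3.0, 3.0, 61)

    def risks(x, y, ts):
        """Average 0/1 loss of each threshold rule in ts on the sample (x, y)."""
        preds = (x[None, :] >= ts[:, None]).astype(int)   # shape (|F|, n)
        return (preds != y[None, :]).mean(axis=1)

    def sample(m):
        """Synthetic data: y = 1{x >= 0} with 10% label noise, so f* has t = 0."""
        x = rng.normal(size=m)
        y = (x >= 0).astype(int)
        flip = rng.random(m) < 0.1
        return x, np.where(flip, 1 - y, y)

    n, delta = 500, 0.05
    x_train, y_train = sample(n)
    t_hat = thresholds[np.argmin(risks(x_train, y_train, thresholds))]  # ERM rule

    # Approximate the true risks R(f_hat) and R(f*) on a large held-out sample.
    x_test, y_test = sample(100_000)
    R_fhat = risks(x_test, y_test, np.array([t_hat]))[0]
    R_star = risks(x_test, y_test, np.array([0.0]))[0]

    eps = np.sqrt((np.log(len(thresholds)) + np.log(2 / delta)) / (2 * n))
    print(f"ERM threshold {t_hat:.2f}: R(f_hat) ~ {R_fhat:.3f}, R(f*) ~ {R_star:.3f}")
    print(f"guarantee: R(f_hat) <= R(f*) + {2 * eps:.3f} with probability >= {1 - delta}")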

PAC bounds of this form can be extended in many directions, for example to infinitely large or uncountable classes of decision rules, but the basic ingredients of the theory are essentially like those demonstrated above. The bottom line is that empirical risk minimization is a reasonable approach, provided one has access to a sufficient number of training examples and the number, or more generally the complexity, of the class of decision rules under consideration is not too great.

Further reading

Excellent treatments of classical decision and estimation theory can be found in a number of textbooks [link], [link], [link], [link]. For statistical learning theory, outstanding textbooks are also available for further reading [link], [link], [link].





Source:  OpenStax, Statistical learning theory. OpenStax CNX. Apr 10, 2009 Download for free at http://cnx.org/content/col10532/1.3
