
We define our estimator as

$$\hat{f}_n^H = \hat{f}_n^{(\hat{k})},$$

where

$$\hat{f}_n^{(k)} = \arg\min_{f \in \mathcal{F}_k^H} \hat{R}_n(f),$$

and

$$\hat{k} = \arg\min_{k \geq 1} \left\{ \hat{R}_n\big(\hat{f}_n^{(k)}\big) + \sqrt{\frac{(k + k^2)\log 2 + \frac{1}{2}\log n}{2n}} \right\}.$$
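To make the selection rule concrete, here is a minimal Python sketch of this complexity-penalized selection (an illustration; the function names and the brute-force search over $k$ are ours, not part of the original notes):

```python
import numpy as np

def fit_histogram(X, y, k):
    # ERM over F_k^H: majority vote of the labels falling in each of the
    # k x k bins (ties broken toward label 1). X is an (n, 2) array with
    # entries in [0, 1]; y is an (n,) array of {0, 1} labels.
    counts = np.zeros((k, k, 2))
    idx = np.minimum((X * k).astype(int), k - 1)   # bin index of each sample
    for (i, j), label in zip(idx, y):
        counts[i, j, label] += 1
    return counts[..., 1] >= counts[..., 0]        # predicted label per bin

def empirical_risk(f, X, y):
    # R_n(f): fraction of training points misclassified by the rule f.
    k = f.shape[0]
    idx = np.minimum((X * k).astype(int), k - 1)
    return np.mean(f[idx[:, 0], idx[:, 1]].astype(int) != y)

def penalty(bits, n):
    # sqrt((c(f) log 2 + (1/2) log n) / (2 n)) for a classifier whose
    # prefix code uses `bits` bits.
    return np.sqrt((bits * np.log(2) + 0.5 * np.log(n)) / (2 * n))

def select_k(X, y, k_max):
    # k_hat: minimize the penalized empirical risk, with c(f) = k + k^2 bits.
    n = len(y)
    scores = [empirical_risk(fit_histogram(X, y, k), X, y) + penalty(k + k * k, n)
              for k in range(1, k_max + 1)]
    return int(np.argmin(scores)) + 1
```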

Therefore $\hat{f}_n^H$ minimizes

$$\hat{R}_n(f) + \sqrt{\frac{c^H(f)\log 2 + \frac{1}{2}\log n}{2n}},$$

over all $f \in \mathcal{F}^H$. We showed before that

$$E\big[R(\hat{f}_n^H)\big] - R^* \leq \min_{f \in \mathcal{F}^H}\left\{ R(f) - R^* + \sqrt{\frac{c^H(f)\log 2 + \frac{1}{2}\log n}{2n}} \right\} + \frac{1}{\sqrt{n}}.$$

To proceed with our analysis we need to make some assumptions about the intrinsic difficulty of the problem. We will assume that the Bayes decision boundary is a “well-behaved” 1-dimensional set, in the sense that it has box-counting dimension one (see Appendix "Box Counting Dimension"). This implies that, for a histogram with $k^2$ bins, the Bayes decision boundary intersects fewer than $Ck$ bins, where $C$ is a constant that does not depend on $k$. Furthermore, we assume that the marginal distribution of $X$ satisfies $P_X(A) \leq K|A|$ for any measurable subset $A \subseteq [0,1]^2$, where $|A|$ denotes the volume of $A$. This means that the samples collected do not accumulate anywhere in the unit square.
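As a quick numerical sanity check (our illustration, using a hypothetical quarter-circle boundary, not part of the original notes), one can count how many bins of a $k \times k$ partition a smooth one-dimensional curve passes through and watch the count grow linearly in $k$:

```python
import numpy as np

def bins_hit_by_circle(k, r=0.7, samples=100_000):
    # Walk densely along the quarter circle x^2 + y^2 = r^2 inside the
    # unit square and record every k x k bin it passes through.
    t = np.linspace(0, np.pi / 2, samples)
    x, y = r * np.cos(t), r * np.sin(t)
    i = np.minimum((x * k).astype(int), k - 1)
    j = np.minimum((y * k).astype(int), k - 1)
    return len(set(zip(i, j)))

for k in (10, 20, 40, 80):
    hits = bins_hit_by_circle(k)
    print(k, hits, hits / k)
# The ratio hits / k settles near a constant C, i.e. the boundary
# intersects on the order of C k of the k^2 bins.
```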

Under the above assumptions we can conclude the following: the bins that do not intersect the boundary contribute nothing to the excess risk, and each of the at most $Ck$ bins that do intersect it has probability at most $K/k^2$ (since each bin has volume $1/k^2$). Hence

$$\min_{f \in \mathcal{F}_k^H} R(f) - R^* \leq \frac{K}{k^2} \cdot Ck = \frac{CK}{k}.$$

Therefore

$$E\big[R(\hat{f}_n^H)\big] - R^* \leq \frac{CK}{k} + \sqrt{\frac{(k + k^2)\log 2 + \frac{1}{2}\log n}{2n}} + \frac{1}{\sqrt{n}}.$$

We can balance the terms on the right-hand side of the above expression using $k = n^{1/4}$ (for $n$ large); therefore

$$E\big[R(\hat{f}_n^H)\big] - R^* = O\big(n^{-1/4}\big), \quad \text{as } n \to \infty.$$
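To see where this choice of $k$ comes from: ignoring constants and logarithmic factors, the approximation term scales as $1/k$ and the penalty term as $\sqrt{k^2/(2n)} \sim k/\sqrt{n}$. Equating the two,

$$\frac{1}{k} \asymp \frac{k}{\sqrt{n}} \quad\Longleftrightarrow\quad k \asymp n^{1/4},$$

and with this choice each term is of order $n^{-1/4}$.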

Dyadic decision trees

Now let's consider dyadic decision trees, under the assumptions above, and contrast them with the histogram classifier. Let

$$\mathcal{F}_k^T = \{\text{tree classifiers with } k \text{ leaves}\}.$$

Let $\mathcal{F}^T = \bigcup_{k \geq 1} \mathcal{F}_k^T$. We can prefix-encode each element $f$ of $\mathcal{F}^T$ with $c^T(f) = 3k - 1$ bits, as described before.
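The count $c^T(f) = 3k - 1$ can be seen directly: a binary tree with $k$ leaves has $2k - 1$ nodes, so a preorder traversal emitting one bit per node (internal vs. leaf) plus one label bit per leaf costs $(2k - 1) + k = 3k - 1$ bits. Here is a small Python sketch of such an encoder (our illustration; it assumes, as in the earlier construction, that the split made at each internal node is determined by its depth, so the splits themselves need no extra bits):

```python
class Node:
    # A dyadic tree classifier: internal nodes split their cell in half
    # (the split direction is fixed by depth), leaves carry a {0, 1} label.
    def __init__(self, left=None, right=None, label=None):
        self.left, self.right, self.label = left, right, label

def encode(node):
    # Preorder prefix code: '0' marks an internal node, '1' + label bit
    # marks a leaf. A tree with k leaves encodes to exactly 3k - 1 bits.
    if node.label is not None:
        return "1" + str(node.label)
    return "0" + encode(node.left) + encode(node.right)

# Example: a tree with k = 3 leaves encodes to 3 * 3 - 1 = 8 bits.
tree = Node(left=Node(label=0),
            right=Node(left=Node(label=1), right=Node(label=0)))
assert len(encode(tree)) == 8
```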

Let

$$\hat{f}_n^T = \hat{f}_n^{(\hat{k})},$$

where

$$\hat{f}_n^{(k)} = \arg\min_{f \in \mathcal{F}_k^T} \hat{R}_n(f),$$

and

$$\hat{k} = \arg\min_{k \geq 1} \left\{ \hat{R}_n\big(\hat{f}_n^{(k)}\big) + \sqrt{\frac{(3k - 1)\log 2 + \frac{1}{2}\log n}{2n}} \right\}.$$

Hence $\hat{f}_n^T$ minimizes

$$\hat{R}_n(f) + \sqrt{\frac{c^T(f)\log 2 + \frac{1}{2}\log n}{2n}},$$

over all $f \in \mathcal{F}^T$. Moreover

$$E\big[R(\hat{f}_n^T)\big] - R^* \leq \min_{f \in \mathcal{F}^T}\left\{ R(f) - R^* + \sqrt{\frac{c^T(f)\log 2 + \frac{1}{2}\log n}{2n}} \right\} + \frac{1}{\sqrt{n}}.$$

If the Bayes decision boundary is a 1-dimensional set, as in "Histogram Risk Bound", there exists a tree with at most $8Ck$ leaves such that the boundary is contained in at most $Ck$ squares, each of volume $1/k^2$. To see this, start with a tree yielding the histogram partition with $k^2$ boxes (i.e., the tree partitioning the unit square into $k^2$ equal-sized squares). Now prune all the nodes that do not intersect the boundary. The figure below illustrates the procedure. If you carefully bound the number of leaves needed at each level, you can show that in total fewer than $8Ck$ leaves suffice. We conclude that there exists a tree with at most $8Ck$ leaves that has the same risk as a histogram with $O(k^2)$ bins. Therefore, using the oracle bound above, we have
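Here is a small Python sketch of this pruning argument (our illustration, again using a hypothetical quarter-circle boundary): build the recursive dyadic partition down to the $k \times k$ histogram resolution, but stop refining any cell the boundary does not cross, and count the resulting leaves:

```python
def boundary_crosses(x0, x1, y0, y1, r=0.7):
    # The quarter circle x^2 + y^2 = r^2 crosses the cell [x0,x1] x [y0,y1]
    # iff r^2 lies between the squared distances of the nearest and farthest
    # corners (distance to the origin is monotone in both coordinates here).
    d2 = [x * x + y * y for x in (x0, x1) for y in (y0, y1)]
    return min(d2) <= r * r <= max(d2)

def pruned_leaves(x0, x1, y0, y1, depth):
    # Split cells alternately in x and y, as in a recursive dyadic
    # partition; any cell the boundary does not cross becomes a leaf.
    if depth == 0 or not boundary_crosses(x0, x1, y0, y1):
        return 1
    if depth % 2 == 0:                       # split in x
        xm = (x0 + x1) / 2
        return (pruned_leaves(x0, xm, y0, y1, depth - 1)
                + pruned_leaves(xm, x1, y0, y1, depth - 1))
    ym = (y0 + y1) / 2                       # split in y
    return (pruned_leaves(x0, x1, y0, ym, depth - 1)
            + pruned_leaves(x0, x1, ym, y1, depth - 1))

for j in (2, 3, 4, 5, 6):                    # k = 2^j; full depth 2j => k^2 cells
    k = 2 ** j
    print(k, k * k, pruned_leaves(0.0, 1.0, 0.0, 1.0, 2 * j))
# The pruned leaf count grows roughly linearly in k, far below the k^2
# bins of the corresponding histogram.
```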

$$E\big[R(\hat{f}_n^T)\big] - R^* \leq \frac{CK}{k} + \sqrt{\frac{(3(8Ck) - 1)\log 2 + \frac{1}{2}\log n}{2n}} + \frac{1}{\sqrt{n}}.$$

We can balance the terms on the right-hand side of the above expression using $k = n^{1/3}$ (for $n$ large); therefore

$$E\big[R(\hat{f}_n^T)\big] - R^* = O\big(n^{-1/3}\big), \quad \text{as } n \to \infty.$$
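The improvement comes from the penalty: ignoring constants and logarithmic factors, the tree penalty scales as $\sqrt{k/n}$ rather than $k/\sqrt{n}$, so balancing against the $1/k$ approximation term gives

$$\frac{1}{k} \asymp \sqrt{\frac{k}{n}} \quad\Longleftrightarrow\quad k^3 \asymp n \quad\Longleftrightarrow\quad k \asymp n^{1/3},$$

and each term is then of order $n^{-1/3}$, beating the histogram rate $n^{-1/4}$.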
Figure: Illustration of the tree pruning procedure. (a) Histogram classification rule for a partition with 16 bins, and corresponding binary tree representation (with 16 leaves). (b) Pruned version of the histogram tree, yielding exactly the same classification rule, but now requiring only 6 leaves. (Note: the trees were constructed using the pruning procedure described above.)

Final comments

Trees generally work much better than histogram classifiers. This is essentially because they provide much more efficient ways of approximating the Bayes decision boundary (as we saw in our example, under reasonable assumptions on the Bayes boundary, a tree encoded with $O(k)$ bits can describe the same classifier as a histogram that requires $O(k^2)$ bits).

Source: OpenStax, Statistical learning theory. OpenStax CNX. Apr 10, 2009. Download for free at http://cnx.org/content/col10532/1.3