One special and important case of model selection is called feature selection. To motivate this, imagine that you have a supervised learning problem where the number of features n is very large (perhaps much larger than the number of training examples), but you suspect that there is only a small number of features that are “relevant” to the learning task. Even if you use a simple linear classifier (such as the perceptron) over the n input features, the VC dimension of your hypothesis class would still be O(n), and thus overfitting would be a potential problem unless the training set is fairly large.
In such a setting, you can apply a feature selection algorithm to reduce the number of features. Given n features, there are 2^n possible feature subsets (since each of the n features can either be included or excluded from the subset), and thus feature selection can be posed as a model selection problem over 2^n possible models. For large values of n, it's usually too expensive to explicitly enumerate over and compare all 2^n models, and so typically some heuristic search procedure is used to find a good feature subset. The following search procedure is called forward search:

1. Initialize F to the empty feature subset.
2. Repeat:
(a) For each feature i not already in F, let F_i be F with feature i added, and use some version of cross validation to evaluate F_i (i.e., train your learning algorithm using only the features in F_i, and estimate its generalization error).
(b) Set F to be the best feature subset found in step (a).
3. Select and output the best feature subset that was evaluated during the entire search procedure.
The outer loop of the algorithm can be terminated either when F is the set of all features, or when the size of F exceeds some pre-set threshold (corresponding to the maximum number of features that you want the algorithm to consider using).
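To make the procedure concrete, here is a minimal sketch of forward search in Python, assuming scikit-learn is available for the cross-validation step; the synthetic dataset, the logistic-regression base learner, and the max_features threshold are illustrative choices rather than part of the procedure itself.

```python
# A minimal sketch of wrapper-style forward search, assuming scikit-learn is
# available; the dataset, estimator, and threshold below are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)

def forward_search(X, y, estimator, max_features=10, cv=5):
    n = X.shape[1]
    selected = []                  # F: the current feature subset
    best_overall = ([], -np.inf)   # best subset seen during the entire search
    while len(selected) < min(max_features, n):
        scores = []
        for i in range(n):
            if i in selected:
                continue
            candidate = selected + [i]   # F_i = F plus feature i
            score = cross_val_score(estimator, X[:, candidate], y, cv=cv).mean()
            scores.append((score, candidate))
        score, candidate = max(scores)   # best single-feature addition
        selected = candidate             # F := best F_i
        if score > best_overall[1]:
            best_overall = (candidate, score)
    return best_overall

features, score = forward_search(X, y, LogisticRegression(max_iter=1000))
print("selected features:", features, "cv accuracy: %.3f" % score)
```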
The algorithm described above is one instantiation of wrapper model feature selection, since it is a procedure that “wraps” around your learning algorithm, and repeatedly makes calls to the learning algorithm to evaluate how well it does using different feature subsets. Aside from forward search, other search procedures can also be used. For example, backward search starts off with F as the set of all features, and repeatedly deletes features one at a time (evaluating single-feature deletions in a similar manner to how forward search evaluates single-feature additions) until F is empty.
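Rather than writing the greedy loop by hand, both directions of this search can also be run with scikit-learn's SequentialFeatureSelector (available in scikit-learn 0.24 and later); the sketch below shows backward search, with the estimator and the stopping size n_features_to_select chosen purely for illustration.

```python
# A sketch of backward search via scikit-learn's SequentialFeatureSelector
# (scikit-learn >= 0.24); the estimator and n_features_to_select are
# illustrative choices.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)

selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=5,    # stop once 5 features remain
    direction="backward",      # start from all features, delete one at a time
    cv=5,
)
selector.fit(X, y)
print("kept features:", selector.get_support(indices=True))
```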
Wrapper feature selection algorithms often work quite well, but can be computationally expensive given that they need to make many calls to the learning algorithm. Indeed, complete forward search (terminating when F is the set of all features) would take about O(n^2) calls to the learning algorithm, since the outer loop runs up to n times and each pass evaluates up to n candidate subsets.
Filter feature selection methods give heuristic, but computationally much cheaper, ways of choosing a feature subset. The idea here is to compute some simple score S(i) that measures how informative each feature x_i is about the class labels y. Then, we simply pick the k features with the largest scores S(i).
One possible choice of the score would be to define S(i) to be (the absolute value of) the correlation between x_i and y, as measured on the training data. This would result in our choosing the features that are the most strongly correlated with the class labels. In practice, it is more common (particularly for discrete-valued features x_i) to choose S(i) to be the mutual information MI(x_i, y) between x_i and y:

MI(x_i, y) = Σ_{x_i} Σ_{y} p(x_i, y) log ( p(x_i, y) / ( p(x_i) p(y) ) ),

where the sums are over the possible values of x_i and y, and the probabilities p(x_i, y), p(x_i) and p(y) can be estimated according to their empirical distributions on the training set.
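As a concrete illustration of a filter method, the sketch below estimates the mutual information of each binary feature with the labels from empirical counts on a toy dataset and keeps the k highest-scoring features; the synthetic data, the artificially informative feature, and the value of k are assumptions made only for this example.

```python
# A sketch of filter-style selection using an empirical mutual-information
# score for binary features; the toy data and k are illustrative.
import numpy as np

def mutual_information(xi, y):
    """Empirical MI(x_i, y) for binary-valued xi and y."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_xy = np.mean((xi == a) & (y == b))   # p(x_i = a, y = b)
            p_x, p_y = np.mean(xi == a), np.mean(y == b)
            if p_xy > 0:                           # treat 0 log 0 as 0
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
X = rng.integers(0, 2, size=(500, 20))
X[:, 3] = y ^ (rng.random(500) < 0.1)              # make feature 3 informative

scores = np.array([mutual_information(X[:, i], y) for i in range(X.shape[1])])
k = 5
top_k = np.argsort(scores)[::-1][:k]               # k features with largest S(i)
print("top features:", top_k)
```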