
[Figure 6.9: two panels plotting mean squared error (y-axis, 0 to 100); the left panel's x-axis is λ (0.02 to 50.00, log scale), the right panel's x-axis is R² on the training data (0.4 to 1.0).]

FIGURE 6.9. Left: Plots of squared bias (black), variance (green), and test MSE (purple) for the lasso. The simulated data is similar to that in Figure 6.8, except that now only two predictors are related to the response. Right: Comparison of squared bias, variance, and test MSE between lasso (solid) and ridge (dashed). Both are plotted against their R² on the training data, as a common form of indexing. The crosses in both plots indicate the lasso model for which the MSE is smallest.

In the example of Figure 6.9, by contrast, the response is a function of only 2 out of 45 predictors. Now the lasso tends to outperform ridge regression in terms of bias, variance, and MSE.

These two examples illustrate that neither ridge regression nor the lasso will universally dominate the other. In general, one might expect the lasso to perform better in a setting where a relatively small number of predictors have substantial coefficients, and the remaining predictors have coefficients that are very small or that equal zero. Ridge regression will perform better when the response is a function of many predictors, all with coefficients of roughly equal size. However, the number of predictors that is related to the response is never known a priori for real data sets. A technique such as cross-validation can be used in order to determine which approach is better on a particular data set.
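
As a sketch of such a comparison in Python with scikit-learn (whose alpha parameter plays the role of λ; the simulated data, grid, and fold counts below are illustrative assumptions, not from the text):

import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.model_selection import cross_val_score

# Hypothetical simulated data: only 2 of 45 predictors affect the response.
rng = np.random.default_rng(0)
n, p = 50, 45
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)

alphas = np.logspace(-2, 2, 100)  # grid of lambda values to search

# Nested cross-validation: the inner CV (inside RidgeCV/LassoCV) picks
# lambda; the outer CV estimates the test MSE of each tuned method.
ridge_mse = -cross_val_score(RidgeCV(alphas=alphas), X, y, cv=10,
                             scoring="neg_mean_squared_error").mean()
lasso_mse = -cross_val_score(LassoCV(alphas=alphas), X, y, cv=10,
                             scoring="neg_mean_squared_error").mean()
print(f"ridge CV MSE: {ridge_mse:.2f}   lasso CV MSE: {lasso_mse:.2f}")

In a sparse setting like this one, the lasso would typically show the lower cross-validated error; with many equally important predictors, ridge would tend to win.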

As with ridge regression, when the least squares estimates have excessively high variance, the lasso solution can yield a reduction in variance at the expense of a small increase in bias, and consequently can generate more accurate predictions. Unlike ridge regression, the lasso performs variable selection, and hence results in models that are easier to interpret. There are very efficient algorithms for fitting both ridge and lasso models; in both cases the entire coefficient paths can be computed with about the same amount of work as a single least squares fit. We will explore this further in the lab at the end of this chapter.
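
As an illustration of this path computation (a sketch, not the chapter's lab code), scikit-learn's lasso_path returns the coefficients over an entire grid of λ values in a single call; the simulated data below are the same illustrative assumption as in the previous snippet:

import numpy as np
from sklearn.linear_model import lasso_path

# Illustrative data as above: 2 of 45 predictors matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 45))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(50)

# A single call returns coefficients for an entire grid of lambdas.
alphas, coefs, _ = lasso_path(X, y, n_alphas=100)

# alphas come back largest-first, so the first column is the sparsest fit.
nonzero = (np.abs(coefs) > 1e-8).sum(axis=0)
print(coefs.shape)              # (45, 100): one coefficient vector per lambda
print(nonzero[0], nonzero[-1])  # few nonzeros at large lambda, many at small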

A Simple Special Case for Ridge Regression and the Lasso

In order to obtain a better intuition about the behavior of ridge regression and the lasso, consider a simple special case with n = p, and X a diagonal matrix with 1's on the diagonal and 0's in all off-diagonal elements.
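
To sketch where this special case leads (assuming, in addition, a regression without an intercept): the fitting problem then decouples into p one-dimensional problems, and the penalized criteria admit closed-form solutions. Ridge shrinks every coordinate by the same proportion, while the lasso soft-thresholds:

\[
\hat\beta_j^{R} = \frac{y_j}{1+\lambda},
\qquad
\hat\beta_j^{L} =
\begin{cases}
  y_j - \lambda/2 & \text{if } y_j > \lambda/2, \\
  y_j + \lambda/2 & \text{if } y_j < -\lambda/2, \\
  0 & \text{if } |y_j| \le \lambda/2.
\end{cases}
\]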
