Statistics How To

Forward Selection: Definition

Forward selection is a type of stepwise regression that begins with an empty model and adds variables one at a time. At each forward step, you add the one variable that gives the greatest improvement to the model.

It is one of two commonly used methods of stepwise regression. The other, backward elimination, works in nearly the opposite way: you start with a model that includes every candidate variable and remove the extraneous ones one by one.

General Method Behind Forward Selection

Forward selection typically begins with only an intercept. Each variable that may be relevant is tested in turn, and the “best” one, as judged by a criterion chosen in advance, is added to the model.

As long as the model keeps improving under that same criterion, the process continues, adding one variable at a time and retesting at each step. Once adding another variable no longer improves the model, the process stops.
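The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: it uses ordinary least squares, takes AIC as the criterion (one possible choice; lower is better), and runs on simulated data in which only columns 0 and 3 truly affect the response. The helper names (solve, rss, aic, forward_selection) are made up for this example.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def rss(X, y, cols):
    """Residual sum of squares of a least-squares fit on an intercept plus the given columns."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    k = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def aic(X, y, cols):
    """AIC for a Gaussian linear model, up to an additive constant (assumed criterion)."""
    n = len(y)
    return n * math.log(rss(X, y, cols) / n) + 2 * (len(cols) + 1)

def forward_selection(X, y, criterion=aic):
    """Start from the intercept-only model and add the best variable until nothing improves."""
    selected = []
    remaining = list(range(len(X[0])))
    best = criterion(X, y, selected)
    while remaining:
        score, col = min((criterion(X, y, selected + [c]), c) for c in remaining)
        if score >= best:  # no candidate improves the criterion: stop
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected

# Simulated data: five candidate variables, only columns 0 and 3 matter.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [3.0 * r[0] - 2.0 * r[3] + random.gauss(0, 0.5) for r in X]

selected = forward_selection(X, y)
print(selected)  # column indices in the order they were added
```

The criterion is passed in as a function, so a cross-validation score or another measure could be swapped in without changing the selection loop.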

The criteria used to decide which variable enters at each step vary. You might look for the lowest cross-validation error, the lowest p-value, or any of a number of other tests or measures of accuracy.

Stepwise regression tends toward overfitting, which happens when we put in more variables than is actually good for the model. An overfit model typically shows a very close, neat fit to the data used in the regression, but it will be far off for additional data points and unreliable for prediction. It is therefore usually good to set strict criteria for adding any variable.
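To see how the strictness of a criterion changes where forward selection stops, the sketch below compares AIC with BIC, which charges a heavier penalty per added variable once the sample has more than about seven observations. Both criteria are assumptions chosen for illustration, and the residual sums of squares in rss_path are hypothetical numbers, not results from real data.

```python
import math

def aic(rss, n, k):
    """AIC for a Gaussian linear model with k variables plus an intercept."""
    return n * math.log(rss / n) + 2 * (k + 1)

def bic(rss, n, k):
    """BIC: same fit term as AIC, but the penalty per parameter is log(n)."""
    return n * math.log(rss / n) + math.log(n) * (k + 1)

def stopping_size(rss_path, n, criterion):
    """Number of variables kept when forward steps stop at the first non-improvement."""
    k = 0
    while (k + 1 < len(rss_path)
           and criterion(rss_path[k + 1], n, k + 1) < criterion(rss_path[k], n, k)):
        k += 1
    return k

n = 100
# Hypothetical RSS after 0, 1, 2, 3 and 4 variables have been added.
rss_path = [500.0, 200.0, 120.0, 117.0, 116.5]

aic_size = stopping_size(rss_path, n, aic)
bic_size = stopping_size(rss_path, n, bic)
print(aic_size, bic_size)  # prints "3 2"
```

With these made-up numbers, the third variable lowers the residual sum of squares only slightly. AIC still accepts it, while the stricter BIC stops at two variables.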

