# Tuning Parameter / Penalty Parameter

A tuning parameter (λ), sometimes called a penalty parameter, controls the strength of the penalty term in ridge regression and lasso regression. It is essentially the amount of shrinkage, where estimates are shrunk towards a central point (for ridge and lasso coefficients, towards zero). Shrinkage results in simpler models, and sparse ones in the case of the lasso. These are easier to analyze than high-dimensional models with large numbers of parameters.
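One common way to write the two penalized objectives in which λ appears is shown below. This assumes a response vector $y$, a design matrix $X$ with no intercept column, and a coefficient vector $\beta$. Scaling constants (such as the ½ in the lasso objective) vary between textbooks and software packages, so treat this as one convention rather than the definitive form. Setting λ = 0 in either line recovers ordinary least squares.

$$
\hat\beta_{\text{ridge}} = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2 = (X^\top X + \lambda I)^{-1} X^\top y
$$

$$
\hat\beta_{\text{lasso}} = \arg\min_{\beta}\; \tfrac{1}{2}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1
$$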

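• When λ = 0, no parameters are eliminated, and the estimate is equal to the one found with ordinary least squares (linear) regression.
• As λ increases, more and more coefficients are set to zero and eliminated. This happens with the lasso; ridge regression shrinks coefficients towards zero but never sets them exactly to zero.
• As λ → ∞, all coefficients are eliminated. With the lasso, this already happens once λ is large enough.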
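The behaviour in the list above can be seen in a minimal sketch. It assumes an orthonormal design ($X^\top X = I$) and the lasso objective ½‖y − Xβ‖² + λ‖β‖₁. Under that assumption, each lasso coefficient is simply the least-squares coefficient "soft-thresholded" at λ. The least-squares values and the helper names `soft_threshold` and `lasso_orthonormal` are hypothetical, chosen only for illustration.

```python
# Minimal sketch: lasso coefficients as lambda grows, ASSUMING an orthonormal
# design (X^T X = I) and the objective (1/2)||y - X b||^2 + lam * ||b||_1.
# Under that assumption the lasso solution is the OLS estimate soft-thresholded at lam.

def soft_threshold(z: float, lam: float) -> float:
    """Move z toward zero by lam; return exactly 0.0 if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(ols_coefs: list[float], lam: float) -> list[float]:
    """Lasso estimate of each coefficient under the orthonormal-design assumption."""
    return [soft_threshold(b, lam) for b in ols_coefs]


# Hypothetical OLS estimates, made up for illustration only.
ols = [3.0, -1.5, 0.8, 0.2]

for lam in [0.0, 0.5, 1.0, 2.0, 5.0]:
    coefs = lasso_orthonormal(ols, lam)
    n_zero = sum(1 for c in coefs if c == 0.0)
    print(f"lambda={lam}: coefs={coefs}, eliminated={n_zero}")
```

At λ = 0 the output equals the least-squares estimates. Each increase in λ eliminates more coefficients. Once λ reaches the largest absolute least-squares coefficient (3.0 here), every coefficient is zero, so in this idealized setting "λ = ∞" is effectively reached at a finite value.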

There is a trade-off between bias and variance in the resulting estimators. As λ increases, bias increases; as λ decreases, variance increases. For example, setting your tuning parameter to a low value keeps more model parameters and gives lower bias, but at the expense of a much larger variance.
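The trade-off can be computed exactly in the simplest possible case. This is a minimal sketch assuming ridge regression with a single predictor, no intercept, fixed x values and noise variance σ². In that case the ridge estimate is β̂ = Σxᵢyᵢ / (Σxᵢ² + λ). Its bias and variance then have closed forms, so no simulation is needed. The true slope, noise variance and x values below are hypothetical.

```python
# Minimal sketch of the bias-variance trade-off for ridge regression, ASSUMING a
# single predictor with no intercept: y_i = beta * x_i + noise, noise variance sigma2,
# fixed x values. Ridge estimate: beta_hat = sum(x*y) / (sum(x^2) + lam).

def ridge_bias_variance(beta: float, sigma2: float, x: list[float], lam: float) -> tuple[float, float]:
    """Return (bias, variance) of the one-predictor ridge estimator, computed analytically."""
    s = sum(xi * xi for xi in x)             # sum of squared predictor values
    bias = -beta * lam / (s + lam)           # E[beta_hat] - beta
    variance = sigma2 * s / (s + lam) ** 2   # Var[beta_hat]
    return bias, variance


# Hypothetical settings chosen only for illustration.
x = [0.5, -1.0, 1.5, 0.2, -0.7]
beta, sigma2 = 2.0, 1.0

for lam in [0.0, 1.0, 5.0, 20.0]:
    bias, var = ridge_bias_variance(beta, sigma2, x, lam)
    print(f"lambda={lam:>5}: bias^2={bias**2:.4f}, variance={var:.4f}, MSE={bias**2 + var:.4f}")
```

Squared bias grows with λ while variance shrinks. The mean squared error (their sum) is what a good choice of λ tries to keep small.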

## L1 and L2 Penalties

Tuning parameters are part of a process called regularization, which works by biasing estimates towards particular values (usually zero). Popular regularization methods use either an L1 or an L2 penalty (or sometimes a combination of both):

• L1 penalties (used by the lasso) limit the size of the coefficients and can result in sparse models (i.e. models with a small number of nonzero coefficients), because some coefficients are set exactly to zero and eliminated.
• L2 penalties (used by ridge regression) do not result in sparse models. All coefficients are shrunk towards zero (in the special case of an orthonormal design, all by the same factor), but none are eliminated.
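The difference between the two penalties is easiest to see in a minimal sketch. It assumes an orthonormal design ($X^\top X = I$), which is the case where ridge shrinks every coefficient by the same factor. The objectives assumed are ‖y − Xβ‖² + λ‖β‖₂² for ridge and ½‖y − Xβ‖² + λ‖β‖₁ for the lasso. The least-squares values and function names are hypothetical.

```python
import math

# Minimal sketch contrasting L2 (ridge) and L1 (lasso) shrinkage, ASSUMING an
# orthonormal design (X^T X = I). Objectives assumed:
#   ridge: ||y - X b||^2 + lam * ||b||_2^2      -> b_j = ols_j / (1 + lam)
#   lasso: (1/2)||y - X b||^2 + lam * ||b||_1   -> b_j = soft-threshold(ols_j, lam)

def ridge_orthonormal(ols_coefs, lam):
    """Divide every coefficient by the same factor (1 + lam); none becomes exactly zero."""
    return [b / (1.0 + lam) for b in ols_coefs]


def lasso_orthonormal(ols_coefs, lam):
    """Move each coefficient toward zero by lam; set it to 0.0 if |b| <= lam."""
    out = []
    for b in ols_coefs:
        shrunk = max(abs(b) - lam, 0.0)
        # math.copysign restores the original sign; exact zeros stay 0.0
        out.append(math.copysign(shrunk, b) if shrunk > 0 else 0.0)
    return out


ols = [3.0, -1.5, 0.8, 0.2]   # hypothetical OLS estimates
lam = 1.0
ridge = ridge_orthonormal(ols, lam)
lasso = lasso_orthonormal(ols, lam)
print("ridge:", ridge)   # all four nonzero, each halved
print("lasso:", lasso)   # two coefficients set exactly to zero
```

With a general (non-orthonormal) design, ridge no longer shrinks every coefficient by one common factor, but it still does not set coefficients exactly to zero.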

## How to Choose a Tuning Parameter

*Figure: A set of two points can be fitted by multiple models, including a linear model (green) and an unlimited number of higher-degree polynomial models (red).*

Choosing a tuning parameter is a challenging task. Optimal tuning parameters are “difficult to calibrate in practice” (Lederer and Müller, 2015) and are “not practically feasible” (Fan & Tang, 2013). They depend on a quagmire of hard-to-quantify quantities, such as nuisance parameters in the population model. Specific techniques have their proponents and opponents, which makes the task even more difficult. For example, Tibshirani (2013) calls cross-validation (a popular method for finding tuning parameters) “…a simple, intuitive way to estimate prediction error”, while Chand (n.d.) states the method “almost always fail[s] to achieve consistent variable selection”.
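Cross-validation picks λ by fitting the model on part of the data for each candidate λ, then measuring prediction error on the held-out part. The λ with the lowest average error wins. The sketch below is a minimal illustration of K-fold cross-validation, not the specific procedure from any of the cited sources. It assumes a one-predictor ridge model with no intercept, simulated data, a hypothetical grid of λ values, and interleaved (every k-th point) fold assignment. It does not address the high-dimensional or variable-selection concerns raised above.

```python
import random

# Minimal sketch of K-fold cross-validation for choosing lambda, ASSUMING a
# one-predictor ridge model with no intercept: beta_hat = sum(x*y) / (sum(x^2) + lam).
# The data are simulated purely for illustration.

def fit_ridge_1d(xs, ys, lam):
    """Closed-form one-predictor ridge estimate."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_error(xs, ys, lam, k=5):
    """Mean squared prediction error over held-out folds (fold i = every k-th point from i)."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        beta_hat = fit_ridge_1d(train_x, train_y, lam)
        total += sum((ys[i] - beta_hat * xs[i]) ** 2 for i in test_idx)
    return total / n   # every point is held out exactly once


def choose_lambda(xs, ys, grid, k=5):
    """Return the lambda in grid with the smallest cross-validated error."""
    return min(grid, key=lambda lam: cv_error(xs, ys, lam, k))


rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(40)]
ys = [0.5 * x + rng.gauss(0, 2) for x in xs]   # hypothetical true slope 0.5, noisy data
grid = [0.0, 0.1, 1.0, 5.0, 10.0, 50.0]       # hypothetical candidate lambdas
best = choose_lambda(xs, ys, grid)
print("lambda chosen by 5-fold CV:", best)
```

In practice the folds are usually assigned at random, and library routines compute the whole λ path efficiently. The loop structure, however, is the same.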

Although no tuning parameter is “optimal” in every scenario, choosing one is necessary for any analysis involving high-dimensional data. Fan & Tang recommend a two-step approach.

Although the concept sounds simple (choose a method, then choose a model), it doesn’t work well in some cases. For example, the number of candidate models becomes unwieldy when the dimensionality p grows exponentially with the sample size n. For this setting, Fan and Tang (2013) note that “To the best of our knowledge, there is no existing work accommodating tuning parameter selection for general penalized likelihood methods.”

References:
Chand, S. (n.d.). On Tuning Parameter Selection of Lasso-Type Methods – A Monte Carlo Study. Proceedings of 2012 9th International Bhurban Conference on Applied Sciences & Technology (IBCAST), 120, Islamabad, Pakistan, 9–12 January 2012. Retrieved August 14, 2017 from: http://www.cmap.polytechnique.fr/~lepennec/enseignement/M2Orsay/06177542.pdf
Fan, Y. and Tang, C. Y. (2013). Tuning parameter selection in high dimensional penalized likelihood. J. R. Statist. Soc. B, 75, Part 3, pp. 531–552. Retrieved August 14, 2017 from: http://www-bcf.usc.edu/~fanyingy/publications/JRSSB-FT13.pdf
Lederer, J. and Müller, C. (2015). Don’t Fall for Tuning Parameters: Tuning-Free Variable Selection in High Dimensions With the TREX. Retrieved August 14, 2017 from: https://arxiv.org/abs/1404.0541
Tibshirani, R. (2013). Model selection and validation 1: Cross-validation [Lecture slides]. Retrieved August 14, 2017 from: http://www.stat.cmu.edu/~ryantibs/datamining/lectures/18-val1.pdf
