Before getting into the regularization concept, look carefully at figure 1: both underfitting and overfitting are bad for our model. One way to maintain an ideal balance between the two is to reduce the model's effective dimension.
What is Regularization?
- It is an approach to addressing over-fitting in machine learning
- Over-fitted models fail to generalize to unseen test data
- Regularization reduces the variance of the model
We reduce the dimension by eliminating features that are not important. Each feature has a weight; regularization either shrinks these weights toward small values or drives them to zero. There are two common techniques for achieving this: L1 and L2.
Lasso Regression (L1)
L1 regularization prevents the weights from getting too large. Large weights mean more complexity and lead to overfitting, though a larger weight does not necessarily mean the feature is more important. L1 introduces sparsity into the weights: it forces the weights of the less important features to exactly zero, reducing the average magnitude of all the weights. L2, by contrast, only forces the weights toward much smaller (but non-zero) values.
Cost = Loss + λ · Σ |wᵢ|

where λ (lambda) is a hyperparameter that controls the strength of the penalty.
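A minimal sketch of this sparsity effect, assuming scikit-learn's `Lasso` (whose `alpha` parameter plays the role of lambda) and synthetic data that are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
# Only the first two of the ten features actually matter.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1)  # alpha plays the role of lambda
lasso.fit(X, y)
print(lasso.coef_)  # the eight irrelevant weights are driven to exactly 0
```

The weights of the two informative features survive (slightly shrunk toward zero by the penalty), while the rest are eliminated outright, which is exactly the sparsity described above.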
How can we make the weights zero?
This is where the loss (or cost) function comes into the picture. Training optimizes the model to minimize this function, and the regularization penalty is added to it, so minimizing the total cost also keeps the weights small.
- If the lambda value is too high, the weights are shrunk too aggressively and the model underfits
- If the lambda value is too low, the penalty has little effect and the model can still overfit
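The two failure modes above can be sketched with scikit-learn's `Lasso`; the synthetic data and the two `alpha` values are assumptions chosen to make each extreme visible:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=80)

strong = Lasso(alpha=10.0).fit(X, y)   # lambda too high
weak = Lasso(alpha=1e-4).fit(X, y)     # lambda too low
print(strong.coef_)  # every weight driven to 0: the model underfits
print(weak.coef_)    # penalty barely felt: weight on the true feature stays near 2.0
```

With an over-large lambda the model ignores even the genuinely useful feature; with a near-zero lambda the fit is essentially unregularized least squares, free to chase noise.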
Ridge Regression (L2)
L2 also prevents the weights from getting too large, but it does so by shrinking all the weights toward zero rather than eliminating any of them: unlike L1, it rarely sets a weight exactly to zero, so no features are dropped from the model.
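To see the contrast with L1, here is a sketch comparing Ridge against plain least squares (the synthetic data and `alpha` value are illustrative assumptions): the overall weight magnitude shrinks, but no weight is forced to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))
# Half the features are truly irrelevant (zero true weight).
y = X @ np.array([4.0, -3.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print(np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))  # ridge norm is smaller
print(np.all(ridge.coef_ != 0))  # yet every weight remains non-zero
```

This is the practical difference between the two penalties: use L1 when you want automatic feature elimination, and L2 when you only want to dampen the weights.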