In this post, we will talk about bias and variance, what the bias-variance tradeoff is, and how it affects our ML models.
Before we talk about bias and variance, we must understand errors.
Errors (reducible and irreducible): The prediction error of a machine learning model falls into two categories: reducible and irreducible. Irreducible error comes from the natural variability (noise) within a system, and no model can remove it. Reducible error, in comparison, is more controllable and should be minimized to ensure higher accuracy. Bias and variance are the components of reducible error. How large each of them is determines whether the model tends toward underfitting or overfitting.
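For squared-error loss, this split can be written out explicitly. The symbols below are assumptions introduced for illustration and do not appear in the text above: f is the true function, f̂ is the model learned from a random training set, and the target is y = f(x) + ε, where the noise ε has mean zero and variance σ². The expectation is taken over training sets and noise.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible error}}
$$

The first two terms depend on the model and make up the reducible error. The last term depends only on the noise in the data.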
Bias is the difference between the actual values and the values a model predicts, on average, after learning patterns from the data. When the bias is high, the assumptions made by the model are too simple, so it fails to capture the patterns in the training data.
This situation, where the model cannot find the patterns in the training data and therefore predicts poorly on both seen and unseen data, is called underfitting.
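To make underfitting concrete, here is a minimal sketch that fits a straight line to data generated from a curve. The data (a noisy quadratic), the sample sizes, and the helper names `make_data`, `fit_line`, and `mse` are all assumptions chosen for illustration. It uses only the Python standard library.

```python
import random

random.seed(0)

def make_data(n):
    """Hypothetical data: y = x^2 plus Gaussian noise (sd 0.5, variance 0.25)."""
    xs = [random.uniform(-3, 3) for _ in range(n)]
    ys = [x * x + random.gauss(0, 0.5) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares for the too-simple model y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def mse(model, xs, ys):
    """Mean squared error of the line (a, b) on the given points."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

x_train, y_train = make_data(200)
x_test, y_test = make_data(200)

line = fit_line(x_train, y_train)
train_mse = mse(line, x_train, y_train)
test_mse = mse(line, x_test, y_test)

print(f"train MSE: {train_mse:.2f}")
print(f"test MSE:  {test_mse:.2f}")
```

Both errors come out far above the noise variance of 0.25. A straight line cannot follow a parabola, so the model is wrong on seen and unseen data alike, which is the signature of high bias.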
Variance, on the other hand, is in a sense the opposite of bias. We define variance as the model’s sensitivity to fluctuations in the training data. When the variance is high, the model treats even trivial or noisy features as important and predicts almost perfectly on the training data. It therefore becomes very specific to that data and fails to generalize to unseen data.
This situation, where our model performs really well on training data and gets high accuracy there but performs poorly on new, unseen data, is called overfitting.
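Overfitting can be sketched with a model that simply memorizes its training set. The example below uses 1-nearest-neighbour regression as a stand-in for any overly flexible model. The data (a noisy sine curve), the sample sizes, and the helper names are assumptions made for illustration. It uses only the Python standard library.

```python
import math
import random

random.seed(1)

def make_data(n):
    """Hypothetical data: y = sin(x) plus Gaussian noise (sd 0.5)."""
    xs = [random.uniform(0, 2 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + random.gauss(0, 0.5) for x in xs]
    return xs, ys

def predict_1nn(x_train, y_train, query):
    """Return the target of the single closest training point."""
    nearest = min(range(len(x_train)), key=lambda i: abs(x_train[i] - query))
    return y_train[nearest]

def mse(x_train, y_train, xs, ys):
    """Mean squared error of the 1-NN model on the points (xs, ys)."""
    errors = [(predict_1nn(x_train, y_train, x) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

x_train, y_train = make_data(50)
x_test, y_test = make_data(200)

train_mse = mse(x_train, y_train, x_train, y_train)
test_mse = mse(x_train, y_train, x_test, y_test)

print(f"train MSE: {train_mse:.3f}")
print(f"test MSE:  {test_mse:.3f}")
```

The training error is exactly zero because every training point is its own nearest neighbour, so the model reproduces the noise along with the signal. On fresh data that memorized noise turns into error, and the test MSE is clearly larger.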
For any model, we have to find the right balance between bias and variance. This ensures that the model captures the essential patterns in the data while ignoring the noise present in it. It helps keep the total error of the model as low as possible.
Thus, finding the right balance between the bias and variance of the model is called the bias-variance tradeoff. It is basically a way to make sure the model is neither overfitted nor underfitted.
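The tradeoff can also be measured directly on synthetic data, where the true function is known. The sketch below trains a k-nearest-neighbour regressor on many freshly sampled training sets and estimates bias² and variance on a fixed grid of query points. The true function (a sine), the noise level, the values of k, and every helper name are assumptions for illustration, not something from the discussion above.

```python
import math
import random

random.seed(0)

def true_f(x):
    """Assumed ground-truth function."""
    return math.sin(x)

def sample_training_set(n):
    """Draw a fresh noisy training set from the assumed true function."""
    xs = [random.uniform(0, 2 * math.pi) for _ in range(n)]
    ys = [true_f(x) + random.gauss(0, 0.5) for x in xs]
    return xs, ys

def knn_predict(xs, ys, query, k):
    """Average the targets of the k training points closest to the query."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - query))[:k]
    return sum(y for _, y in nearest) / k

def bias_variance(k, n_train=30, n_repeats=200):
    """Estimate average bias^2 and variance of k-NN over a grid of query points."""
    grid = [2 * math.pi * i / 19 for i in range(20)]
    preds = [[] for _ in grid]
    for _ in range(n_repeats):
        xs, ys = sample_training_set(n_train)
        for j, q in enumerate(grid):
            preds[j].append(knn_predict(xs, ys, q, k))
    bias_sq = variance = 0.0
    for q, p in zip(grid, preds):
        mean_pred = sum(p) / len(p)
        bias_sq += (mean_pred - true_f(q)) ** 2
        variance += sum((v - mean_pred) ** 2 for v in p) / len(p)
    return bias_sq / len(grid), variance / len(grid)

# k=1 is the most flexible model; k=30 averages the whole training set.
results = {k: bias_variance(k) for k in (1, 5, 30)}
for k, (b, v) in results.items():
    print(f"k={k:2d}  bias^2={b:.3f}  variance={v:.3f}")
```

With k=1 the model follows each training set closely, so bias is low and variance is high. With k=30 it predicts nearly the same constant every time, so variance is low and bias is high. A value in between is where the sum of the two tends to be smallest.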
Now that we know what the bias-variance tradeoff is, here is what we can do to achieve this balance. If the model is underfitting (high bias):
- Add more input features to the training data
- Add complexity by introducing polynomial features or using more complex algorithms
- Decrease the regularization term
If the model is overfitting (high variance):
- Get more training data
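As a rough illustration of the regularization item in the list above, the sketch below fits a one-weight ridge regression for several regularization strengths. The data (y = 2x plus noise), the `lambdas` values, and the helper names `fit_ridge` and `mse` are assumptions picked for the example. It uses only the Python standard library.

```python
import random

random.seed(0)

# Hypothetical data: y = 2x plus Gaussian noise (sd 1.0).
xs = [random.uniform(-1, 1) for _ in range(50)]
ys = [2 * x + random.gauss(0, 1.0) for x in xs]

def fit_ridge(xs, ys, lam):
    """Closed-form ridge solution for the one-weight model y = w * x.

    Minimizes sum((y - w*x)^2) + lam * w^2, which gives
    w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

lambdas = [0.0, 1.0, 10.0, 100.0]
weights = [fit_ridge(xs, ys, lam) for lam in lambdas]
train_errors = [mse(w, xs, ys) for w in weights]

for lam, w, err in zip(lambdas, weights, train_errors):
    print(f"lambda={lam:6.1f}  w={w:.3f}  train MSE={err:.3f}")
```

A larger regularization term shrinks the weight toward zero and raises the training error, which pushes the model toward underfitting. Decreasing the term does the reverse, which is why it helps a model that is too simple and can hurt one that is already overfitting.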
Thanks for reading, Cheers 🙂