‘To err is human…’, as Alexander Pope famously wrote, means it is normal for people to make mistakes. Computers, on the other hand, we expect to be error-free. But when you step into the world of machine learning, among the first concepts you must master are the two components of prediction error: Bias and Variance.
Where does Prediction Error come from?
A supervised machine learning task involves training an algorithm so that it learns to map inputs to an output variable based on training examples, also known as a labeled data set. The model trains on the input variables (X) and tries to predict a target variable (y).
But these predictions aren’t 100% accurate. The difference between the actual and predicted values is the prediction error. It forms the basis of model evaluation and acts as an indicator of how to improve the model.
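As a quick, concrete illustration (the numbers below are made up), the prediction error for a regression model is simply the gap between each actual value and its prediction, and the mean squared error summarises those gaps into one score:

```python
# Toy illustration of prediction error: the gap between actual and
# predicted target values. The values here are invented for the example.
actual    = [3.0, 5.0, 7.5, 9.0]
predicted = [2.8, 5.4, 7.0, 9.3]

# Per-example prediction errors (residuals)
errors = [a - p for a, p in zip(actual, predicted)]

# Mean squared error: a common single-number summary of the errors
mse = sum(e ** 2 for e in errors) / len(errors)
print(errors)
print(mse)
```

Squaring the residuals keeps positive and negative misses from cancelling each other out, which is why MSE (rather than the raw average error) is the usual evaluation choice.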
What is model accuracy?
When building a predictive model, one of the most important metrics you use to evaluate it is ‘accuracy’. It is calculated by dividing the number of correct predictions by the total number of predictions.
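For a classifier, that calculation takes only a couple of lines. The labels below are illustrative, not from any real data set:

```python
# Accuracy = correct predictions / total predictions.
# Labels are made up for illustration.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Count positions where prediction matches the true label
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 6 of 8 predictions match -> 0.75
```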
The accuracy of a machine learning algorithm is subject to various kinds of errors. To make your models as accurate as possible it is important to understand these errors and learn to find an optimal solution known as the Bias-Variance tradeoff.
Bias is the systematic part of the error: the difference, on average, between the values predicted by a supervised machine learning model and the actual values of the target variable. It arises when the model makes overly strong assumptions about the shape of the relationship in the data.
Having high bias in your model means it gives huge errors on the training as well as the testing data. It means your algorithm performs poorly on both the data it has and hasn’t seen. A highly biased model is one that has failed to recognize the patterns in the data and is an overly simple one. This is also known as the problem of Underfitting.
- Both training and testing errors are high
Causes of Bias
- A very simple model, such as a linear model applied to a non-linear problem
Ways to reduce Bias
- Getting additional features
- Adding polynomial features such as x₁², x₂², x₁x₂, etc.
- Decreasing λ (lambda), also known as the Regularization Constant
Sometimes, a machine learning model performs extremely well on the training data, but when it is introduced to the testing data its accuracy drops considerably. This behavior is the signature of high Variance: the model’s predictions change drastically depending on the particular training set it happened to see.
Having high variance in your model means it didn’t just recognize the patterns in the data but memorized the individual data points, noise included. This happens when your algorithm is highly complex, so it performs much better on the data it has seen than on data it hasn’t. This is the case of Overfitting.
- Testing error is much higher than the training error
Causes of Variance
- A very complex model, such as a fourth-degree polynomial equation
Ways to reduce Variance
- Getting more training examples
- Using smaller sets of features (removing x3, x4, x5, …)
- Increasing λ (lambda), also known as the Regularization Constant
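A minimal sketch of overfitting, again on synthetic data: a polynomial complex enough to pass through every training point exactly achieves zero training error, yet misses a held-out point from the same underlying signal badly:

```python
# Sketch of overfitting: the unique degree-(n-1) polynomial through
# all n training points (Lagrange interpolation) has zero training
# error but extrapolates poorly. Synthetic data; true signal is y = x.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.1, 0.9, 2.2, 2.8, 4.1]  # y = x plus a little noise

def interpolate(x, xs, ys):
    """Evaluate the Lagrange interpolating polynomial at x --
    a deliberately over-complex model that fits every point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Training error is exactly zero: the model memorized the data
train_mse = sum((y - interpolate(x, train_x, train_y)) ** 2
                for x, y in zip(train_x, train_y)) / len(train_x)

# A new point from the true signal is missed by a wide margin
test_x, test_y = 5.0, 5.0
test_error = test_y - interpolate(test_x, train_x, train_y)
print(train_mse)
print(test_error)
```

On this data the interpolant predicts roughly 10.1 at x = 5 where the true value is 5.0: the tiny noise in the training labels has been amplified into a large error on unseen data, which is high variance in action.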
So far we have learned that bias refers to model error whereas variance refers to inconsistency in the accuracy of a model when applied to new data sets. The best model is the one that has low bias (low error and high accuracy) and low variance (consistent accuracy on new data sets).
Achieving this ideal model involves a tradeoff between the two errors, known as the Bias-Variance Tradeoff. This is because you can’t have an algorithm that is too simple and too complex at the same time. Tuning the complexity of the model is how you reach an optimal model for your predictive needs.
- A highly simple model underfits the data and fails to learn from it. Such a model has high bias.
- A highly complex model overfits the data and learns too much from it. Such a model has high variance.
- Bias-Variance Tradeoff helps us pick ‘just the right model’ where both bias and variance errors are balanced.
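The tradeoff can also be seen numerically. In the toy example below (all numbers invented), a rigid model that always predicts the same value has bias but no variance, while a flexible one is unbiased but swings a lot from one training set to the next; the expected squared error of each is bias squared plus variance:

```python
# Numeric illustration of: expected squared error = bias^2 + variance.
# Two toy "models" estimate a true value from different training sets.
# All numbers are invented for the example.
true_value = 10.0

# Each entry is the estimate produced from one hypothetical training set
estimates_simple  = [9.0, 9.0, 9.0]    # rigid model: biased, zero variance
estimates_complex = [8.0, 10.0, 12.0]  # flexible model: unbiased, high variance

def bias_variance(estimates, truth):
    """Bias of the average estimate, and variance across training sets."""
    mean_est = sum(estimates) / len(estimates)
    bias = mean_est - truth
    variance = sum((e - mean_est) ** 2 for e in estimates) / len(estimates)
    return bias, variance

b1, v1 = bias_variance(estimates_simple, true_value)   # bias -1, variance 0
b2, v2 = bias_variance(estimates_complex, true_value)  # bias 0, variance ~2.67
print(b1 ** 2 + v1)  # expected squared error of the rigid model: 1.0
print(b2 ** 2 + v2)  # expected squared error of the flexible model: ~2.67
```

Here the biased-but-stable model actually wins overall, which is the whole point of the tradeoff: what matters is the sum of the two error components, not either one alone.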
Remember that model building is an iterative process. It is highly unlikely that you will get the perfect model on the first go. By experimenting with the data you will eventually be able to build the best-fit model. But don’t forget: some irreducible error will always remain!