Supervised Machine Learning can best be understood through the lens of the bias-variance trade-off.
In this article, I will try to explain the trade-off so that you can better understand Machine Learning algorithms and get better performance from them.
In Supervised Machine Learning an algorithm learns a model from training data.
In Supervised ML, we have a training set, which we feed to the learning algorithm. The job of the learning algorithm is to output a function, also called the hypothesis function, which is then used to make predictions on unseen data.
The prediction error for any ML algorithm can be broken down into three parts:
- Bias Error
- Variance Error
- Irreducible Error
The Irreducible Error, as the name suggests, cannot be reduced regardless of the algorithm used.
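For squared-error loss, this breakdown is commonly written as the formula below. The symbols are introduced here for illustration and do not appear in the original text: $f$ is the true target function, $\hat{f}$ is the learned hypothesis, the data are assumed to follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible Error}}
```

The last term depends only on the noise in the data, which is why no choice of algorithm can remove it.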
In this post, we will focus on Bias And Variance Error.
Bias is the set of simplifying assumptions made by a model to make the target function easier to learn.
Generally, linear models have high bias, which makes them fast to learn and easier to understand but less flexible. In turn, they have lower predictive performance on complex problems that fail to meet those assumptions.
Low-Bias: Suggests fewer assumptions about the form of the target function.
Example: Decision Trees, SVM, k-Nearest Neighbours.
High-Bias: Suggests more assumptions about the form of the target function.
Example: Linear Regression, Logistic Regression, LDA.
Variance is the amount that the estimate of the target function will change if different training data is used.
The hypothesis is estimated by the ML algorithm from the training data, so we should expect it to have some variance. Ideally, it should not change too much from one training set to another, which would mean that the algorithm is good at picking out the hidden mapping between the input and output variables.
Low Variance: Suggests small changes to the estimate of the target function with changes in the training dataset.
Example: Linear models such as Linear Regression, Logistic Regression, LDA.
High Variance: Suggests large changes to the estimate of the target function with changes in the training dataset.
Example: Decision Tree, k-Nearest Neighbours, SVM.
The Goal of a good Machine Learning algorithm is to achieve Low Bias and Low Variance for good prediction performance.
- Linear models have High Bias and Low Variance as seen from above examples.
- Non-linear models have Low Bias but High Variance.
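The contrast between the two kinds of model can be checked empirically with a small simulation. The sketch below is only an illustration under stated assumptions: the sine target, the noise level, the sample sizes, and all function names are hypothetical choices, and 1-nearest-neighbour regression stands in for "a non-linear model". It retrains each model on many fresh training sets and measures how far the average prediction is from the truth (squared bias) and how much predictions scatter between training sets (variance).

```python
import math
import random
import statistics


def target(x):
    """Hypothetical 'true' target function (an assumption for this demo)."""
    return math.sin(2 * math.pi * x)


def sample_training_set(rng, n=30, noise_sd=0.3):
    """Draw a fresh noisy training set of n points from the target."""
    xs = [rng.random() for _ in range(n)]
    ys = [target(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def fit_linear(xs, ys):
    """Least-squares straight line: a strong assumption about the shape."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_1nn(xs, ys):
    """1-nearest-neighbour regression: almost no assumption about the shape."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def bias_variance(fit, n_runs=200, seed=0):
    """Estimate average squared bias and variance over many training sets."""
    rng = random.Random(seed)
    test_xs = [i / 20 for i in range(1, 20)]
    preds = [[] for _ in test_xs]  # preds[i] collects predictions at test_xs[i]
    for _ in range(n_runs):
        model = fit(*sample_training_set(rng))
        for i, x in enumerate(test_xs):
            preds[i].append(model(x))
    bias_sq = statistics.fmean(
        (statistics.fmean(p) - target(x)) ** 2 for p, x in zip(preds, test_xs)
    )
    variance = statistics.fmean(statistics.pvariance(p) for p in preds)
    return bias_sq, variance


if __name__ == "__main__":
    for name, fit in [("linear", fit_linear), ("1-NN", fit_1nn)]:
        b, v = bias_variance(fit)
        print(f"{name:>6}: bias^2 = {b:.3f}, variance = {v:.3f}")
```

On this toy problem the straight line should show the larger squared bias and the nearest-neighbour model the larger variance, though the exact numbers depend on the assumed target and noise.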
There is no escaping the relationship between Bias and Variance in Machine Learning.
Increasing the Bias will Decrease the Variance and Increasing the Variance will Decrease the Bias.
Configuring the Trade-Off for a specific algorithm:
The k-Nearest Neighbour algorithm has low Bias and high Variance, although this can be changed by increasing the value of K, which increases the number of neighbours that contribute to the prediction and in turn increases the Bias of the model while lowering its Variance.
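The effect of K described above can be sketched with a toy k-Nearest Neighbour regressor written with the standard library only. Everything specific here is an assumption made for illustration: the sine target, the noise level, the test points, and the helper names `knn_predict` and `knn_bias_variance`.

```python
import math
import random
import statistics


def knn_predict(xs, ys, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean(y for _, y in nearest)


def knn_bias_variance(k, n_train=30, n_runs=200, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of k-NN on a toy sine target."""
    rng = random.Random(seed)
    f = lambda x: math.sin(2 * math.pi * x)  # hypothetical target function
    test_xs = [0.1, 0.25, 0.4, 0.6, 0.75, 0.9]
    preds = [[] for _ in test_xs]
    for _ in range(n_runs):
        # Fresh training set on every run, so the spread of predictions
        # across runs reflects sensitivity to the training data.
        xs = [rng.random() for _ in range(n_train)]
        ys = [f(x) + rng.gauss(0, noise_sd) for x in xs]
        for i, x in enumerate(test_xs):
            preds[i].append(knn_predict(xs, ys, x, k))
    bias_sq = statistics.fmean(
        (statistics.fmean(p) - f(x)) ** 2 for p, x in zip(preds, test_xs)
    )
    variance = statistics.fmean(statistics.pvariance(p) for p in preds)
    return bias_sq, variance


if __name__ == "__main__":
    for k in (1, 5, 15):
        b, v = knn_bias_variance(k)
        print(f"K = {k:>2}: bias^2 = {b:.3f}, variance = {v:.3f}")
```

With a small K each prediction follows one or two noisy points, so it changes a lot between training sets. With a large K the prediction averages over a wide neighbourhood, which steadies it but blurs the shape of the target.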
Dealing with High Bias:
- Use a more complex model (e.g. kernelize, use non-linear models).
- Add more features to your model.
- Perform Boosting.
Dealing with High Variance:
- Add more Training Data.
- Reduce Model Complexity.
- Perform Bagging.
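Bagging, one of the techniques listed above, trains the same model on many bootstrap resamples of the training set and averages their predictions, which tends to reduce variance. The sketch below is a minimal illustration under assumptions, not a reference implementation: bagging is most often applied to decision trees, but a 1-nearest-neighbour regressor is used here as a simple high-variance base model so the example needs only the standard library. The target function, noise level, and function names are hypothetical.

```python
import math
import random
import statistics


def nn1_predict(xs, ys, x):
    """1-nearest-neighbour prediction: a deliberately high-variance base model."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
    return ys[i]


def bagged_predict(xs, ys, x, rng, n_models=25):
    """Average the base model over bootstrap resamples of the training set."""
    n = len(xs)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        preds.append(nn1_predict([xs[i] for i in idx], [ys[i] for i in idx], x))
    return statistics.fmean(preds)


def prediction_variances(n_train=30, n_runs=200, noise_sd=0.3, seed=0):
    """Variance of single vs bagged predictions across fresh training sets."""
    rng = random.Random(seed)
    f = lambda x: math.sin(2 * math.pi * x)  # hypothetical target function
    test_xs = [0.2, 0.35, 0.5, 0.65, 0.8]
    single = [[] for _ in test_xs]
    bagged = [[] for _ in test_xs]
    for _ in range(n_runs):
        xs = [rng.random() for _ in range(n_train)]
        ys = [f(x) + rng.gauss(0, noise_sd) for x in xs]
        for i, x in enumerate(test_xs):
            single[i].append(nn1_predict(xs, ys, x))
            bagged[i].append(bagged_predict(xs, ys, x, rng))
    var_single = statistics.fmean(statistics.pvariance(p) for p in single)
    var_bagged = statistics.fmean(statistics.pvariance(p) for p in bagged)
    return var_single, var_bagged


if __name__ == "__main__":
    var_single, var_bagged = prediction_variances()
    print(f"single model variance: {var_single:.3f}")
    print(f"bagged model variance: {var_bagged:.3f}")
```

In this setup the bagged predictor should scatter less from one training set to the next than the single model, because each bagged prediction is effectively a weighted average over several nearby training points rather than one.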
In this post, you discovered bias, variance and the bias-variance trade-off for machine learning algorithms.
You now know that:
- Bias is the set of simplifying assumptions made by the model to make the target function easier to approximate.
- Variance is the amount that the estimate of the target function will change given different training data.
- The trade-off is the tension between the error introduced by bias and the error introduced by variance.
- High Bias and High Variance can each be dealt with using specific techniques.