In this article, I will explain what a confusion matrix is and how you can use it to evaluate the performance of your machine learning classification algorithms. At the end of the article, I will list some references in case you want to learn more about the confusion matrix.
What is the confusion matrix?
A confusion matrix is a table used to evaluate the performance of machine learning classification algorithms by comparing the predicted results with the actual results from a test dataset. The table has dimensions N×N, where N is the number of classes being predicted.
The image below shows an example of a confusion matrix for binary classification (a 2×2 matrix), the simplest case, built from 1000 test samples. Although it is the simplest confusion matrix, the same concepts generalize to N×N matrices.
In a confusion matrix, as shown above, four terms are used to describe the prediction results:
– True Positive (TP): both the predicted and the actual class are positive.
– True Negative (TN): both the predicted and the actual class are negative.
– False Negative (FN): the predicted class is negative, but the actual class is positive.
– False Positive (FP): the predicted class is positive, but the actual class is negative.
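The snippet below is a minimal sketch of how these four counts can be obtained in practice, assuming scikit-learn is available (the library and the labels are my own choices for illustration, not part of the article). Label 1 is the positive class and 0 is the negative class.

```python
from sklearn.metrics import confusion_matrix

# Made-up actual and predicted labels for a binary problem (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

# Rows correspond to actual classes, columns to predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)

# For a 2x2 matrix, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()
print(f"TP={tp}  TN={tn}  FP={fp}  FN={fn}")
```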
Based on these four counts, we can derive useful measures of the performance of our classification models, such as accuracy, precision, recall, and F1-score.
- Accuracy measures the percentage of correct predictions made by the model. It is calculated as the sum of True Positives and True Negatives divided by the total number of samples.
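Written as a formula, using the four counts defined above:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$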

- Precision measures the percentage of samples predicted as positive that are actually positive. It is calculated as True Positives divided by the sum of True Positives and False Positives.
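As a formula:

$$\text{Precision} = \frac{TP}{TP + FP}$$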

- Recall measures the percentage of actual positive samples that are predicted as positive. It is calculated as True Positives divided by the sum of True Positives and False Negatives.
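As a formula:

$$\text{Recall} = \frac{TP}{TP + FN}$$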

- The F1-score (or F-score) is the harmonic mean of precision and recall.
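As a formula:

$$\text{F1} = 2 \cdot \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,TP}{2\,TP + FP + FN}$$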

Which measure is the most appropriate for evaluating my model?
- Precision is preferable when the cost of a False Positive is high, that is, when the model should avoid labeling negative samples as positive. For example, in a spam filter where spam is the positive class, high precision means that legitimate (non-spam) messages are rarely misclassified as spam.
- Recall is preferable when the objective is to identify as many actual positive samples as possible, because missing a positive case (a False Negative) is critical, for example in cancer detection.
- Accuracy is preferable when the objective is simply to classify as many samples as possible correctly. This measure is only informative when the dataset is not imbalanced (you can learn more about imbalanced datasets in “Analytics Vidhya — 5 techniques to handle imbalanced data for a classification problem”); the sketch after this list shows how accuracy can mislead on an imbalanced dataset.
- The F1-score is preferable when precision and recall are both important and you need a single measure that balances the two.
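To make the point about accuracy and imbalanced data concrete, here is a small illustrative sketch (the data and the use of scikit-learn are my own assumptions, not from the article). A naive model that always predicts the negative class looks excellent by accuracy, while recall exposes that it misses every positive case.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Made-up imbalanced test set: 95 negative samples and only 5 positive samples
y_true = [0] * 95 + [1] * 5
# A naive model that always predicts the negative class
y_pred = [0] * 100

print("Accuracy :", accuracy_score(y_true, y_pred))                    # 0.95 - looks good
print("Recall   :", recall_score(y_true, y_pred))                      # 0.0  - misses every positive
print("Precision:", precision_score(y_true, y_pred, zero_division=0))  # 0.0  - no positive predictions
print("F1-score :", f1_score(y_true, y_pred, zero_division=0))         # 0.0
```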
To learn more about the confusion matrix, I suggest checking the following links: