Laypeople call it the average; statisticians call it the arithmetic mean. The mean is one of the most widely used measures of central tendency: a single value that tells us where the center of the data lies. But it has its caveats:
Sometimes one observation, or a group of observations, in a dataset is extremely high or low. Such values have a drastic effect on the mean. Let's look at the scores of 10 students in a Mathematics pop quiz graded out of 100.
The mean of these observations is approximately 50, which is misleading. Looking at the data, most of the students' scores lie within 30–45, but because two students scored 95 and 98, the mean suggests that the center of the data lies at 50, which is false!
Removing those two scores and recalculating the average gives approximately 38, which describes the center of the data much better.
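The effect can be reproduced in a few lines of Python. The scores below are hypothetical, made up only to have the same shape as the quiz example (most scores in 30–45, two very high ones); they are not the article's data. The sketch compares the mean with and without the two extreme scores, and also prints the median for contrast.

```python
import statistics

# Hypothetical scores (out of 100) for 10 students: NOT the article's data,
# just a made-up set with the same shape (most in 30-45, two very high).
scores = [30, 32, 35, 35, 38, 40, 42, 45, 95, 98]

mean_all = statistics.mean(scores)
median_all = statistics.median(scores)

# Drop the two extreme scores (anything 90 or above) and recompute the mean.
trimmed = [s for s in scores if s < 90]
mean_trimmed = statistics.mean(trimmed)

print(f"mean with outliers:    {mean_all:.1f}")
print(f"mean without outliers: {mean_trimmed:.1f}")
print(f"median with outliers:  {median_all:.1f}")
```

With this made-up data, the two high scores pull the mean up to 49, while the mean of the remaining eight scores is about 37. The median, 39, stays close to where most of the scores lie even with the outliers included.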
Unlike the mode and the median, the mean cannot be calculated if a value is missing. Using the same example as above:
Here the 8th data point is missing, but whatever its value, we can be sure that the mode is 35 and the median is 40.5. The mean, on the other hand, cannot be computed until the missing value is known.
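A small experiment shows why. The sketch below uses a hypothetical dataset (not the article's), sorted, with the 8th value marked as `None`. It assumes the missing value lies between its neighbours in the ordered data (44 and 95), tries several candidate values for it, and reports the mean, median and mode each time.

```python
import statistics

# Hypothetical sorted scores; None marks the missing 8th value, assumed
# to lie somewhere between its neighbours 44 and 95.
scores = [30, 35, 35, 35, 38, 40, 44, None, 95, 98]

def summarize(values):
    """Return (mean, median, mode) for a complete list of numbers."""
    return statistics.mean(values), statistics.median(values), statistics.mode(values)

results = []
for candidate in (44, 60, 80, 95):
    # Fill the gap with one plausible value and recompute all three measures.
    filled = [candidate if s is None else s for s in scores]
    results.append((candidate, *summarize(filled)))

for candidate, mean, median, mode in results:
    print(f"missing={candidate}  mean={mean:.1f}  median={median}  mode={mode}")
```

Under this assumption, the median (39) and the mode (35) do not change whatever the missing value is, while the mean moves from 49.4 to 54.5 depending on the guess.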
Unlike the mode and the median, which we can often find just by observing the data, the mean requires a calculation. Sometimes, though, this can be considered an advantage of the mean over the median and the mode, because the result is not biased by the researcher's judgment.
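For reference, this is the standard arithmetic-mean calculation for observations $x_1, \dots, x_n$. The second line shows why a single extreme or missing value matters: every observation enters the sum, so shifting one observation $x_k$ by an amount $d$ shifts the mean by $d/n$. For example, one score raised by 50 points in a class of 10 moves the mean by 5 points.

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

\bar{x}_{\text{new}} = \frac{1}{n} \left( \sum_{i=1}^{n} x_i + d \right) = \bar{x} + \frac{d}{n}
```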
The median can be determined by drawing a cumulative frequency curve (an ogive), and the mode can be visualized by locating the tallest bar of a histogram. Neither is possible with the mean: it cannot be read directly off such a graph.
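Reading the median off a cumulative frequency curve amounts to finding where the running total crosses $n/2$ and interpolating linearly inside that class, i.e. $L + \frac{n/2 - CF}{f} \times h$. Finding the mode on a histogram amounts to picking the class with the highest frequency. The grouped data below is hypothetical and only illustrates both procedures.

```python
# Hypothetical grouped quiz scores: (lower bound, upper bound, frequency).
groups = [(20, 30, 1), (30, 40, 4), (40, 50, 3), (50, 60, 0),
          (60, 70, 0), (70, 80, 0), (80, 90, 0), (90, 100, 2)]

def ogive_median(groups):
    """Median read off the cumulative frequency curve by linear interpolation."""
    n = sum(f for _, _, f in groups)
    half = n / 2
    cumulative = 0
    for lower, upper, freq in groups:
        if freq and cumulative + freq >= half:
            # Interpolate inside the class where the running total crosses n/2.
            return lower + (half - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("empty distribution")

def modal_class(groups):
    """The class under the tallest histogram bar (first one if tied)."""
    lower, upper, _ = max(groups, key=lambda g: g[2])
    return lower, upper

print(ogive_median(groups))
print(modal_class(groups))
```

For this data the ogive gives a median of 40 and the tallest bar is the 30–40 class. There is no comparable point on either graph where the mean can simply be read off.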
Two students can have the same average while their results mean very different things:
Their means might be the same, but Student A is getting worse while Student B is improving.
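The sketch below makes this concrete with two hypothetical score histories over five quizzes (made up for illustration). Both students have the same mean. A least-squares slope of score against quiz number, used here as one simple way to summarize the trend, reveals the opposite directions the mean hides.

```python
from statistics import mean

# Hypothetical scores over five consecutive quizzes.
student_a = [90, 80, 70, 60, 50]
student_b = [50, 60, 70, 80, 90]

def trend(scores):
    """Least-squares slope of score against quiz number (points per quiz)."""
    xs = range(len(scores))
    x_bar, y_bar = mean(xs), mean(scores)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, scores))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

for name, scores in (("A", student_a), ("B", student_b)):
    print(f"Student {name}: mean={mean(scores)}, trend={trend(scores):+.1f} points/quiz")
```

Both means are 70, but Student A loses about 10 points per quiz while Student B gains about 10.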
Despite its cons, the mean still has some advantages:
- It is easy to compute and understand.
- It is the least affected by sampling fluctuations: the mean of a sample drawn from a population can be used to estimate the mean of the whole population, which makes it reliable.
- Its mathematical properties make it popular in inferential statistics.
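The sampling advantage can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is a made-up set of 20,000 normally distributed scores (mean 40, standard deviation 8), and samples of 25 scores are drawn 2,000 times. It compares how widely the sample means and the sample medians scatter. For a normal-shaped population the means cluster more tightly around the true value; with heavy outliers, as discussed earlier, that advantage can disappear. `statistics.fmean` requires Python 3.8 or later.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# Assumed population: 20,000 normally distributed scores (mean 40, sd 8).
population = [random.gauss(40, 8) for _ in range(20_000)]

sample_means, sample_medians = [], []
for _ in range(2_000):
    sample = random.sample(population, 25)  # 25 scores without replacement
    sample_means.append(statistics.fmean(sample))
    sample_medians.append(statistics.median(sample))

print(f"population mean:          {statistics.fmean(population):.2f}")
print(f"average of sample means:  {statistics.fmean(sample_means):.2f}")
print(f"spread of sample means:   {statistics.stdev(sample_means):.2f}")
print(f"spread of sample medians: {statistics.stdev(sample_medians):.2f}")
```

The average of the sample means lands very close to the population mean, and their spread is smaller than that of the sample medians, which is what "least affected by sampling" refers to.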
Knowing the mean's weaknesses lets a data scientist know when not to use it.
In summary, the mean is a poor measure of central tendency when a dataset contains extreme or missing values. It can also hide trends in the data, and it cannot be read directly off a graph.