
Laymen call it the **average**; statisticians call it the **arithmetic mean**. The mean is one of the most widely used measures of central tendency: a single value that summarizes where the center of the data lies. But it has its caveats:

Sometimes a single observation, or a small group of observations, in a dataset is far higher or lower than the rest. Such values have a drastic effect on the mean. Let’s look at the scores of **10** students in a Mathematics pop quiz graded on a scale of **100**.

The mean of the observations is approximately **50**, which is misleading. If you look at the data, most of the students’ scores lie within **30–45**, but two students have high scores of **95** and **98**. These two scores pull the mean up to about **50**, which is not where the center of the data actually lies.

Removing those two scores and averaging the rest gives approximately **38**, which describes the center of the data far better.
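The effect of the two high scores can be checked with a few lines of Python. This is only a sketch: the scores below are hypothetical values chosen to match the figures quoted above (they are not the original quiz data), and the cutoff of 90 used to separate the two high scores is likewise an assumption.

```python
from statistics import mean, median

# Hypothetical quiz scores, chosen to be consistent with the text:
# eight scores between 30 and 45, plus two high scores of 95 and 98.
scores = [30, 35, 35, 35, 40, 41, 43, 45, 95, 98]

# Mean of all ten scores: pulled upward by the two high scores.
mean_all = mean(scores)          # about 49.7

# Mean after dropping the two high scores (assumed cutoff: 90).
typical = [s for s in scores if s < 90]
mean_typical = mean(typical)     # 38

# The median barely notices the two high scores.
median_all = median(scores)      # 40.5

print(f"mean with high scores:    {mean_all}")
print(f"mean without high scores: {mean_typical}")
print(f"median with high scores:  {median_all}")
```

Two values out of ten move the mean by more than 11 points, while the median stays close to where most of the scores sit.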

Unlike the mode and the median, the mean can’t be calculated if a value is missing. Using the same example as above,

Here the **8th** data point is missing, but whatever its value is, we can be sure that the mode is **35** and the median is **40.5**.
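A rough way to see this in code, assuming the scores are listed in ascending order so that the missing 8th value must lie between its neighbours. The data are hypothetical (not the original quiz scores), and `None` is used here simply as a placeholder for the missing value.

```python
from statistics import median, mode

# Hypothetical sorted scores; None marks the missing 8th value.
scores = [30, 35, 35, 35, 40, 41, 43, None, 95, 98]
known = [s for s in scores if s is not None]

# Because the list is assumed sorted, the missing value lies
# somewhere between its neighbours (the 7th and 9th values).
low, high = scores[6], scores[8]

def arithmetic_mean(values):
    """Plain arithmetic mean: sum divided by count."""
    return sum(values) / len(values)

# The mean changes depending on what the missing value turns out to be.
mean_if_low = arithmetic_mean(known + [low])
mean_if_high = arithmetic_mean(known + [high])

# The median and the mode are the same at both extremes.
median_if_low = median(known + [low])
median_if_high = median(known + [high])
mode_if_low = mode(known + [low])
mode_if_high = mode(known + [high])

print("mean range:", mean_if_low, "to", mean_if_high)
print("median:", median_if_low, median_if_high)
print("mode:", mode_if_low, mode_if_high)
```

The mean can only be pinned down to a range, whereas the median and the mode come out the same whichever end of the range the missing value falls on.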

Unlike the mode and the median, which we can find by inspecting the data, the mean requires calculation. Sometimes, though, this can be considered an advantage of the mean over the median and the mode, because we can be sure that the result is not biased by the researcher.

Unlike the median, which we can determine by drawing the cumulative frequency curve, or the mode, which we can visualize by locating the highest point on a histogram, the mean cannot be read off a graph.
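The two graphical readings can be imitated numerically. The sketch below uses hypothetical scores and treats each distinct score as its own histogram bar; the "median reading" is the smallest score at which the cumulative frequency reaches half of the observations, which is a coarse reading (for an even number of observations it is the lower of the two middle values), not the exact median.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical scores, not the article's original data.
scores = [30, 35, 35, 35, 40, 41, 43, 45, 95, 98]

# Frequency of each distinct score: the bar heights of a histogram.
counts = Counter(scores)
values = sorted(counts)
frequencies = [counts[v] for v in values]

# Mode: the score with the tallest bar.
mode_reading = max(values, key=lambda v: counts[v])

# Cumulative frequencies: the points of a cumulative frequency curve.
cumulative = list(accumulate(frequencies))

# Median reading: first score whose cumulative frequency reaches n / 2.
half = len(scores) / 2
median_reading = next(v for v, c in zip(values, cumulative) if c >= half)

print("mode read from bar heights:", mode_reading)
print("median read from cumulative frequencies:", median_reading)
```

No comparable landmark on either chart marks the mean; it still has to be computed from the values themselves.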

Two students can have the same average while the scores behind it tell very different stories:

Their means might be the same, but **Student A** is getting worse while **Student B** is improving.
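As an illustration, consider two hypothetical score histories (invented for this sketch, listed in the order the tests were taken). The helper `score_changes` is also an assumption introduced here, not something from the original example.

```python
from statistics import mean

# Hypothetical scores over five consecutive tests.
student_a = [90, 80, 70, 60, 50]   # steadily declining
student_b = [50, 60, 70, 80, 90]   # steadily improving

mean_a = mean(student_a)
mean_b = mean(student_b)

def score_changes(scores):
    """Difference between each test and the one before it."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]

changes_a = score_changes(student_a)
changes_b = score_changes(student_b)

print("means:", mean_a, mean_b)      # identical
print("changes for A:", changes_a)   # all negative
print("changes for B:", changes_b)   # all positive
```

Both means equal 70, so the mean alone cannot distinguish the two students; the direction of the changes can.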

Despite its drawbacks, the mean still has its advantages:

- It’s easy to use and understand.
- It is the least affected by sampling fluctuations: the mean of a sample can be used to estimate the mean of the whole population, which makes it reliable.
- Its useful mathematical properties make it popular in inferential statistics.
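The sampling point can be sketched with Python's standard library. The population (the integers 1 to 1000), the sample size of 200, and the seed are all arbitrary choices made for this illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: the integers 1 to 1000.
population = list(range(1, 1001))
population_mean = mean(population)   # 500.5

# Draw a simple random sample without replacement.
sample = random.sample(population, 200)
sample_mean = mean(sample)

# How far the sample mean lands from the population mean.
error = abs(sample_mean - population_mean)

print("population mean:", population_mean)
print("sample mean:    ", sample_mean)
print("absolute error: ", error)
```

With a sample of a fifth of the population, the sample mean typically lands within a few percent of the population mean, which is what makes it usable as an estimate.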

Knowing the mean’s weaknesses lets a data scientist know when not to use it.

In summary, the mean is a poor measure when a dataset contains extreme or missing values. It can also hide what is happening in the underlying data, and it can’t be read off a visualization.
