In Machine Learning, one of the important techniques you have to learn is Standardization, which falls under Feature Scaling. First, look at the definition of the heavy term Feature Scaling:
Feature Scaling is a technique to standardize the independent features present in the data in a fixed range.
So the key point of this definition is to transform your numerical features to a small range such as 0 to 1 or -1 to +1. But the question is: why?
Let’s look at a Social Network Ads dataset (here, we are only interested in 2 features, “Age” and “EstimatedSalary”). The dataset looks like this:
Here, the range of the EstimatedSalary column is much larger than that of the Age column. Now suppose you use the KNN algorithm, where the distance between rows is calculated. As an example, I am calculating the distance between the first 2 rows:
Here, 256 comes from the “Age” column and 10*10⁶ comes from the “EstimatedSalary” column, which is much larger than 256. That’s why, in KNN-like algorithms where distance is calculated, the large-range features (here “EstimatedSalary”) will dominate and the algorithm will not work well. Feature Scaling helps to avoid this by bringing the columns to comparable ranges.
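To make the domination effect concrete, here is a minimal sketch of how much each feature contributes to a Euclidean distance. The two rows are made-up values in (Age, EstimatedSalary) form, not the actual rows of the dataset, so the salary term differs from the figure quoted above.

```python
import math

# Two made-up rows in (Age, EstimatedSalary) form; NOT taken from the dataset.
row_a = (20, 30000)
row_b = (36, 33000)

# Squared difference contributed by each feature.
age_part = (row_a[0] - row_b[0]) ** 2       # 16 squared
salary_part = (row_a[1] - row_b[1]) ** 2    # 3000 squared

distance = math.sqrt(age_part + salary_part)

# Fraction of the squared distance that comes from the salary column.
salary_share = salary_part / (age_part + salary_part)

print(age_part, salary_part)
print(f"distance = {distance:.2f}, salary share = {salary_share:.6f}")
```

With values like these, almost the whole distance comes from the salary column, so Age has practically no say in which neighbors KNN picks.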
In general, there are 2 types of Feature Scaling:
- Standardization/Z-Score Normalization (focus of this article)
- Normalization (will discuss this topic in a different article)
First, look at the formula for Standardization:
x_scaled = (x − μ) / σ
where μ is the mean of the feature and σ is its standard deviation.
By using the above equation, you can convert the numerical features to a small, comparable scale. The means of the “Age” and “EstimatedSalary” columns are 37.655 and 69742.5 respectively. The standard deviations of these columns are 10.482877 and 34096.96 respectively. Now, for the first row, the scaled age and scaled estimated salary values will be:
And this is how Standardization is performed. The good thing is that we don’t need to implement the formula on our own. Scikit-Learn can handle this with the code below.
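The calculation can be sketched in a few lines of plain Python. The means and standard deviations are the ones quoted above. The row values (age 19, salary 19000) are placeholders chosen for illustration, since the dataset table is not reproduced here, and `standardize` is a hypothetical helper name.

```python
def standardize(x, mean, std):
    """Return the z-score of x for a feature with the given mean and std."""
    return (x - mean) / std

# Statistics quoted in the article.
AGE_MEAN, AGE_STD = 37.655, 10.482877
SALARY_MEAN, SALARY_STD = 69742.5, 34096.96

# Placeholder row values chosen for illustration (an assumption).
age, salary = 19, 19000

scaled_age = standardize(age, AGE_MEAN, AGE_STD)
scaled_salary = standardize(salary, SALARY_MEAN, SALARY_STD)

print(round(scaled_age, 4), round(scaled_salary, 4))
```

Both results are negative because the placeholder values lie below their column means, and both end up on the same scale of "standard deviations from the mean".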
from sklearn.preprocessing import StandardScaler
# create a scaler object
scaler = StandardScaler()
# transform the independent features (X)
X_transformed = scaler.fit_transform(X)
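To see roughly what happens inside `fit_transform`, here is a pure-Python sketch of the two steps: "fit" learns the mean and standard deviation of each column, and "transform" applies the formula. The helper names `fit` and `transform` and the data are made up for illustration; this is not Scikit-Learn's actual implementation. One detail worth knowing: `StandardScaler` divides by the population standard deviation, so its output can differ very slightly from a hand calculation that uses a sample standard deviation (for example, pandas' default).

```python
from statistics import fmean, pstdev

def fit(rows):
    """Learn the per-column mean and population std from a list of rows."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means, stds

def transform(rows, means, stds):
    """Apply (x - mean) / std to every value, column by column."""
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Made-up (Age, EstimatedSalary) rows, for illustration only.
X = [[20, 30000], [30, 50000], [40, 70000], [50, 90000]]

means, stds = fit(X)
X_transformed = transform(X, means, stds)

for row in X_transformed:
    print([round(v, 4) for v in row])
```

Keeping the two steps separate is what lets you reuse the statistics learned from one set of rows on another set later.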
After the Standardization, the result looks like this:
We have transformed the 2 features. Now let’s see what has changed:
It looks like the positions of the data points have not changed, but the scale of the axes has. Another major change is that the mean of each feature becomes 0 after the transformation.
A further major change is that the std of both features becomes 1.0. Because the mean becomes 0 in standardization, the mean-subtraction step is also called Mean Centering.
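Both properties are easy to check numerically. The sketch below standardizes two made-up columns (not the real dataset) and confirms that each ends up with a mean of about 0 and a standard deviation of about 1. `zscores` is a hypothetical helper, and the population standard deviation is assumed.

```python
from statistics import fmean, pstdev

def zscores(values):
    """Standardize a list of numbers using the population std."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Made-up columns, for illustration only.
ages = [19, 35, 26, 27, 32, 47]
salaries = [19000, 20000, 43000, 57000, 76000, 150000]

for name, column in (("Age", ages), ("EstimatedSalary", salaries)):
    scaled = zscores(column)
    print(name, "mean:", round(fmean(scaled), 6), "std:", round(pstdev(scaled), 6))
```

The tiny deviations from exactly 0 and 1 that you may see are floating-point rounding.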
One more visualization can help to understand the transformation. After the transformation, you can also see the relationship between the features, which is (roughly) not possible before scaling.
From the above image, you might think that the distribution of the features has changed. But it actually has not. The image below is the proof. Only the scale has changed.
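One way to check this without a plot is to compare a shape statistic before and after scaling. The sketch below uses skewness (the mean of the cubed z-scores) on a made-up salary column. Skewness is my choice of shape measure here, not something the article uses, and the helper names are hypothetical.

```python
from statistics import fmean, pstdev

def zscores(values):
    """Standardize a list of numbers using the population std."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    return fmean([z ** 3 for z in zscores(values)])

# Made-up salary column, for illustration only.
salaries = [19000, 20000, 43000, 57000, 76000, 150000]
scaled = zscores(salaries)

skew_before = skewness(salaries)
skew_after = skewness(scaled)

# The ordering of the rows is also preserved by the transformation.
order_before = sorted(range(len(salaries)), key=lambda i: salaries[i])
order_after = sorted(range(len(scaled)), key=lambda i: scaled[i])

print(round(skew_before, 6), round(skew_after, 6), order_before == order_after)
```

The skewness and the row ordering come out the same, because standardization only shifts and rescales the values.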
If you run ML algorithms like LogisticRegression on both the scaled and the non-scaled dataset, you can see that the model trained on the scaled dataset performs better. There is no harm in applying scaling for every algorithm, but some algorithms give much better results with it. Others, like the DecisionTree algorithm, are not affected by scaling.
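Why a decision tree does not care about scaling can be sketched with a single-split "stump": a tree only asks whether a value is above or below a threshold, and standardization keeps the order of the values. The code below is a toy illustration, not Scikit-Learn's DecisionTree. The data and labels are made up, all values are assumed to be distinct, and `best_split` is a hypothetical helper.

```python
from statistics import fmean, pstdev

def zscores(values):
    """Standardize a list of numbers using the population std."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]

def best_split(values, labels):
    """Return the row indices sent to the left side by the best single split.

    Every split between consecutive sorted values is tried; the best one
    misclassifies the fewest rows. Assumes distinct values and 0/1 labels.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_left, best_errors = None, len(values) + 1
    for k in range(1, len(order)):
        left, right = order[:k], order[k:]
        # Errors if we predict 0 on the left and 1 on the right.
        errors = sum(labels[i] for i in left) + sum(1 - labels[i] for i in right)
        # The mirrored prediction (1 on the left, 0 on the right) is also allowed.
        errors = min(errors, len(values) - errors)
        if errors < best_errors:
            best_left, best_errors = frozenset(left), errors
    return best_left

# Made-up feature column and 0/1 labels, for illustration only.
salaries = [19000, 20000, 43000, 57000, 76000, 150000]
labels = [0, 0, 0, 0, 1, 1]

split_raw = best_split(salaries, labels)
split_scaled = best_split(zscores(salaries), labels)
print(sorted(split_raw), sorted(split_scaled), split_raw == split_scaled)
```

The same rows land on the same side of the split before and after scaling; only the numeric value of the threshold would differ.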
If there are outliers in your features, their behavior will not change after the transformation. In simple words, the impact of the outliers stays the same in the scaled dataset.
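A small numeric sketch shows what this means in practice. The column below is made up, with one deliberately extreme last value. After standardization that value is still far away from the rest, and it also pulls the mean and standard deviation so that the ordinary values get squeezed close together.

```python
from statistics import fmean, pstdev

def zscores(values):
    """Standardize a list of numbers using the population std."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Made-up column; the last value is a deliberate outlier.
values = [30000, 32000, 35000, 38000, 40000, 500000]
z = zscores(values)

outlier_z = z[-1]
# Spread of the ordinary (non-outlier) values after scaling.
normal_spread = max(z[:-1]) - min(z[:-1])

print([round(v, 3) for v in z])
print("outlier z-score:", round(outlier_z, 3), "spread of the rest:", round(normal_spread, 3))
```

If outliers are a concern, they usually need to be handled separately, because standardization alone does not remove their effect.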
The image below can help you understand when to use Standardization.
I hope you now understand Standardization. You can find the code for this article here.
Thank you very much.