Z Score Normalization (Standard Score Formula)

by keshav

Normalization, or standardization, is the process of rescaling data without changing its underlying nature. It is often applied as part of data pre-processing in machine learning. The main aim of normalization is to bring the values in a dataset to a common scale without distorting the differences in their ranges. We often define new boundaries (most commonly (0, 1) or (-1, 1)) and convert the data accordingly. This technique is useful in classification algorithms involving neural networks and in distance-based algorithms (e.g. KNN, K-means).

In z-score normalization, the values of an attribute A are normalized based on the mean and standard deviation of A. For a value V_i of attribute A, the normalized value U_i is given by

$$U_i = \frac{V_i - \mathrm{Avg}(A)}{\mathrm{Std}(A)}$$

where Avg(A) and Std(A) represent the average and standard deviation, respectively, of the values of attribute A.
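As a minimal sketch of this definition, the snippet below computes z-scores with only the Python standard library. The function name `z_score_normalize` and the sample values are illustrative, and the use of the population standard deviation for Std(A) is an assumption, since the text does not say whether the population or sample form is meant.

```python
import statistics


def z_score_normalize(values):
    """Return the z-score of each value: (v - mean) / standard deviation."""
    avg = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    if std == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(v - avg) / std for v in values]


# Illustrative values, not taken from the article
incomes = [30_000, 40_000, 50_000, 60_000, 70_000]
normalized = z_score_normalize(incomes)
print(normalized)
```

After normalization the values have a mean of 0 and a standard deviation of 1, which is what puts attributes with very different ranges on a comparable scale.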

Let’s see an example. Suppose that the mean and standard deviation of the values for the attribute income are $54,000 and $16,000, respectively. With z-score normalization, a value of $73,000 for income is normalized to (73,000 - 54,000)/16,000 = 1.1875.

In Python:

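from sklearn.preprocessing import StandardScaler

# Two samples (rows), each with seven features (columns)
X = [[101, 105, 222, 333, 225, 334, 556], [105, 105, 258, 354, 221, 334, 556]]
print("Before standardization X values are ", X)
# StandardScaler standardizes each column using that column's mean and standard deviation
sc_X = StandardScaler()
X = sc_X.fit_transform(X)
print("After standardization X values are ", X)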

Output:

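Before standardization X values are
[[101, 105, 222, 333, 225, 334, 556],
 [105, 105, 258, 354, 221, 334, 556]]

After standardization X values are
[[-1.  0. -1. -1.  1.  0.  0.]
 [ 1.  0.  1.  1. -1.  0.  0.]]

Because StandardScaler works column by column and each column here holds only two values, every standardized entry is -1 or 1, or 0 where the two values in a column are equal.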
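To make the column-wise behaviour concrete, here is a rough sketch that reproduces the same numbers without scikit-learn. It assumes, in line with StandardScaler's default behaviour, that each column is divided by its population standard deviation and that a constant column is given a scale of 1. The helper name `standardize_columns` is hypothetical.

```python
import statistics


def standardize_columns(rows):
    """Standardize each column of a list-of-rows table to zero mean and unit variance."""
    columns = list(zip(*rows))  # transpose: one tuple per feature
    scaled_columns = []
    for col in columns:
        avg = statistics.fmean(col)
        std = statistics.pstdev(col)  # population standard deviation
        if std == 0:
            std = 1.0  # constant column: avoid division by zero, entries become 0
        scaled_columns.append([(v - avg) / std for v in col])
    return [list(row) for row in zip(*scaled_columns)]  # transpose back to rows


X = [[101, 105, 222, 333, 225, 334, 556],
     [105, 105, 258, 354, 221, 334, 556]]
standardized = standardize_columns(X)
for row in standardized:
    print(row)
```

With more than two samples per column the results would no longer collapse to -1, 0 and 1, so a larger dataset shows the effect of standardization more clearly.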

