# Data Normalization Techniques

Normalization is often the first step in making sense of data.

**Decimal scaling**

Divide each value by a power of 10, choosing the smallest power that brings the maximum absolute value below 1.

v' = v / 10^j, where j is the smallest integer such that max(|v'|) < 1
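A minimal sketch in plain Python (the function name is illustrative):

```python
import math

def decimal_scale(values):
    # Decimal scaling: v' = v / 10^j, with j the smallest non-negative
    # integer such that max(|v'|) < 1.
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)  # all zeros: nothing to scale
    j = max(0, math.floor(math.log10(max_abs)) + 1)
    return [v / (10 ** j) for v in values]

print(decimal_scale([120, 45, -830]))  # -> [0.12, 0.045, -0.83]
```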

**Min-Max Normalization (scaling to a range)**

The minimum gets transformed to 0 and the maximum gets transformed to 1.

v' = (v - min) / (max - min)

Guarantees that all features will have the exact same scale, but does not handle outliers well.
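A minimal sketch in plain Python (the function name is illustrative):

```python
def min_max_scale(values):
    # Min-max normalization: v' = (v - min) / (max - min), so the
    # minimum maps to 0 and the maximum maps to 1.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([10, 20, 30]))  # -> [0.0, 0.5, 1.0]
```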

**Z-Score Normalization (Standardization)**

Z-score is a variation of scaling that represents each value as the number of standard deviations it lies from the mean.

v' = (v - mean) / standard deviation

Handles outliers better than min-max normalization, but does not produce normalized data with the exact same scale across features.
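A minimal sketch using Python's statistics module (the function name is illustrative):

```python
import statistics

def z_score(values):
    # Standardization: v' = (v - mean) / standard deviation.
    # Assumes the data is not constant (sd > 0).
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / sd for v in values]

print(z_score([10, 20, 30]))  # -> [-1.224..., 0.0, 1.224...]
```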

**Log Transform**

The log transform compresses magnitude differences, which decreases the effect of outliers and makes the model more robust.

v’ = log(v)

Log scaling is helpful when a handful of your values have many points, while most other values have few points. This kind of distribution is known as a power-law distribution.
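A minimal sketch (math.log assumes strictly positive inputs; log(1 + v) is a common variant when zeros are present):

```python
import math

def log_transform(values):
    # Log transform: v' = log(v); compresses large magnitude differences.
    return [math.log(v) for v in values]

print(log_transform([1, 10, 100, 1000]))  # -> [0.0, 2.302..., 4.605..., 6.907...]
```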

**Feature Clipping**

If your data set contains extreme outliers, you might try feature clipping, which caps all feature values above (or below) a certain threshold at a fixed value.
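A minimal sketch in plain Python (function and parameter names are illustrative):

```python
def clip(values, lower, upper):
    # Feature clipping: cap values outside [lower, upper] at the bounds.
    return [min(max(v, lower), upper) for v in values]

print(clip([-5, 0, 42, 999], lower=0, upper=100))  # -> [0, 0, 42, 100]
```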

**Buckets with equally spaced boundaries**

The boundaries are fixed, and each bucket spans the same range of values.
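A minimal sketch in plain Python (the function name is illustrative; assumes the data is not constant):

```python
def equal_width_bucket(values, num_buckets):
    # Equal-width bucketing: every bucket spans (max - min) / num_buckets.
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets
    # min(..., num_buckets - 1) keeps the maximum value in the last bucket.
    return [min(int((v - lo) / width), num_buckets - 1) for v in values]

print(equal_width_bucket([1, 4, 5, 9, 10], num_buckets=3))  # -> [0, 1, 1, 2, 2]
```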

**Quantile Bucketing**

Creates buckets that each contain (roughly) the same number of points.
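A minimal rank-based sketch in plain Python (the function name is illustrative; tied values may straddle bucket boundaries):

```python
def quantile_bucket(values, num_buckets):
    # Quantile bucketing: assign by rank so each bucket receives
    # (roughly) the same number of points.
    order = sorted(range(len(values)), key=lambda i: values[i])
    buckets = [0] * len(values)
    for rank, i in enumerate(order):
        buckets[i] = rank * num_buckets // len(values)
    return buckets

print(quantile_bucket([1, 2, 2, 3, 50, 90], num_buckets=3))  # -> [0, 0, 1, 1, 2, 2]
```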

**Spotting Outliers with Inter-Quartile Range**

IQR = Q3 - Q1

Lower Outlier Limit = Q1 - 1.5 * IQR

Upper Outlier Limit = Q3 + 1.5 * IQR

Example:

10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5, 14.6, 14.7, 14.7, 14.7, 14.9, 15.1, 15.9, 16.4

Q1 (25th percentile) = 14.4

Q2 (50th percentile) = 14.6

Q3 (75th percentile) = 14.9

IQR = 0.5, Lower Outlier Limit = 13.65, Upper Outlier Limit = 15.65, so 10.2, 15.9, and 16.4 are flagged as outliers.
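A minimal sketch reproducing the example (quartile conventions vary; this uses the median-of-halves method that matches the numbers above):

```python
def iqr_outliers(values):
    # Flag outliers using the 1.5 * IQR rule.
    def median(xs):
        n, mid = len(xs), len(xs) // 2
        return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])   # lower half (overall median excluded)
    q3 = median(data[-half:])  # upper half
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

data = [10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5, 14.6,
        14.7, 14.7, 14.7, 14.9, 15.1, 15.9, 16.4]
print(iqr_outliers(data))  # -> [10.2, 15.9, 16.4]
```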