Dimensionality Reduction Techniques

rememberme
Jan 16, 2022


Techniques for reducing the number of features when a data set has too many of them.

Low Variance Filter

When a feature takes nearly the same value across the whole data set, it carries almost no information, so there is little point in keeping it. Compute each feature's variance and drop the ones that fall below a threshold.
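A minimal sketch of this filter using numpy; the 0.01 threshold is an assumption and should be tuned per data set:

```python
import numpy as np

# Toy data: the middle column is nearly constant, so it carries little signal.
X = np.array([
    [1.0, 5.0, 0.2],
    [2.0, 5.0, 0.9],
    [3.0, 5.0, 0.4],
    [4.0, 5.1, 0.7],
])

variances = X.var(axis=0)      # per-feature variance
threshold = 0.01               # assumed cut-off; tune for your data
keep = variances > threshold   # boolean mask of features to retain
X_reduced = X[:, keep]         # the near-constant column is dropped
```

scikit-learn offers the same idea as `sklearn.feature_selection.VarianceThreshold`.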

High Correlation filter

Calculate the correlation between pairs of numerical independent variables. If the correlation coefficient crosses a chosen threshold, we can drop one of the two variables, since they carry largely the same information.
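A small sketch of the idea with numpy, assuming a 0.9 threshold and the simple rule of dropping the later column of each highly correlated pair:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=200)
# Column 1 is almost an exact multiple of column 0; column 2 is independent.
X = np.column_stack([a, 2 * a + rng.normal(scale=0.01, size=200),
                     rng.normal(size=200)])

corr = np.abs(np.corrcoef(X, rowvar=False))  # pairwise |correlation| matrix
threshold = 0.9                              # assumed cut-off
n = corr.shape[0]
to_drop = set()
for i in range(n):
    for j in range(i + 1, n):
        if corr[i, j] > threshold and i not in to_drop and j not in to_drop:
            to_drop.add(j)                   # drop the later of the pair
keep = [c for c in range(n) if c not in to_drop]
X_reduced = X[:, keep]
```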

Random Forest

Random Forest is one of the most widely used algorithms for feature selection. Train a random forest, then keep the features it relies on most, as ranked by its feature importances.
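A sketch with scikit-learn's `RandomForestClassifier` on synthetic data; the choice of keeping the top 2 features is an arbitrary assumption for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
informative = rng.normal(size=(300, 1))
noise = rng.normal(size=(300, 3))
X = np.hstack([informative, noise])
y = (informative[:, 0] > 0).astype(int)  # label depends only on feature 0

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_  # one score per feature, sums to 1
k = 2                                      # assumed number of features to keep
top_k = np.argsort(importances)[::-1][:k]
X_reduced = X[:, top_k]
```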

Backward Feature Elimination

Train the model with all n features, then with n-1 features, and remove a feature if dropping it causes no loss in performance. Repeat as many times as needed.
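A minimal sketch of the loop above with scikit-learn, using linear regression and cross-validated R² as the model and score; the tolerance of 0.001 is an assumed setting:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
# Only features 0 and 1 actually drive the target.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

features = list(range(X.shape[1]))
tolerance = 0.001  # assumed: drop a feature if the score barely changes
improved = True
while improved and len(features) > 1:
    base = cross_val_score(LinearRegression(), X[:, features], y, cv=5).mean()
    improved = False
    for f in list(features):
        rest = [g for g in features if g != f]
        score = cross_val_score(LinearRegression(), X[:, rest], y, cv=5).mean()
        if base - score < tolerance:   # no real drawback from removing f
            features = rest
            improved = True
            break
```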

Forward Feature Selection

Train the model with just one feature, then with two, and keep the new feature only if the results improve. Repeat up to n times.
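The mirror-image sketch for forward selection, again with linear regression and cross-validated R²; the minimum gain of 0.001 is an assumed stopping rule:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = 2 * X[:, 2] + rng.normal(scale=0.1, size=200)  # only feature 2 matters

selected = []
remaining = list(range(X.shape[1]))
best_score = -np.inf
min_gain = 0.001  # assumed: keep adding while the score improves this much
while remaining:
    scores = {f: cross_val_score(LinearRegression(),
                                 X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    f_best = max(scores, key=scores.get)   # best feature to add this round
    if scores[f_best] - best_score < min_gain:
        break                              # no meaningful improvement; stop
    selected.append(f_best)
    remaining.remove(f_best)
    best_score = scores[f_best]
```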

Both Backward Feature Elimination and Forward Feature Selection are time-consuming and computationally expensive, since each step retrains the model many times.

Principal Component Analysis (PCA)

PCA is a dimensionality-reduction method often used on large data sets: it transforms a large set of variables into a smaller one that still contains most of the information in the original set.
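A minimal PCA sketch via numpy's SVD, on synthetic 3-D data that actually lies close to a 2-D plane (the data-generating setup is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# 3-D observations built from 2 latent factors plus a little noise.
latent = rng.normal(size=(500, 2))
mixing = np.array([[1.0, 0.5], [0.3, 1.0], [0.8, 0.2]])
X = latent @ mixing.T + rng.normal(scale=0.05, size=(500, 3))

Xc = X - X.mean(axis=0)                  # PCA requires centred data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)          # variance ratio per component
X_2d = Xc @ Vt[:2].T                     # project onto the top 2 components
```

The first two components should capture nearly all the variance here, since only the noise lives in the third direction.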

Linear Discriminant Analysis or Normal Discriminant Analysis or Discriminant Function Analysis (LDA)

It is like PCA, but it maximizes separability among known categories: it maximizes the separation between class means while minimizing the scatter within each class.
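A short sketch with scikit-learn's `LinearDiscriminantAnalysis` on two synthetic classes; note LDA can produce at most (number of classes - 1) components:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two classes with the same within-class scatter, separated along one direction.
X0 = rng.normal(loc=[0, 0, 0], size=(150, 3))
X1 = rng.normal(loc=[3, 3, 0], size=(150, 3))
X = np.vstack([X0, X1])
y = np.array([0] * 150 + [1] * 150)

lda = LinearDiscriminantAnalysis(n_components=1)  # 2 classes -> 1 component max
X_1d = lda.fit_transform(X, y)  # 3-D data reduced to the most separating axis
```

Unlike PCA, the projection is chosen using the labels, so the two classes stay well apart in the reduced space.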
