Dimensionality Reduction Techniques
Dimensionality reduction is a common second step when a data set has too many features.
Low Variance Filter
If a feature takes nearly the same value across all observations, it carries almost no information, so there is no point in letting it clog the data set; drop it.
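A minimal sketch using scikit-learn's VarianceThreshold; the toy matrix X and the 0.01 cutoff are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy feature matrix: the second column is almost constant.
X = np.array([[1.0, 0.00, 3.0],
              [2.0, 0.00, 1.0],
              [3.0, 0.01, 2.0],
              [4.0, 0.00, 4.0]])

# Drop every feature whose variance falls below the (illustrative) cutoff.
selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X)
print(selector.get_support())  # [ True False  True]
print(X_reduced.shape)         # (4, 2)
```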
High Correlation Filter
Calculate the correlation between pairs of independent numerical variables. If the correlation coefficient between two variables crosses a chosen threshold, we can drop one of them.
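A minimal pandas sketch, assuming a numeric DataFrame df and an illustrative threshold of 0.9: compute the absolute correlation matrix, scan its upper triangle so each pair is checked once, and drop one variable from each highly correlated pair.

```python
import numpy as np
import pandas as pd

# Toy data: 'b' is almost a linear copy of 'a'.
rng = np.random.default_rng(0)
a = rng.normal(size=100)
df = pd.DataFrame({"a": a,
                   "b": a * 2 + rng.normal(scale=0.01, size=100),
                   "c": rng.normal(size=100)})

corr = df.corr().abs()
# Keep only the upper triangle so each pair appears once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
threshold = 0.9  # illustrative cutoff
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
df_reduced = df.drop(columns=to_drop)
print(to_drop)  # ['b']
```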
Random Forest
Random Forest is one of the most widely used algorithms for feature selection. Train a random forest and keep the features it relies on most, i.e. those with the highest importance scores.
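A minimal scikit-learn sketch: fit a random forest and rank features by importance. The synthetic data and the choice to keep the top 2 features are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 5 features, only 2 of them informative.
X, y = make_classification(n_samples=300, n_features=5,
                           n_informative=2, n_redundant=0,
                           random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank features by how much the forest relied on them.
ranking = np.argsort(forest.feature_importances_)[::-1]
top_k = ranking[:2]          # keep the 2 most-used features (illustrative)
X_reduced = X[:, top_k]
print(forest.feature_importances_.round(3))
```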
Backward Feature Elimination
Train the model with all n features, then with n-1 features, and drop a feature permanently if removing it causes no loss in performance. Repeat as many times as needed.
Forward Feature Selection
Train the model with just 1 feature, then with 2, and keep the new feature only if the results improve. Repeat up to n times.
Both Backward Feature Elimination and Forward Feature Selection are time-consuming and computationally expensive; a sketch covering both appears below.
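Both procedures can be sketched with scikit-learn's SequentialFeatureSelector, whose direction parameter switches between backward elimination and forward selection. The logistic-regression estimator and the target of 2 features are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
estimator = LogisticRegression(max_iter=1000)

# Backward: start with all features, drop the least useful one per step.
backward = SequentialFeatureSelector(estimator, n_features_to_select=2,
                                     direction="backward").fit(X, y)
# Forward: start with none, add the most useful feature per step.
forward = SequentialFeatureSelector(estimator, n_features_to_select=2,
                                    direction="forward").fit(X, y)
print(backward.get_support(), forward.get_support())
```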
Principal Component Analysis (PCA)
PCA is a dimensionality-reduction method that transforms a large set of variables into a smaller set of uncorrelated components that still contains most of the information in the original set.
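A sketch using scikit-learn: standardize the features first (PCA is scale-sensitive), then ask PCA to keep enough components to explain, say, 95% of the variance; the 0.95 figure and the digits data set are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)        # 64 pixel features

X_scaled = StandardScaler().fit_transform(X)

# Keep as many components as needed to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
print(X.shape, "->", X_reduced.shape)
print(pca.explained_variance_ratio_.sum())  # roughly 0.95
```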
Linear Discriminant Analysis (LDA), also known as Normal Discriminant Analysis or Discriminant Function Analysis
It is like PCA, but it maximizes separability among known categories: it maximizes the separation between class means while minimizing the scatter within each class.
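A minimal sketch with scikit-learn's LinearDiscriminantAnalysis; unlike PCA it needs the class labels y, and it can produce at most (number of classes - 1) components, so 2 for the 3-class iris data used here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)          # 4 features, 3 classes

# LDA uses the labels to find axes that separate the class means
# while keeping each class tightly clustered.
lda = LinearDiscriminantAnalysis(n_components=2)  # at most n_classes - 1
X_reduced = lda.fit_transform(X, y)
print(X.shape, "->", X_reduced.shape)      # (150, 4) -> (150, 2)
```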