

It is common for datasets to have thousands of features, but processing thousands of features during training and testing can be computationally infeasible. Moreover, many irrelevant features can lead to overfitting. So we need to select the most relevant features in order to obtain faster, better, and easier-to-understand learning models. There are many methods for feature selection, such as wrapper methods, filter methods, univariate methods, and multivariate methods. Here I want to talk about the filter method.

The filter method ranks all features by a measure of correlation with the label, and then selects the top K features to use in the model. There are several ways to measure the correlation between a feature X and the label Y: mutual information, the chi-square statistic, the Pearson correlation coefficient, the signal-to-noise ratio, and the t-test.
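To make the rank-and-select idea concrete, here is a minimal sketch in Python using scikit-learn's SelectKBest (assuming scikit-learn is installed; the iris data and K = 2 are placeholders, and any of the measures below can be swapped in as the score function):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Score each feature against the label independently, then keep the top K = 2.
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_top = selector.fit_transform(X, y)

print("feature scores:", selector.scores_)  # one correlation score per feature
print("reduced shape:", X_top.shape)        # (150, 2)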

 

1) Mutual Information:

From basic probability we know that if X and Y are independent, then P(X,Y) = P(X)P(Y).

Measure of dependence (the mutual information):

I(X;Y) = Σ_x Σ_y P(x,y) log [ P(x,y) / (P(x)P(y)) ]

It is 0 when X and Y are independent.

It is maximal when X = Y (it then equals the entropy of X).
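Because the formula only needs empirical probabilities, MI for nominal data can be estimated directly from counts. A minimal sketch (the helper name mutual_information is my own; it uses the natural logarithm, so the result is in nats):

import math
from collections import Counter

def mutual_information(xs, ys):
    # I(X;Y) = sum over (x, y) of P(x,y) * log(P(x,y) / (P(x)P(y))),
    # with all probabilities estimated from empirical counts.
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # P(x,y) / (P(x)P(y)) simplifies to c*n / (count_x[x] * count_y[y])
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi

# A feature identical to the label gives maximal MI (the entropy of X);
# an independent feature gives (approximately) 0.
print(mutual_information(["a", "a", "b", "b"], [0, 0, 1, 1]))  # log 2 ≈ 0.693
print(mutual_information(["a", "b", "a", "b"], [0, 0, 1, 1]))  # 0.0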

Limitations of the MI method:

- works only with nominal (categorical) features and labels

- biased toward high-arity features

- may choose redundant features

- a feature may become relevant only in the context of others, which per-feature scoring cannot detect

(A comparison between MI, chi-square, and the log-likelihood ratio appears in [Dunning, CL '93], "Accurate Methods for the Statistics of Surprise and Coincidence".)

2) Chi-Square Test of Independence

3) Pearson Correlation Coefficient

4) Signal-to-Noise Ratio

5) T-test

posted on 2010-05-10 05:45 by Zhu Qing