Forward Regression (Forward Selection)
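Start with an empty best feature set.
Then iterate: take each feature that is not yet in the best feature set, add it to the best feature set, fit the regression, and compute its adjusted R² score. Find the highest of these scores. If it does not exceed the score of the current best feature set, stop; otherwise, add the feature that produced the highest score to the best feature set and run the next iteration.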
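To make the loop concrete without formula strings, here is a minimal NumPy-only sketch of the same procedure. It assumes the features are the columns of a numeric array `X` and scores each candidate set with the standard adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where p counts the predictors but not the intercept. The helper names `ols_adjusted_r2` and `forward_select` are hypothetical, not part of any library.

```python
import numpy as np


def ols_adjusted_r2(X, y):
    """Adjusted R^2 of an OLS fit of y on the columns of X plus an intercept."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    # Penalize each predictor; the intercept is not counted in p
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def forward_select(X, y):
    """Return column indices chosen by forward selection on adjusted R^2."""
    remaining = list(range(X.shape[1]))
    selected = []
    best_score = 0.0  # start from 0.0, as forward_selected does
    while remaining:
        # Score each remaining column together with the columns already chosen
        scores = [(ols_adjusted_r2(X[:, selected + [j]], y), j) for j in remaining]
        score, j = max(scores)
        if score <= best_score:  # best candidate does not improve: stop
            break
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected


# Hypothetical synthetic data: y depends on columns 0 and 2 only
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 1.5 * X[:, 2] + 0.1 * rng.normal(size=200)
print(forward_select(X, y))
```

Working on column indices rather than names sidesteps formula parsing, which matters when column names are not valid formula identifiers. The `forward_selected` function that follows implements the same loop with statsmodels formulas.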
import statsmodels.formula.api as smf
import pandas as pd


def forward_selected(data, response):
    """Linear model designed by forward selection.

    Uses adjusted R-squared to judge whether a newly added predictor
    improves the fit; unlike plain R-squared, it penalizes each extra
    parameter.

    Parameters:
    -----------
    data : pandas DataFrame with all possible predictors and response
    response: string, name of response column in data

    Returns:
    --------
    model: an "optimal" fitted statsmodels linear model
           with an intercept
           selected by forward selection
           evaluated by adjusted R-squared
    """
    remaining = set(data.columns)
    remaining.remove(response)
    selected = []
    current_score = 0.0
    while remaining:
        # Score every remaining candidate added to the features chosen so far
        scores_with_candidates = []
        for candidate in remaining:
            formula = "{} ~ {} + 1".format(response,
                                           ' + '.join(selected + [candidate]))
            score = smf.ols(formula, data).fit().rsquared_adj
            scores_with_candidates.append((score, candidate))
        # The last tuple after sorting has the highest adjusted R-squared
        scores_with_candidates.sort()
        best_new_score, best_candidate = scores_with_candidates.pop()
        # Stop when the best candidate no longer improves the score
        # (an explicit break also avoids looping forever on an exact tie)
        if best_new_score <= current_score:
            break
        remaining.remove(best_candidate)
        selected.append(best_candidate)
        current_score = best_new_score
    # Fit the final model; fall back to intercept-only if nothing was selected
    formula = "{} ~ {} + 1".format(response, ' + '.join(selected) or '1')
    model = smf.ols(formula, data).fit()
    return model
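Why adjusted R² rather than plain R²? For nested OLS models, plain R² never decreases when a column is added, so it would always favor adding another feature; adjusted R² charges for each extra parameter. The sketch below checks this on hypothetical synthetic data and confirms that statsmodels' `rsquared_adj` agrees with the standard formula when the model has an intercept. The data and the helper `adjusted_r2_formula` are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data: y depends on x1 only; noise_col is pure noise
rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({"x1": rng.normal(size=n), "noise_col": rng.normal(size=n)})
df["y"] = 2.0 * df["x1"] + rng.normal(size=n)

small = smf.ols("y ~ x1 + 1", df).fit()
large = smf.ols("y ~ x1 + noise_col + 1", df).fit()


def adjusted_r2_formula(r2, n_obs, n_predictors):
    """Standard adjusted R^2; n_predictors does not count the intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Plain R^2 can only stay the same or grow when a column is added...
print(small.rsquared, large.rsquared)
# ...while adjusted R^2 charges for the extra parameter and may fall
print(small.rsquared_adj, large.rsquared_adj)
print(adjusted_r2_formula(large.rsquared, n, 2))  # equals large.rsquared_adj
```

Because of this penalty, the loop in `forward_selected` can stop before every column has been added.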
